The Minimax-Regret Decision Framework for Online A/B Tests
In online A/B tests, where samples arrive over time from a very large population and multiple policies can be tested concurrently, an analyst must decide how to allocate traffic efficiently, when to stop the experiment, and which policy to adopt. We develop a minimax-regret decision framework for online A/B tests that provides integrated optimal solutions to these decision problems. Our framework controls the maximum regret, the worst-case anticipated net payoff loss from making an erroneous decision, rather than the maximum Type I error probability. Notably, it rationalizes and advocates a cutoff decision rule for adopting new, innovative policies that is much less conservative than conventional hypothesis-testing cutoffs. We apply our framework to multi-armed experiment data from a large mobile game company, where it arrives at drastically different decisions from those of the conventional hypothesis-testing framework. We then validate our framework with a series of Monte Carlo simulations that mimic the data-generating process of our focal company's experiments. Our minimax-regret decision criteria outperform their hypothesis-testing counterparts, both in identifying the correct decision and in incurring lower net payoff loss. Without sacrificing decision accuracy, our minimax-regret efficient traffic allocation reduces wait time and sample size by more than 30% relative to ad hoc traffic allocation.
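To fix ideas, the regret objective can be stated in the standard Savage form (a sketch in generic notation; the symbols below are illustrative rather than the paper's own). Let $d \in \mathcal{D}$ be a candidate decision, $\theta \in \Theta$ the unknown state of the world (e.g., the true policy effects), and $\pi(d, \theta)$ the net payoff of $d$ under $\theta$. The regret of $d$ is
\[
R(d, \theta) = \max_{d' \in \mathcal{D}} \pi(d', \theta) - \pi(d, \theta),
\]
and the minimax-regret decision minimizes the worst case over states,
\[
d^{\mathrm{MR}} = \arg\min_{d \in \mathcal{D}} \, \max_{\theta \in \Theta} R(d, \theta),
\]
whereas a conventional test bounds the Type I error probability $\Pr(\text{adopt} \mid \theta = 0)$ irrespective of the payoffs at stake. In stylized two-policy settings with symmetric payoffs, this objective is known to favor adopting the new policy whenever its estimated lift is positive, a far less conservative threshold than, say, the $t > 1.645$ cutoff of a one-sided 5% test.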