Dynamic Online Pricing with Incomplete Information Using Multi-Armed Bandit Experiments
Pricing managers at online retailers face a unique challenge: they must decide on real-time prices for a large number of products with incomplete demand information. The manager runs price experiments to learn about each product's demand curve and its profit-maximizing price. In practice, balanced field price experiments can create high opportunity costs, since a large number of customers are presented with suboptimal prices. In this paper, we propose an alternative dynamic price experimentation policy. The proposed approach extends multi-armed bandit (MAB) algorithms from statistical machine learning to include microeconomic choice theory. Our automated pricing policy solves this MAB problem using a scalable, distribution-free algorithm. We prove analytically that our method is asymptotically optimal for any weakly downward-sloping demand curve. In a series of Monte Carlo simulations, we show that the proposed approach performs favorably compared with balanced field experiments and with standard dynamic pricing methods from computer science. In a calibrated simulation based on an existing pricing field experiment, we find that our algorithm can increase profits by 43% during the month of testing and by 4% annually.
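To make the bandit framing of price experimentation concrete, the sketch below simulates a discrete price grid as the arms of an MAB problem, with each arm's reward equal to revenue per visitor. This is only a minimal illustration using standard Thompson sampling with Beta-Bernoulli updates; it is not the distribution-free algorithm proposed in the paper, and the price grid, purchase probabilities, and variable names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete price grid (the "arms") and a simulated weakly
# downward-sloping demand curve; both are illustrative, not from the paper.
prices = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
true_purchase_prob = np.array([0.50, 0.42, 0.33, 0.22, 0.12])

# Thompson sampling with Beta priors on each arm's purchase probability.
# The reward for arm k is prices[k] * 1{purchase}, i.e., revenue per visitor.
alpha = np.ones(len(prices))  # posterior successes + 1
beta = np.ones(len(prices))   # posterior failures + 1

total_revenue = 0.0
n_visitors = 10_000
for t in range(n_visitors):
    # Draw a purchase probability for each price from its posterior, then
    # offer the price with the highest sampled expected revenue. Exploration
    # comes from the randomness of the posterior draws.
    sampled = rng.beta(alpha, beta)
    k = int(np.argmax(prices * sampled))

    # Show price k to the next visitor and observe purchase (1) or not (0).
    purchase = rng.random() < true_purchase_prob[k]
    total_revenue += prices[k] * purchase

    # Conjugate Beta-Bernoulli posterior update for the chosen arm.
    alpha[k] += purchase
    beta[k] += 1 - purchase

best = int(np.argmax(prices * true_purchase_prob))
print(f"avg revenue/visitor: {total_revenue / n_visitors:.3f} "
      f"(oracle: {prices[best] * true_purchase_prob[best]:.3f})")
```

By contrast, a balanced field experiment would split visitors evenly across all five prices for the full test period, continuing to serve the low-revenue prices long after the data have identified them as suboptimal; the opportunity cost of that balanced allocation is what an adaptive policy of this kind is designed to avoid.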