Besbes, Omar; Gur, Yonatan; Zeevi, Assaf - Graduate School of Business, Stanford University - 2014
In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are only observed when an arm is selected, and the gambler's objective is to maximize his cumulative expected earnings...
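The setup described in the abstract can be illustrated with a minimal simulation sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the rewards are Bernoulli with fixed means, the policy is a simple epsilon-greedy rule, and the names `run_bandit`, `means`, `horizon`, and `epsilon` are hypothetical. The sketch shows only the interaction protocol: at each round one of K arms is chosen, and only the chosen arm's reward is observed.

```python
import random


def run_bandit(means, horizon, epsilon=0.1, seed=0):
    """Simulate a K-armed Bernoulli bandit under an epsilon-greedy policy.

    means   -- true success probability of each arm (hidden from the policy)
    horizon -- number of rounds of play
    epsilon -- probability of picking an arm uniformly at random (illustrative)
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k          # how many times each arm has been selected
    estimates = [0.0] * k    # running average of observed rewards per arm
    total_reward = 0.0

    for t in range(horizon):
        if t < k:
            arm = t                              # try every arm once first
        elif rng.random() < epsilon:
            arm = rng.randrange(k)               # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit

        # Only the selected arm's reward is realized and observed.
        reward = 1.0 if rng.random() < means[arm] else 0.0

        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return total_reward, pulls, estimates


if __name__ == "__main__":
    total, pulls, estimates = run_bandit([0.2, 0.5, 0.8], horizon=1000)
    print("cumulative reward:", total)
    print("pulls per arm:", pulls)
```

The policy never reads `means` directly; it learns about each arm only through the rewards of the arms it selects, which is the source of the exploration versus exploitation tension in the problem.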