This paper examines the multi-armed bandit problem in the case where the bandits’ rewards are drawn from stationary but unknown distributions. Unlike the classical problem, players must factor in the informational value of each future sample to balance exploration against exploitation. Using...
Persistent link: https://www.econbiz.de/10013216714