Ahuja, Vishal; Birge, John R. - 2020
Multi-armed bandit (MAB) problems, typically modeled as Markov decision processes (MDPs), exemplify the learning vs. earning tradeoff. An area that has motivated theoretical research in MAB designs is the study of clinical trials, where the application of such designs has the potential to...
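To make the learning-vs-earning tradeoff concrete, here is a minimal, hypothetical sketch of Bernoulli Thompson sampling, a standard Bayesian MAB policy (not necessarily the design studied by the authors). Each arm keeps a Beta posterior over its success probability; at each step the policy samples from every posterior and pulls the arm with the highest sample, so it "earns" from arms believed to be good while still "learning" about uncertain ones. The arm probabilities, horizon, and seed below are illustrative assumptions.

```python
import random

def thompson_sampling(true_probs, horizon, seed=0):
    """Bernoulli Thompson sampling: a minimal sketch of the
    learning-vs-earning tradeoff in a multi-armed bandit."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [1] * k  # Beta(1, 1) uniform prior for each arm
    failures = [1] * k
    pulls = [0] * k
    for _ in range(horizon):
        # Sample a plausible mean from each arm's posterior, then
        # pull the arm whose sampled mean is highest.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.3, 0.7], horizon=2000)
```

As the posteriors concentrate, the policy allocates most pulls to the better arm (index 1 here), which is the property that makes such designs attractive for clinical trials: more patients are routed to the treatment that appears superior as evidence accumulates.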