The multi-armed bandit, with constraints
Denardo, Eric V. (2013)
Splitting randomized stationary policies in total-reward Markov decision processes
Feinberg, Eugene A. (2012)