Wakuta, Kazuyoshi - In: Stochastic Processes and their Applications 56 (1995) 1, pp. 159-169
For a vector-valued Markov decision process, we characterize optimal (deterministic) stationary policies by systems of linear inequalities and present an algorithm for finding all optimal stationary policies from among all randomized, history-remembering ones. The algorithm consists of improving...