Modified Policy Iteration Algorithms for Discounted Markov Decision Problems
In this paper we study a class of modified policy iteration algorithms for solving Markov decision problems. These algorithms perform the policy evaluation step by successive approximations. We discuss the relationship of these algorithms to Newton-Kantorovich iteration and demonstrate their convergence. We show that all of these algorithms converge at least as quickly as successive approximations and obtain estimates of their rates of convergence. An analysis of the computational requirements of these algorithms suggests that they may be appropriate for solving problems with large numbers of actions, large numbers of states, sparse transition matrices, or small discount rates. These algorithms are compared to policy iteration, successive approximations, and Gauss-Seidel methods on large randomly generated test problems.
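The scheme described in the abstract — a greedy improvement step followed by a fixed number of successive-approximation sweeps to partially evaluate the current policy — can be sketched as follows. This is a minimal illustration assuming a tabular MDP with transition array `P` of shape `(actions, states, states)` and reward array `r` of shape `(actions, states)`; the function name and array layout are illustrative choices, not from the paper.

```python
import numpy as np

def modified_policy_iteration(P, r, gamma, m=5, tol=1e-8, max_iter=1000):
    """Sketch of modified policy iteration for a discounted MDP.

    P: transitions, shape (A, S, S); r: rewards, shape (A, S).
    m: number of successive-approximation sweeps used to partially
       evaluate the greedy policy. m = 0 reduces to successive
       approximations (value iteration); as m grows, each pass
       approaches exact policy evaluation (policy iteration).
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # Improvement step: greedy policy w.r.t. the current values.
        q = r + gamma * (P @ v)          # action values, shape (A, S)
        policy = q.argmax(axis=0)
        v_new = q.max(axis=0)
        # Partial evaluation: m extra sweeps under the fixed policy.
        P_pi = P[policy, np.arange(S)]   # rows P[policy[s], s, :]
        r_pi = r[policy, np.arange(S)]
        for _ in range(m):
            v_new = r_pi + gamma * (P_pi @ v_new)
        if np.max(np.abs(v_new - v)) < tol:
            return policy, v_new
        v = v_new
    return policy, v
```

The parameter `m` interpolates between the two classical methods the paper compares against, which is why intermediate values can pay off when exact policy evaluation (a linear solve) is expensive, e.g. for large or sparse transition matrices.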
Year of publication: 1978
Authors: Puterman, Martin L.; Shin, Moon Chirl
Published in: Management Science. - Institute for Operations Research and the Management Sciences - INFORMS, ISSN 0025-1909. - Vol. 24.1978, 11, p. 1127-1137
Publisher: Institute for Operations Research and the Management Sciences - INFORMS
Similar items by person
-
Action elimination procedures for modified policy iteration algorithms
Puterman, Martin L., (1982)
-
Markov decision processes : discrete stochastic dynamic programming
Puterman, Martin L., (1994)
-
Ywata de Carvalho, Alexandre, (2015)