Iki, Tetsuichiro; Horiguchi, Masayuki; Kurano, Masami - In: Mathematical Methods of Operations Research 66 (2007) 3, pp. 545-555
In this paper, we consider a new algorithm for multichain finite-state Markov decision processes that finds an average-optimal policy by decomposing the state space into communicating classes and a transient class. For each communicating class, a relatively optimal...