Zhu, Q. - In: Computational Statistics 65 (2007) 3, pp. 519-538
This paper studies both the average sample-path reward (ASPR) criterion and the limiting average variance criterion for denumerable discrete-time Markov decision processes. The rewards may have neither upper nor lower bounds. We give sufficient conditions on the system’s primitive data and...