A basic formula for performance gradient estimation of semi-Markov decision processes

This paper presents a basic formula for performance gradient estimation of semi-Markov decision processes (SMDPs) under the average-reward criterion. The formula follows directly from a sensitivity equation in perturbation analysis. Using this formula, we develop three sample-path-based gradient estimation algorithms, each of which uses a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete-time Markov systems to continuous-time semi-Markov models. In particular, they require less storage than the algorithm in the literature.

Year of publication:	2013
Authors:	Li, Yanjie ; Cao, Fang
Published in:	European Journal of Operational Research. - Elsevier, ISSN 0377-2217. - Vol. 224.2013, 2, p. 333-339
Publisher:	Elsevier
Subject:	Markov processes | Semi-Markov decision processes | Sample-path-based gradient estimation | Perturbation analysis

Type of publication:	Article
Source:	RePEc - Research Papers in Economics

Persistent link: https://www.econbiz.de/10010588338