A basic formula for performance gradient estimation of semi-Markov decision processes
This paper presents a basic formula for performance gradient estimation of semi-Markov decision processes (SMDPs) under the average-reward criterion. This formula follows directly from a sensitivity equation in perturbation analysis. With this formula, we develop three sample-path-based gradient estimation algorithms that each use a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete-time Markov systems to continuous-time semi-Markov models. In particular, they require less storage than the algorithm in the literature.
| Year of publication: | 2013 |
|---|---|
| Authors: | Li, Yanjie ; Cao, Fang |
| Published in: | European Journal of Operational Research. - Elsevier, ISSN 0377-2217. - Vol. 224.2013, 2, p. 333-339 |
| Publisher: | Elsevier |
| Subject: | Markov processes ; Semi-Markov decision processes ; Sample-path-based gradient estimation ; Perturbation analysis |
Similar items by subject
- Computing semi-stationary optimal policies for multichain semi-Markov decision processes
  Mondal, Prasenjit, (2020)
- Semi-Markov decision processes with variance minimization criterion
  Wei, Qingda, (2015)
- On undiscounted semi-Markov decision processes with absorbing states
  Mondal, Prasenjit, (2016)
Similar items by person