A basic formula for performance gradient estimation of semi-Markov decision processes
This paper presents a basic formula for performance gradient estimation of semi-Markov decision processes (SMDPs) under the average-reward criterion. The formula follows directly from a sensitivity equation in perturbation analysis. With this formula, we develop three sample-path-based gradient estimation algorithms, each of which uses a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete-time Markov systems to continuous-time semi-Markov models. In particular, they require less storage than the algorithm in the literature.
Year of publication: | 2013 |
---|---|
Authors: | Li, Yanjie ; Cao, Fang |
Published in: | European Journal of Operational Research. - Elsevier, ISSN 0377-2217. - Vol. 224.2013, 2, p. 333-339 |
Publisher: | Elsevier |
Subject: | Markov processes ; Semi-Markov decision processes ; Sample-path-based gradient estimation ; Perturbation analysis |
Online Resource
Similar items by subject
- Price-Directed Replenishment of Subsets: Methodology and Its Application to Inventory Routing
  Adelman, Daniel, (2003)
- Mondal, Prasenjit, (2015)
- Semi-Markov decision processes with variance minimization criterion
  Wei, Qingda, (2015)