On boundedness of Q-learning iterates for stochastic shortest path problems
| Year of publication: | 2013 |
|---|---|
| Authors: | Yu, Huizhen ; Bertsekas, Dimitri P. |
| Published in: | Mathematics of operations research. - Catonsville, MD : INFORMS, ISSN 0364-765X, ZDB-ID 195683-8. - Vol. 38.2013, 2, p. 209-227 |
| Subject: | Markov decision processes ; Q-learning ; stochastic approximation ; dynamic programming ; reinforcement learning ; Theorie ; Theory ; Markov-Kette ; Markov chain ; Dynamische Optimierung ; Dynamic programming ; Stochastischer Prozess ; Stochastic process ; Lernprozess ; Learning process ; Mathematische Optimierung ; Mathematical programming ; Entscheidung ; Decision |

- Optimising darts strategy using Markov decision processes and reinforcement learning
  Baird, Graham, (2020)
- Bayesian learning of dose-response parameters from a cohort under response-guided dosing
  Kotas, Jakob, (2018)
- Envelope theorems for multistage linear stochastic optimization
  Terça, Gonçalo, (2021)

- Error bounds for approximations from projected linear equations
  Yu, Huizhen, (2010)
- Q-learning and policy iteration algorithms for stochastic shortest path problems
  Yu, Huizhen, (2013)
- Q-learning and enhanced policy iteration in discounted dynamic programming
  Bertsekas, Dimitri P., (2012)