The role of lookahead and approximate policy evaluation in reinforcement learning with linear value function approximation
| Year of publication: | 2025 |
|---|---|
| Authors: | Winnicki, Anna; Lubars, Joseph; Livesay, Michael; Srikant, Rayadurgam |
| Published in: | Operations Research. - Linthicum, Md.: INFORMS, ISSN 1526-5463, ZDB-ID 2019440-7. - Vol. 73.2025, 1, p. 139-156 |
| Subject: | Dynamic programming; Machine Learning and Data Science; Markov decision processes; Artificial intelligence; Markov chain; Theory; Learning process; Mathematical programming |
- Global optimality guarantees for policy gradient methods / Bhandari, Jalaj (2024)
- On boundedness of Q-learning iterates for stochastic shortest path problems / Yu, Huizhen (2013)
- Bayesian learning of dose-response parameters from a cohort under response-guided dosing / Kotas, Jakob (2018)
- A policy gradient algorithm for the risk-sensitive exponential cost MDP / Moharrami, Mehrdad (2025)
- The power of slightly more than one sample in randomized load balancing / Ying, Lei (2017)
- Heavy-traffic insensitive bounds for weighted proportionally fair bandwidth sharing policies / Wang, Weina (2022)