A finite time analysis of temporal difference learning with linear function approximation
| Year of publication: | 2021 |
|---|---|
| Authors: | Bhandari, Jalaj; Russo, Daniel; Singal, Raghav |
| Published in: | Operations Research. Catonsville, MD: INFORMS, ISSN 0030-364X, ZDB-ID 123389-0. Vol. 69 (2021), no. 3, pp. 950-973 |
| Subject: | reinforcement learning; temporal difference learning; finite time analysis; stochastic gradient descent; learning process; theory; stochastic process; mathematical programming; learning |

- Opportunities for reinforcement learning in stochastic dynamic vehicle routing. Hildebrandt, Florentin D. (2023)
- Brammer, Janis (2022)
- Dynamic stochastic electric vehicle routing with safe reinforcement learning. Basso, Rafael (2022)

- Global optimality guarantees for policy gradient methods. Bhandari, Jalaj (2024)
- On the tightness of an LP relaxation for rational optimization and its applications. Avadhanula, Vashist (2016)
- Approximation benefits of policy gradient methods with aggregated states. Russo, Daniel (2023)