Neural temporal difference and Q learning provably converge to global optima
| Year of publication: | 2024 |
|---|---|
| Authors: | Cai, Qi ; Yang, Zhuoran ; Lee, Jason D. ; Wang, Zhaoran |
| Published in: | Mathematics of operations research. - Hanover, Md. : INFORMS, ISSN 1526-5471, ZDB-ID 2004273-5. - Vol. 49.2024, 1, p. 619-651 |
| Subject: | overparameterized neural network ; reinforcement learning ; temporal difference learning ; Neuronale Netze ; Neural networks ; Lernprozess ; Learning process ; Theorie ; Theory ; Lernen ; Learning |
- Contracts for difference: a reinforcement learning approach
  Zengeler, Nico, (2020)
- A finite time analysis of temporal difference learning with linear function approximation
  Bhandari, Jalaj, (2021)
- Predictive market making via machine learning
  Haider, Abbas, (2022)
- Provably efficient reinforcement learning with linear function approximation
  Jin, Chi, (2023)
- Xie, Qiaomin, (2023)
- STRIDE: a tool-assisted LLM agent framework for strategic and interactive decision-making
  Li, Chuanhao, (2024)