Neural temporal difference and Q learning provably converge to global optima
Year of publication: | 2024 |
---|---|
Authors: | Cai, Qi ; Yang, Zhuoran ; Lee, Jason D. ; Wang, Zhaoran |
Published in: | Mathematics of operations research. - Hanover, Md. : INFORMS, ISSN 1526-5471, ZDB-ID 2004273-5. - Vol. 49.2024, 1, p. 619-651 |
Subject: | overparameterized neural network | reinforcement learning | temporal difference learning | Neuronale Netze | Neural networks | Lernprozess | Learning process | Theorie | Theory | Lernen | Learning |
- Contracts for difference: a reinforcement learning approach. Zengeler, Nico (2020)
- A finite time analysis of temporal difference learning with linear function approximation. Bhandari, Jalaj (2021)
- Predictive market making via machine learning. Haider, Abbas (2022)
- Provably efficient reinforcement learning with linear function approximation. Jin, Chi (2023)
- Xie, Qiaomin (2023)
- A flexible framework for hypothesis testing in high dimensions. Javanmard, Adel (2020)