Neural temporal difference and Q learning provably converge to global optima

Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang

Year of publication:	2024
Authors:	Cai, Qi ; Yang, Zhuoran ; Lee, Jason D. ; Wang, Zhaoran
Published in:	Mathematics of operations research. - Hanover, Md. : INFORMS, ISSN 1526-5471, ZDB-ID 2004273-5. - Vol. 49.2024, 1, p. 619-651
Subject:	overparameterized neural network | reinforcement learning | temporal difference learning | Neural networks | Learning process | Theory | Learning

Type of publication:	Article
Type of publication (narrower categories):	Article in journal
Language:	English
Other identifiers:	10.1287/moor.2023.1370 [DOI]
Source:	ECONIS - Online Catalogue of the ZBW

Persistent link: https://www.econbiz.de/10014527959