Relative Q-Learning for Average-Reward Markov Decision Processes with Continuous States

We propose an online algorithm for solving a class of average-reward Markov decision processes with continuous state spaces in a model-free setting. The algorithm combines classical relative Q-learning with an asynchronous averaging procedure, which permits the Q-value estimate at a state-action pair to be updated based on observations at neighboring pairs sampled in subsequent iterations. These point estimates are then retained and used to construct an interpolation-based function approximator that predicts the Q-function values at unexplored state-action pairs. We show that, with probability one, the sequence of function approximators converges to the optimal Q-function up to a constant. Numerical results on a simple benchmark example are reported to illustrate the algorithm.
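For orientation, the two ingredients named in the abstract can be sketched in a few lines of Python. This is not the authors' algorithm: it shows only the textbook tabular relative Q-learning step (subtracting the value at a fixed reference pair) and a piecewise-linear interpolation of point estimates over a one-dimensional grid. The function names, the reference pair, the step size, and all numbers are assumptions made for illustration, and the paper's asynchronous averaging over neighboring pairs is not reproduced here.

```python
def relative_q_update(q, s, a, r, s_next, ref, alpha):
    """Apply one classical relative Q-learning step to the table q in place.

    ref is a fixed (state, action) reference pair (an assumption of this
    sketch); subtracting its current value anchors the estimates in the
    average-reward setting.
    """
    ref_s, ref_a = ref
    target = r + max(q[s_next]) - q[ref_s][ref_a]
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


def interpolate_q(grid, values, x):
    """Piecewise-linear interpolation of point estimates on a sorted 1-D grid.

    Queries outside the grid are clamped to the nearest endpoint value.
    """
    if x <= grid[0]:
        return values[0]
    if x >= grid[-1]:
        return values[-1]
    for i in range(len(grid) - 1):
        if grid[i] <= x <= grid[i + 1]:
            w = (x - grid[i]) / (grid[i + 1] - grid[i])
            return (1 - w) * values[i] + w * values[i + 1]
    raise ValueError("grid must be sorted in increasing order")


# Toy usage with made-up numbers (two states, two actions).
q = [[0.0, 0.0], [0.0, 0.0]]
relative_q_update(q, s=0, a=1, r=1.0, s_next=1, ref=(0, 0), alpha=0.5)
print(q[0][1])  # 0.5
print(interpolate_q([0.0, 0.5, 1.0], [1.0, 3.0, 2.0], 0.25))  # 2.0
```

In the continuous-state setting described above, the grid points would play the role of the retained point estimates, and the interpolant would supply Q-values at state-action pairs that were never visited.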


Year of publication:	[2021]
Authors:	Yang, Xiangyu ; Hu, Jiaqiao ; Hu, Jianqiang
Publisher:	[S.l.] : SSRN
Subject:	Decision | Markov chain | Theory

Extent:	1 online resource (32 p.)
Type of publication:	Book / Working Paper
Language:	English
Notes:	According to SSRN, the original version of the document was created on December 25, 2021
Other identifiers:	10.2139/ssrn.3993508 [DOI]
Classification:	C61 - Optimization Techniques; Programming Models; Dynamic Analysis ; C63 - Computational Techniques
Source:	ECONIS - Online Catalogue of the ZBW

Persistent link: https://www.econbiz.de/10013309947