Logarithmic Regret for Episodic Continuous-Time Linear-Quadratic Reinforcement Learning Over a Finite-Time Horizon

We study finite-time horizon continuous-time linear-quadratic reinforcement learning problems in an episodic setting, where both the state and control coefficients are unknown to the controller. We first propose a least-squares algorithm based on continuous-time observations and controls, and establish a logarithmic regret bound of order $O((\ln M)(\ln\ln M))$, with $M$ being the number of learning episodes. The analysis consists of two parts: perturbation analysis, which exploits the regularity and robustness of the associated Riccati differential equation; and parameter estimation error, which relies on sub-exponential properties of continuous-time least-squares estimators. We further propose a practically implementable least-squares algorithm based on discrete-time observations and piecewise constant controls, which achieves similar logarithmic regret with an additional term depending explicitly on the time stepsizes used in the algorithm.
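
The discrete-time variant mentioned in the abstract can be illustrated with a minimal scalar sketch. It is not the authors' algorithm: the scalar dynamics $dX_t = (aX_t + bU_t)\,dt + dW_t$, the running cost $qX_t^2 + rU_t^2$ with no terminal cost, the explicit Euler discretisation, the added Gaussian exploration noise in the control, and all function names are assumptions made here for illustration only.

```python
import random


def riccati_gains(a, b, q, r, T, n_steps):
    """Feedback gains from the scalar Riccati ODE (assumed model, no terminal cost).

    Solves p' + 2*a*p - (b*p)**2 / r + q = 0 with p(T) = 0 backwards in time
    by explicit Euler; gains[i] is the gain held constant on the i-th step.
    """
    h = T / n_steps
    p = 0.0
    gains = [0.0] * n_steps
    for i in reversed(range(n_steps)):
        p += h * (2.0 * a * p - (b * p) ** 2 / r + q)
        gains[i] = -b * p / r
    return gains


def simulate_episode(a, b, gains, x0, T, rng, explore=1.0):
    """One episode with piecewise constant controls (Euler-Maruyama).

    The exploration noise is an assumption of this sketch; it keeps the
    regressors (x, u) from being collinear.
    """
    h = T / len(gains)
    xs, us = [x0], []
    for k in gains:
        x = xs[-1]
        u = k * x + explore * rng.gauss(0.0, 1.0)
        xs.append(x + (a * x + b * u) * h + rng.gauss(0.0, h ** 0.5))
        us.append(u)
    return xs, us, h


def least_squares_estimate(episodes):
    """Least-squares estimate of (a, b) from increments dX ~ (a*x + b*u)*h."""
    sxx = sxu = suu = sxd = sud = 0.0
    for xs, us, h in episodes:
        for i, u in enumerate(us):
            x = xs[i]
            dx = xs[i + 1] - x
            sxx += x * x * h
            sxu += x * u * h
            suu += u * u * h
            sxd += x * dx
            sud += u * dx
    # Solve the 2x2 normal equations by Cramer's rule.
    det = sxx * suu - sxu * sxu
    a_hat = (suu * sxd - sxu * sud) / det
    b_hat = (sxx * sud - sxu * sxd) / det
    return a_hat, b_hat


if __name__ == "__main__":
    rng = random.Random(0)
    a_true, b_true = -0.5, 1.0          # hypothetical true coefficients
    gains = riccati_gains(0.0, 1.0, 1.0, 1.0, T=1.0, n_steps=100)  # initial guess
    episodes = []
    for batch in range(5):
        for _ in range(100):
            episodes.append(simulate_episode(a_true, b_true, gains, 1.0, 1.0, rng))
        a_hat, b_hat = least_squares_estimate(episodes)
        # Certainty equivalence: recompute the gains from the current estimate.
        gains = riccati_gains(a_hat, b_hat, 1.0, 1.0, T=1.0, n_steps=100)
        print(batch, round(a_hat, 3), round(b_hat, 3))
```

Each pass re-estimates the coefficients from all data collected so far and then recomputes the feedback from the Riccati equation using the estimate, which is the certainty-equivalence pattern the abstract's two-part analysis (perturbation of the Riccati solution, estimation error) refers to.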


Year of publication:	[2021]
Authors:	Basei, Matteo ; Guo, Xin ; Hu, Anran ; Zhang, Yufei
Publisher:	[S.l.] : SSRN
Subject:	Theory | Learning | Learning process

Extent:	1 online resource (24 p.)
Type of publication:	Book / Working Paper
Language:	English
Notes:	According to SSRN, the original version of the document was created on May 18, 2021
Other identifiers:	10.2139/ssrn.3848428 [DOI]
Source:	ECONIS - Online Catalogue of the ZBW

Persistent link: https://www.econbiz.de/10013226899