Logarithmic Regret for Episodic Continuous-Time Linear-Quadratic Reinforcement Learning Over a Finite-Time Horizon

We study finite-time horizon continuous-time linear-quadratic reinforcement learning problems in an episodic setting, where both the state and control coefficients are unknown to the controller. We first propose a least-squares algorithm based on continuous-time observations and controls, and establish a logarithmic regret bound of order $O((\ln M)(\ln\ln M))$, with $M$ being the number of learning episodes. The analysis consists of two parts: perturbation analysis, which exploits the regularity and robustness of the associated Riccati differential equation; and parameter estimation error, which relies on sub-exponential properties of continuous-time least-squares estimators. We further propose a practically implementable least-squares algorithm based on discrete-time observations and piecewise constant controls, which achieves similar logarithmic regret with an additional term depending explicitly on the time stepsizes used in the algorithm.
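The discrete-time estimation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes dynamics of the form dX = (A X + B U) dt + dW with unit-variance noise, uses randomly drawn piecewise constant controls rather than a Riccati-based feedback policy, and covers only the least-squares estimate of the unknown coefficients. The names `simulate_episode`, `least_squares_estimate`, `A_true`, and `B_true` are hypothetical, as are the numerical values.

```python
import numpy as np

def simulate_episode(A, B, x0, T, n_steps, rng):
    """Euler-Maruyama path of dX = (A X + B U) dt + dW over [0, T].

    Controls are piecewise constant on each time step and drawn at random
    (an illustrative assumption, not the policy analysed in the paper).
    """
    dt = T / n_steps
    n, d = B.shape
    X = np.empty((n_steps + 1, n))
    U = rng.standard_normal((n_steps, d))
    X[0] = x0
    for k in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(n)
        X[k + 1] = X[k] + (A @ X[k] + B @ U[k]) * dt + dW
    return X, U, dt

def least_squares_estimate(episodes, ridge=1e-8):
    """Estimate (A, B) by regressing state increments on Z = (X, U).

    Solves (sum Z Z^T dt) theta^T = sum Z dX^T, where theta = [A B].
    The small ridge term only guards against a singular Gram matrix.
    """
    n = episodes[0][0].shape[1]
    d = episodes[0][1].shape[1]
    G = ridge * np.eye(n + d)
    H = np.zeros((n + d, n))
    for X, U, dt in episodes:
        Z = np.hstack([X[:-1], U])   # regressors at the left endpoints
        dX = np.diff(X, axis=0)      # observed state increments
        G += Z.T @ Z * dt
        H += Z.T @ dX
    theta_T = np.linalg.solve(G, H)  # shape (n + d, n), equals [A B]^T
    return theta_T[:n].T, theta_T[n:].T

# Hypothetical ground-truth coefficients, used only to generate data.
A_true = np.array([[0.0, 1.0], [-1.0, -0.5]])
B_true = np.array([[0.0], [1.0]])

rng = np.random.default_rng(0)
episodes = [
    simulate_episode(A_true, B_true, np.array([1.0, 0.0]), 1.0, 100, rng)
    for _ in range(400)
]
A_hat, B_hat = least_squares_estimate(episodes)
print("max error in A:", np.max(np.abs(A_hat - A_true)))
print("max error in B:", np.max(np.abs(B_hat - B_true)))
```

Pooling the Gram matrix over all episodes mirrors the episodic setting: each additional episode adds information, so the estimation error shrinks as the number of episodes grows.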

Year of publication:	[2021]
Authors:	Basei, Matteo ; Guo, Xin ; Hu, Anran ; Zhang, Yufei
Publisher:	[S.l.] : SSRN
Subject:	Theory | Learning | Learning process

freely available

Persistent link: https://www.econbiz.de/10013226899