Efficiently breaking the curse of horizon in off-policy evaluation with double reinforcement learning
Year of publication: 2022
Authors: Kallus, Nathan ; Uehara, Masatoshi
Published in: Operations research. - Linthicum, Md. : INFORMS, ISSN 1526-5463, ZDB-ID 2019440-7. - Vol. 70.2022, 6, p. 3282-3302
Subject: infinite horizon | Machine Learning and Data Science | Markov decision processes | off-policy evaluation | semiparametric efficiency | Artificial intelligence | Theory | Markov chain | Decision | Learning process | Data envelopment analysis | Nonparametric statistics | Learning
- Bennett, Andrew, (2024)
- Poisoning finite-horizon Markov decision processes at design time / Caballero, William N., (2021)
- Offline multi-action policy learning : generalization and optimization / Zhou, Zhengyuan, (2023)
- Fast rates for the regret of offline reinforcement learning / Hu, Yichun, (2025)
- Kallus, Nathan, (2021)
- Optimal balancing of time-dependent confounders for marginal structural models / Kallus, Nathan, (2021)