Learning to optimize via posterior sampling
Year of publication: | 2014 |
---|---|
Authors: | Russo, Daniel ; Van Roy, Benjamin |
Published in: | Mathematics of operations research. - Catonsville, MD : INFORMS, ISSN 0364-765X, ZDB-ID 195683-8. - Vol. 39.2014, 4, p. 1221-1243 |
Subject: | online optimization | multiarmed bandits | Thompson sampling | Sampling | Theory | Learning process | Markov chain | E-learning |
- The online shortest path problem : learning travel times using a multiarmed bandit framework. Lagos, Tomás (2025)
- Satisficing in time-sensitive bandit learning. Russo, Daniel (2022)
- Optimistic posterior sampling for reinforcement learning : worst-case regret bounds. Agrawal, Shipra (2023)

- Learning to optimize via information-directed sampling. Russo, Daniel (2018)
- Satisficing in time-sensitive bandit learning. Russo, Daniel (2022)
- Approximation benefits of policy gradient methods with aggregated states. Russo, Daniel (2023)