Satisficing in time-sensitive bandit learning
Year of publication: 2022
Authors: Russo, Daniel ; Van Roy, Benjamin
Published in: Mathematics of operations research. - Hanover, Md. : INFORMS, ISSN 1526-5471, ZDB-ID 2004273-5. - Vol. 47 (2022), no. 4, pp. 2815-2839
Subject: bandit learning | information theory | online optimization | rate-distortion theory | satisficing | Thompson sampling | Theory | Bounded rationality | Learning process | Learning | Sampling
- Optimistic posterior sampling for reinforcement learning : worst-case regret bounds
  Agrawal, Shipra (2023)
- Rational Social Learning by Random Sampling
  Smith, Lones (2013)
- Meta dynamic pricing : transfer learning across experiments
  Bastani, Hamsa (2022)

- Learning to optimize via posterior sampling
  Russo, Daniel (2014)
- Learning to optimize via information-directed sampling
  Russo, Daniel (2018)
- Approximation benefits of policy gradient methods with aggregated states
  Russo, Daniel (2023)