Two-stage index computation for bandits with switching penalties II : switching delays
This paper addresses the multi-armed bandit problem with switching penalties including both costs and delays, extending results of the companion paper [J. Niño-Mora. "Two-Stage Index Computation for Bandits with Switching Penalties I: Switching Costs". Conditionally accepted at INFORMS J. Comp.], which addressed the no switching delays case. Asawa and Teneketzis (1996) introduced an index for bandits with delays that partly characterizes optimal policies, attaching to each bandit state a "continuation index" (its Gittins index) and a "switching index", yet gave no algorithm for it. This paper presents an efficient, decoupled computation method, which in a first stage computes the continuation index and then, in a second stage, computes the switching index an order of magnitude faster in at most (5/2)$n^{3}$+O(n) arithmetic operations for an n -state bandit. The paper exploits the fact that the Asawa and Teneketzis index is the Whittle, or marginal productivity, index of a classic bandit with switching penalties in its semi- Markov restless reformulation, by deploying work-reward analysis and LP-indexability methods introduced by the author. A computational study demonstrates the dramatic runtime savings achieved by the new algorithm, the near-optimality of the index policy, and its substantial gains against a benchmark index policy across a wide instance range.
Year of publication: |
2007-05
|
---|---|
Authors: | Nino-Mora, Jose |
Institutions: | Departamento de Estadistica, Universidad Carlos III de Madrid |
Saved in:
freely available
Saved in favorites
Similar items by person
-
An index for dynamic product promotion and the knapsack problem for perishable items
Jacko, Peter, (2009)
-
Jacko, Peter, (2008)
-
Characterization and computation of restless bandit marginal productivity indices
Nino-Mora, Jose, (2007)
- More ...