Online Learning for Dual Index Policies in Dual Sourcing Systems
We consider a periodic review dual sourcing inventory system with a regular source (lower unit cost but longer lead time) and an expedited source (shorter lead time but higher unit cost) under carried-over supply and backlogged demand. Unlike existing literature, we assume that the firm does not have access to the demand distribution a priori and relies solely on past demand realizations. Even with complete information on the demand distribution, it is well known in the literature that the optimal inventory replenishment policy is complex and state-dependent. Therefore, we focus our attention on a class of popular, easy-to-implement, and near-optimal heuristic policies known as dual-index policies. The performance measure is regret, which is the cost difference between a feasible learning algorithm and the clairvoyant (full-information) benchmark. When the benchmark is chosen to be the (full-information) optimal dual-index policy, we develop a nonparametric online learning algorithm that admits a regret upper bound of $O(\sqrt{T\log T})$, which matches the regret lower bound for any feasible learning algorithm up to a logarithmic factor. Our algorithm integrates stochastic bandits and sample average approximation techniques in an innovative way. As part of our regret analysis, we explicitly prove that the underlying Markov chain is ergodic and converges to its steady state exponentially fast via coupling arguments, which could be of independent interest. Our work provides practitioners with an easy-to-implement, robust, and provably-good online decision support system for managing a dual-sourcing inventory system.
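The abstract names the dual-index policy without spelling it out. As a rough illustration only, the sketch below simulates one common textbook formulation: an expedited order raises the short-horizon inventory position to a level `z_e`, and a regular order then raises the full inventory position to a level `z_r`. The function name, the order of events within a period, and the cost parameters (`h`, `b`, `premium`) are assumptions made for this example and are not taken from the paper.

```python
def simulate_dual_index(z_e, z_r, demands, lead_e=0, lead_r=1,
                        h=1.0, b=4.0, premium=2.0):
    """Average per-period cost of a dual-index policy (illustrative sketch).

    Assumed, not taken from the paper: orders are placed first, then due
    orders arrive, then demand is realized; h is the holding cost, b the
    backlog cost, and premium the extra unit cost of expedited orders.
    """
    assert z_r >= z_e and lead_r > lead_e >= 0
    net = 0.0  # on-hand inventory minus backlog
    # pipeline[k] = quantity that arrives k periods from now (k = 0: this period)
    pipeline = [0.0] * (lead_r + 1)
    total_cost = 0.0
    for d in demands:
        # Expedited position: net inventory plus orders due within lead_e periods.
        ip_e = net + sum(pipeline[:lead_e + 1])
        q_e = max(0.0, z_e - ip_e)
        pipeline[lead_e] += q_e
        # Regular position: net inventory plus all outstanding orders,
        # including the expedited order just placed.
        ip_r = net + sum(pipeline)
        q_r = max(0.0, z_r - ip_r)
        pipeline[lead_r] += q_r
        # Receive what is due now, shift the pipeline, then serve demand.
        net += pipeline.pop(0)
        pipeline.append(0.0)
        net -= d
        total_cost += h * max(net, 0.0) + b * max(-net, 0.0) + premium * q_e
    return total_cost / len(demands)
```

Under these assumptions, the regret of a learning algorithm over a horizon could be estimated as the gap between its simulated cost and the cost of the best fixed pair `(z_e, z_r)` evaluated on the same demand sequence.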
Year of publication: | [2023] |
---|---|
Authors: | Tang, Jingwen ; Chen, Boxiao ; Shi, Cong |
Publisher: | [S.l.] : SSRN |
freely available
Similar items by person
- Online learning for dual-index policies in dual-sourcing systems
  Tang, Jingwen, (2024)
- Offline Personalized Pricing with Censored Demand
  Qi, Zhengling, (2022)
- Online Learning and Matching for Multiproduct Systems with General Upgrading
  Tang, Jingwen, (2022)