Online action learning in high dimensions: A new exploration rule for contextual ε_t-greedy heuristics

Bandit problems are pervasive in various fields of research and are also present in several practical applications. Examples, including dynamic pricing and assortment and the design of auctions and incentives, permeate a large number of sequential treatment experiments. Different applications impose distinct levels of restrictions on viable actions. Some favor diversity of outcomes, while others require harmful actions to be closely monitored or mainly avoided. In this paper, we extend one of the most popular bandit solutions, the original ε_t-greedy heuristics, to high-dimensional contexts. Moreover, we introduce a competing exploration mechanism that relies on search sets based on order statistics. We view our proposals as alternatives for cases where pluralism is valued or, in the opposite direction, cases where the end-user should carefully tune the range of exploration of new actions. We find reasonable bounds for the cumulative regret of a decaying ε_t-greedy heuristic in both cases, and we provide an upper bound for the initialization phase that implies that the regret bounds when order statistics are used are at most equal to, and mostly better than, those obtained when random searching is the sole exploration mechanism. Additionally, we show that end-users have sufficient flexibility to avoid harmful actions, since any cardinality for the higher-order statistics can be used to achieve a stricter upper bound. In a simulation exercise, we show that the algorithms proposed in this paper outperform simple and adapted counterparts.
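
The two exploration mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the decay schedule `min(1, c / t)`, the function names, the parameters `c` and `k`, and the choice to include the greedy action in the search set are all assumptions made for illustration, and the reward estimates are taken as given rather than fitted (the record lists LASSO as a subject, but no estimator is implemented here).

```python
import random


def epsilon_t(t, c=1.0):
    """Assumed decay schedule: exploration probability min(1, c / t), t >= 1."""
    return min(1.0, c / t)


def choose_action(estimates, t, k=None, c=1.0, rng=random):
    """Pick an action index from estimated rewards for the current context.

    With probability epsilon_t the action is drawn uniformly from a search set:
    every action when k is None (plain random search), or the k actions with
    the highest estimated rewards (the upper order statistics) otherwise.
    With the remaining probability the greedy action is returned.
    """
    # Rank action indices from highest to lowest estimated reward.
    ranked = sorted(range(len(estimates)), key=lambda a: estimates[a], reverse=True)
    if rng.random() < epsilon_t(t, c):
        search_set = ranked if k is None else ranked[:k]
        return rng.choice(search_set)
    return ranked[0]
```

In this sketch a smaller `k` keeps exploration among actions that currently look best, which is one way an end-user could steer away from harmful actions, while `k=None` spreads exploration over all actions.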

Year of publication:	2020
Authors:	Flores, Claudio C. ; Medeiros, Marcelo C.
Publisher:	Rio de Janeiro : Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio), Departamento de Economia
Subject:	Bandit | sequential treatment | high dimensions | LASSO | regret

Series:	Texto para discussão ; 674
Type of publication:	Book / Working Paper
Type of publication (narrower categories):	Working Paper
Language:	English
Other identifiers:	1734830778 [GVK] hdl:10419/249722 [Handle] RePEc:rio:texdis:674 [RePEc]
Source:	EconStor

Persistent link: https://www.econbiz.de/10012817064