Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Date: 
September 2018
arXiv Number: 
1602.07182
HAL Number: 
01276324
Abstract: 
We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and involve no unnecessary complications.
First version: 
Feb. 2016

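The logarithmic final phase described in the abstract is governed by distribution-dependent lower bounds of the classical Lai-Robbins form, in which each suboptimal arm contributes a term scaling like its gap divided by a Kullback-Leibler divergence. As a minimal illustrative sketch (not code from the paper; the function names `kl_bernoulli` and `log_phase_lower_bound` are my own), the Bernoulli case can be computed as follows:

```python
import math

def kl_bernoulli(p, q):
    """KL(Ber(p) || Ber(q)): the Kullback-Leibler divergence
    between two Bernoulli distributions with means p and q."""
    eps = 1e-12  # clip away from 0 and 1 to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def log_phase_lower_bound(means, horizon):
    """Lai-Robbins-style asymptotic lower bound on the regret,
    sum over suboptimal arms a of (mu* - mu_a) / KL(mu_a, mu*) * log(T),
    which captures the final, logarithmic phase of the regret."""
    best = max(means)
    return sum((best - m) / kl_bernoulli(m, best) * math.log(horizon)
               for m in means if m < best)

# Two Bernoulli arms with means 0.5 and 0.45 over a horizon of 10,000 rounds:
print(log_phase_lower_bound([0.5, 0.45], 10_000))
```

Note the contrast the abstract draws: this bound grows only with log(T), so squaring the horizon merely doubles it, whereas in the initial phase the regret of any strategy must grow almost linearly in T.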