KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints

Date:

May, 2018

Arxiv Number:

1805.05071

Hal Number:

01785705

Authors:

Abstract:

In the context of K-armed stochastic bandits with distribution only assumed to be supported by [0,1], we introduce a new algorithm, KL-UCB-switch, and prove that is enjoys simultaneously a distribution-free regret bound of optimal order \sqrt{KT} and a distribution-dependent regret bound of optimal order as well, that is, matching the k ln(T) lower bound by Lai and Robbins and Burnetas and Katehakis.

Keywords:

UCB
Empirical Likelihood
Bandit Problems

Search form

Main menu

You are here

KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints

Keywords: