On the value of sequential methods (an introduction)

Context:

Toulouse, rencontres

Abstract:

We study the problem of minimising regret in two-armed bandit problems with Gaussian noise. Our
objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed
by exploitation are necessarily suboptimal.
The results hold regardless of whether or not the difference in means between the two arms is known.
Beyond this main message, we refine existing concentration results, which allows us to design fully sequential strategies with finite-time regret
guarantees that are (a) asymptotically optimal as the horizon grows and (b) order-optimal in the minimax sense. Furthermore,
we provide empirical evidence that the theory also holds in practice, and we discuss extensions to the non-Gaussian and multi-armed cases.

(Joint work with Emilie Kaufmann and Tor Lattimore)
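
To make the contrast in the abstract concrete, here is a minimal simulation sketch of the two families of strategies on a two-armed bandit with unit-variance Gaussian noise. Everything in it is an illustrative assumption rather than the strategies analysed in the talk: the function names (`etc_regret`, `ucb_regret`) are hypothetical, the exploration phase has a fixed length instead of ending at a data-dependent stopping time, and the fully sequential strategy is the textbook UCB1 index rather than the refined strategy the abstract refers to.

```python
import math
import random


def etc_regret(delta, horizon, explore_per_arm, rng):
    """Pseudo-regret of a fixed-length explore-then-commit strategy.

    Assumes horizon >= 2 * explore_per_arm. Arm 0 has mean 0 and arm 1 has
    mean -delta, so arm 0 is optimal whenever delta > 0.
    """
    means = (0.0, -delta)
    sums = [0.0, 0.0]
    # Exploration phase: pull each arm the same number of times.
    for arm in (0, 1):
        for _ in range(explore_per_arm):
            sums[arm] += rng.gauss(means[arm], 1.0)
    # Exploitation phase: commit to the empirically better arm.
    committed = 0 if sums[0] >= sums[1] else 1
    remaining = horizon - 2 * explore_per_arm
    suboptimal_pulls = explore_per_arm + (remaining if committed == 1 else 0)
    return delta * suboptimal_pulls


def ucb_regret(delta, horizon, rng):
    """Pseudo-regret of UCB1, a generic fully sequential strategy."""
    means = (0.0, -delta)
    counts = [0, 0]
    sums = [0.0, 0.0]
    for t in range(1, horizon + 1):
        if t <= 2:
            arm = t - 1  # pull each arm once to initialise the estimates
        else:
            # Empirical mean plus an exploration bonus that shrinks with pulls.
            index = [
                sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
                for a in (0, 1)
            ]
            arm = 0 if index[0] >= index[1] else 1
        counts[arm] += 1
        sums[arm] += rng.gauss(means[arm], 1.0)
    return delta * counts[1]


def average(fn, runs, *args):
    """Average the regret returned by fn over several independent runs."""
    rng = random.Random(0)
    return sum(fn(*args, rng) for _ in range(runs)) / runs
```

For instance, `average(etc_regret, 200, 0.5, 1000, 50)` and `average(ucb_regret, 200, 0.5, 1000)` give Monte Carlo estimates of the expected regret of each strategy for a gap of 0.5 and a horizon of 1000. The numbers depend on the chosen exploration length and on the seed, so they illustrate the comparison without reproducing the talk's results.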

Date:

May, 2016

Event url:

ANR SPADRO

Keywords:

Bandit Problems
