This page describes my current and past research projects.
I started my PhD in September 2020 and am currently working on Best Arm Identification, an important topic in bandit problems.
The context of bandit problems is the following: consider $K$ distinct probability distributions $\nu_1, \dots, \nu_K$. These distributions are unknown, but at each step you may select an arm $1 \leq k \leq K$ and observe an independent realization of $\nu_k$. You can define any strategy you want (that is, choose the next arm to observe using all the previous observations).
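As a toy illustration of this interaction protocol (my own sketch, not part of the research itself), here is a minimal simulation assuming Bernoulli arms and a naive round-robin strategy; all names are illustrative:

```python
import random

# Hypothetical toy model: each arm nu_k is a Bernoulli distribution
# whose mean is hidden from the strategy.
class Bandit:
    def __init__(self, means):
        self._means = means          # unknown to the player
        self.K = len(means)

    def pull(self, k):
        """Return an independent realization of nu_k, for k in {0, ..., K-1}."""
        return 1.0 if random.random() < self._means[k] else 0.0

def round_robin(bandit, budget):
    """A naive strategy: sample arms in turn, then recommend the empirical best.
    A real strategy may choose each arm adaptively from past observations."""
    counts = [0] * bandit.K
    sums = [0.0] * bandit.K
    for t in range(budget):
        k = t % bandit.K             # here the choice ignores past observations
        sums[k] += bandit.pull(k)
        counts[k] += 1
    return max(range(bandit.K), key=lambda k: sums[k] / counts[k])
```

With enough samples, the recommended arm is the one with the highest mean with high probability; the interesting question is how well one can do with fewer samples, or with an adaptive sampling rule.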
There are several mathematical objectives. For instance, in Best Arm Identification, the goal is to identify the best arm, that is, the arm with the highest expectation. There are two classical settings: fixed confidence, where one minimizes the number of samples needed to guarantee a prescribed error probability, and fixed budget, where one minimizes the probability of error given a total number $T$ of samples.
I am working on both settings. For more information on bandit problems, the book Bandit Algorithms by Tor Lattimore and Csaba Szepesvári is a good introduction.
We lay the foundations of a non-parametric theory of best-arm identification in multi-armed bandits with a fixed budget $T$. We consider general, possibly non-parametric, models $\mathcal D$ for distributions over the arms; an overarching example is the model $\mathcal D = \mathcal P[0,1]$ of all probability distributions over $[0,1]$. We propose upper bounds on the average log-probability of misidentifying the optimal arm based on information-theoretic quantities that correspond to infima over Kullback-Leibler divergences between some distributions in $\mathcal D$ and a given distribution. This is made possible by a refined analysis of the successive-rejects strategy of Audibert, Bubeck, and Munos (2010). We finally provide lower bounds on the same average log-probability, also in terms of the same new information-theoretic quantities; these lower bounds are larger when the (natural) assumptions on the considered strategies are stronger. All these new upper and lower bounds generalize existing bounds based, e.g., on gaps between distributions.
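For readers unfamiliar with the successive-rejects strategy mentioned in the abstract, here is a sketch of the original algorithm of Audibert, Bubeck, and Munos (2010) for a fixed budget $T$; arms are indexed $0, \dots, K-1$ and `pull` is a caller-supplied sampling function (names are illustrative):

```python
import math

def successive_rejects(pull, K, T):
    """Successive Rejects (Audibert, Bubeck & Munos, 2010), fixed budget T.
    `pull(k)` returns an independent sample of arm k in {0, ..., K-1}.
    The K-1 phases each eliminate the empirically worst surviving arm."""
    # log-bar(K) = 1/2 + sum_{i=2}^K 1/i, the normalization of phase lengths
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    active = set(range(K))
    counts = [0] * K
    sums = [0.0] * K
    n_prev = 0
    for phase in range(1, K):
        # cumulative number of pulls per surviving arm after this phase
        n_phase = math.ceil((T - K) / (log_bar * (K + 1 - phase)))
        for k in active:
            for _ in range(n_phase - n_prev):
                sums[k] += pull(k)
                counts[k] += 1
        n_prev = n_phase
        # reject the arm with the lowest empirical mean
        worst = min(active, key=lambda k: sums[k] / counts[k])
        active.remove(worst)
    return active.pop()  # the single surviving arm is the recommendation
```

The phase lengths are chosen so that the total number of pulls stays within the budget $T$ while the arms kept longest receive the most samples; the refined analysis in the paper bounds the misidentification probability of this very strategy in non-parametric models.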