# Thesis

I started my PhD in September 2020. I am currently working on Best Arm Identification, an important topic within bandit problems.

Bandit problems are set up as follows: consider $K$ distinct probability distributions $\nu_1, \dots, \nu_K$. These distributions are unknown, but at each step you can select an arm $1 \leq k \leq K$ and observe an independent realization of $\nu_k$. You are free to define any strategy, that is, to choose the next arm to observe using all the previous observations.
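The interaction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the arms are taken to be unit-variance Gaussians, the names `pull`, `run` and `round_robin` are hypothetical, and arms are indexed from 0 rather than from 1.

```python
import random

def pull(means, k, rng):
    """Draw one independent sample from arm k.

    Assumption: each arm is a Gaussian with unit variance; the learner
    never sees `means`, only the returned sample.
    """
    return rng.gauss(means[k], 1.0)

def run(means, strategy, n_steps, seed=0):
    """Let `strategy` interact with the arms for `n_steps` steps."""
    rng = random.Random(seed)
    history = []  # list of (arm, observation) pairs seen so far
    for _ in range(n_steps):
        # The strategy may use all previous observations to pick the next arm.
        k = strategy(history, len(means))
        history.append((k, pull(means, k, rng)))
    return history

def round_robin(history, n_arms):
    """A trivial strategy: cycle through the arms in order."""
    return len(history) % n_arms

history = run([0.0, 0.5, 1.0], round_robin, n_steps=6)
```

Any function with the same signature as `round_robin` can be plugged in as a strategy, which is the only design choice this sketch is meant to highlight.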

There are several mathematical objectives. For instance, in Best Arm Identification, the goal is to identify the best arm, that is, the arm with the highest expectation. There are two settings:

• in the Fixed Confidence setting, you are given a confidence level $\delta \in ]0, 1[$ and you need to find a strategy that identifies the best arm with probability at least $1-\delta$. The objective is then to minimize the expected number of observations required by the strategy.
• in the Fixed Budget setting, you are given a fixed number of observations $n \in \mathbb{N}^*$ and you have to find a strategy that maximizes the probability of returning the best arm after those observations.
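To make the Fixed Budget setting concrete, here is a minimal sketch of a simple baseline: spread the $n$ observations evenly over the arms and return the arm with the highest empirical mean. It is not presented as a good strategy or as the one studied in the thesis. The Gaussian arms and the names `uniform_allocation` and `error_rate` are assumptions made for illustration, and the budget is assumed to be at least the number of arms.

```python
import random

def uniform_allocation(means, budget, rng):
    """Sample the arms in turn and return the empirical best arm.

    Assumptions: unit-variance Gaussian arms, budget >= number of arms.
    """
    n_arms = len(means)
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    for t in range(budget):
        k = t % n_arms  # uniform (round-robin) allocation of the budget
        sums[k] += rng.gauss(means[k], 1.0)
        counts[k] += 1
    empirical = [sums[k] / counts[k] for k in range(n_arms)]
    return max(range(n_arms), key=lambda k: empirical[k])

def error_rate(means, budget, n_runs=500, seed=0):
    """Monte Carlo estimate of the probability of returning a wrong arm."""
    rng = random.Random(seed)
    best = max(range(len(means)), key=lambda k: means[k])
    errors = sum(
        uniform_allocation(means, budget, rng) != best for _ in range(n_runs)
    )
    return errors / n_runs
```

Calling `error_rate([0.0, 0.5, 1.0], budget)` with a small and then a large budget shows the estimated error probability shrinking as the budget grows, which is the quantity a Fixed Budget strategy tries to make as small as possible.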

I am working on both settings. For more information about bandit problems, the book *Bandit Algorithms* by Tor Lattimore and Csaba Szepesvári is a good introduction.

# Old projects

HPC resource management improvement using Reinforcement Learning
I used Reinforcement Learning to address the problem of resource allocation in HPC clusters during a 4-month internship.
Random Hyperbolic Graphs
I studied propagation models on Random Hyperbolic Graphs during a 4-month internship.