Some bibtex entries of my papers (until mid-2019) are available on this page.
Google scholar page.
Developing efficient sequential bidding strategies for repeated auctions is an important practical challenge in various marketing tasks. In this setting, the bidding agent obtains information, on both the value of the item at sale and the behavior of the other bidders, only when she wins the auction. Standard bandit theory does not apply to this problem due to the presence of action-dependent censoring. In this work, we consider second-price auctions and propose novel, efficient UCB-like algorithms for this task. These algorithms are analyzed in the stochastic setting, assuming regularity of the distribution of the opponents' bids. We provide regret upper bounds that quantify the improvement over the baseline algorithm proposed in the literature. The improvement is particularly significant in cases when the value of the auctioned item is low, yielding a spectacular reduction in the order of the worst-case regret. We further provide the first parametric lower bound for this problem that applies to generic UCB-like strategies. As an alternative, we propose more explainable strategies which are reminiscent of the Explore Then Commit bandit algorithm. We provide a critical analysis of this class of strategies, showing both important advantages and limitations. In particular, we provide a minimax lower bound and propose a nearly minimax-optimal instance of this class.
We propose a novel algorithm for unsupervised graph representation learning with attributed graphs. It combines three advantages addressing some current limitations of the literature: i) The model is inductive: it can embed new graphs without re-training in the presence of new data; ii) The method takes into account both micro-structures and macro-structures by looking at the attributed graphs at different scales; iii) The model is end-to-end differentiable: it is a building block that can be plugged into deep learning pipelines and allows for back-propagation. We show that combining a coarsening method having strong theoretical guarantees with mutual information maximization suffices to produce high quality embeddings. We evaluate them on classification tasks with common benchmarks of the literature. We show that our algorithm is competitive with state of the art among unsupervised graph representation learning methods.
The statistical framework of Generalized Linear Models (GLM) can be applied to sequential problems involving categorical or ordinal rewards associated, for instance, with clicks, likes or ratings. In the example of binary rewards, logistic regression is well-known to be preferable to the use of standard linear modeling. Previous works have shown how to deal with GLMs in contextual online learning with bandit feedback when the environment is assumed to be stationary. In this paper, we relax this latter assumption and propose two upper confidence bound based algorithms that make use of either a sliding window or a discounted maximum-likelihood estimator. We provide theoretical guarantees on the behavior of these algorithms for general context sequences and in the presence of abrupt changes. These results take the form of high probability upper bounds for the dynamic regret that are of order d^2/3 G^1/3 T^2/3 , where d, T and G are respectively the dimension of the unknown parameter, the number of rounds and the number of breakpoints up to time T. The empirical performance of the algorithms is illustrated in simulated environments.
We analyze the sample complexity of the thresholding bandit problem, with and without the assumption that the mean values of the arms are increasing. In each case, we provide a lower bound valid... more
We propose and analyze StoROO, an algorithm for risk optimization on stochastic black-box functions derived from StoOO. Motivated by risk-averse decision making fields like agriculture, medicine,... more
Originally motivated by default risk management applications, this paper
investigates a novel problem, referred to as the profitable bandit problem
here. At each step, an agent chooses... more
In the last decade digital media: web or app publishers, generalised the use of real-time ad auctions to sell their ad-spaces. Multiple auction platforms, also called Supply Side Platforms (SSP)... more
We propose the kl-UCB ++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the... more
This paper is devoted to the study of the max K-armed bandit problem, which consists in sequentially allocating resources in order to detect extreme values. Our contribution is twofold.
We... more
We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration... more
Un nombre croissant de systèmes numériques font appel à des algortihmes de bandits pour combiner efficacement exploration de l'environnement et exploitation de l'information accumulée. Les modèles... more
We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample... more
Julia code of the simulations on github, updated code for newer versions of Julia
We provide a complete characterization of the complexity of best-arm identification in... more
Les systèmes de recommandation à très grande échelle sont aujourd'hui omniprésents sur internet : ouvrages conseillés à l'achat dans les librairies en ligne, articles recommandés sur les... more
For several web tasks such as ad placement or e-commerce, recommender systems must recommend multiple items to their users---such problems can be modeled as bandits with multiple plays. State-of-... more
The multiple-play recommender systems (RS) are RS which recommend several items to the users. RS are based on learning models in order to choose the items to recommend. Among these models, the... more
Après obtention de leur DUT, de nombreux étudiants des IUT STID souhaitent poursuivre leurs études pour obtenir un Master qui leur donne la possibilité de travailler comme ingénieur-expert ou chef... more
A/B testing refers to the task of determining the best option among two alternatives that yield random outcomes. We provide distribution-dependent lower bounds for the performance of A/B testing... more
We present deviation bounds for self-normalized averages and applications to estimation with a random number of observations.
The results rely on a peeling argument in exponential martingale... more
Motivated by issues of security analysis for power systems, we analyze a new problem, called optimal discovery with probabilistic expert advice. We address it with an algorithm based on the... more
Stochastic bandit problems have been analyzed from two different perspectives: a frequentist view, where the parameter is a deterministic unknown quantity, and a Bayesian approach, where the... more
Many problems, such as cognitive radio, parameter control of a scanning tunnelling microscope or internet advertisement, can be modelled as non-stationary bandit problems where the distributions... more
This paper presents a finite-time analysis of the KL-UCB algorithm, an online, horizon-free index policy for stochastic bandit problems. We prove two distinct results: first, for arbitrary bounded... more
We consider structured multi-armed bandit problems based on the Generalized Linear Model (GLM) framework of statistics. For these bandits, we propose a new algorithm, called GLM-UCB. We derive... more
We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out... more
Nous considérons des problèmes de bandits multibras structurés dans lesquels l’agent est guidé par une connaissance a priori sur la structure de la récompense qui peut être exploitée de manière à... more
This paper is devoted to extend the regenerative block-bootstrap (RBB) proposed for regenerative Markov chains to Hidden Markov Models {(Xn, Yn)}. In the HMM setup, regeneration times of the... more
We discuss approximate maximum likelihood methods for blind identification and deconvolution. These algorithms are based on particle approximation versions of the EM algorithm. We consider two... more
Particle filtering has been successfully used to approximate the fixed-lag or fixed-interval smoothing distributions in digital communication and to perform approximate maximum likelihood... more
Events in self-timed rings can propagate evenly spaced or as bursts. By studying these phenomena, we obtain a better understanding of the underlying dynamics of self-timed pipelines, which is a... more