List of Publications

Some bibtex entries of my papers (until mid-2019) are available on this page.
Google scholar page.

Mar. 2021

Developing efficient sequential bidding strategies for repeated auctions is an important practical challenge in various marketing tasks. In this setting, the bidding agent obtains information, on both the value of the item at sale and the behavior of the other bidders, only when she wins the auction. Standard bandit theory does not apply to this problem due to the presence of action-dependent censoring. In this work, we consider second-price auctions and propose novel, efficient UCB-like algorithms for this task. These algorithms are analyzed in the stochastic setting, assuming regularity of the distribution of the opponents' bids. We provide regret upper bounds that quantify the improvement over the baseline algorithm proposed in the literature. The improvement is particularly significant in cases when the value of the auctioned item is low, yielding a spectacular reduction in the order of the worst-case regret. We further provide the first parametric lower bound for this problem that applies to generic UCB-like strategies. As an alternative, we propose more explainable strategies which are reminiscent of the Explore Then Commit bandit algorithm. We provide a critical analysis of this class of strategies, showing both important advantages and limitations. In particular, we provide a minimax lower bound and propose a nearly minimax-optimal instance of this class.

Sept. 2020
Non-stationary Bandits
Contextual sequential decision problems with categorical or numerical observations are ubiquitous and Generalized Linear Bandits (GLB) offer a solid theoretical framework to address them. In contrast to the case of linear bandits, existing algorithms for GLB have two drawbacks undermining their applicability. First, they rely on excessively pessimistic concentration bounds due to the non-linear nature of the model. Second, they require either non-convex projection steps or burn-in phases to enforce boundedness of the estimators. Both of these issues are worsened when considering non-stationary models, in which the GLB parameter may vary with time. In this work, we focus on self-concordant GLB (which include logistic and Poisson regression) with forgetting achieved either by the use of a sliding window or exponential weights. We propose a novel confidence-based algorithm for the maximum-likehood estimator with forgetting and analyze its perfomance in abruptly changing environments. These results as well as the accompanying numerical simulations highlight the potential of the proposed approach to address non-stationarity in GLB.
Algorithms 13(9)
Aug. 2020
206 (pp. 1-22)

We propose a novel algorithm for unsupervised graph representation learning with attributed graphs. It combines three advantages addressing some current limitations of the literature: i) The model is inductive: it can embed new graphs without re-training in the presence of new data; ii) The method takes into account both micro-structures and macro-structures by looking at the attributed graphs at different scales; iii) The model is end-to-end differentiable: it is a building block that can be plugged into deep learning pipelines and allows for back-propagation. We show that combining a coarsening method having strong theoretical guarantees with mutual information maximization suffices to produce high quality embeddings. We evaluate them on classification tasks with common benchmarks of the literature. We show that our algorithm is competitive with state of the art among unsupervised graph representation learning methods.

Jul. 2020
Bandits, Graphs, Best-arm identification
We study best-arm identification with fixed confidence in bandit models with graph smoothness constraint. We provide and analyze an efficient gradient ascent algorithm to compute the sample complexity of this problem as a solution of a non-smooth max-min problem (providing in passing a simplified analysis for the unconstrained case). Building on this algorithm, we propose an asymptotically optimal strategy. We furthermore illustrate by numerical experiments both the strategy's efficiency and the impact of the smoothness constraint on the sample complexity. Best Arm Identification (BAI) is an important challenge in many applications ranging from parameter tuning to clinical trials. It is now very well understood in vanilla bandit models, but real-world problems typically involve some dependency between arms that requires more involved models. Assuming a graph structure on the arms is an elegant practical way to encompass this phenomenon, but this had been done so far only for regret minimization. Addressing BAI with graph constraints involves delicate optimization problems for which the present paper offers a solution.
Jan. 2020

The statistical framework of Generalized Linear Models (GLM) can be applied to sequential problems involving categorical or ordinal rewards associated, for instance, with clicks, likes or ratings. In the example of binary rewards, logistic regression is well-known to be preferable to the use of standard linear modeling. Previous works have shown how to deal with GLMs in contextual online learning with bandit feedback when the environment is assumed to be stationary. In this paper, we relax this latter assumption and propose two upper confidence bound based algorithms that make use of either a sliding window or a discounted maximum-likelihood estimator. We provide theoretical guarantees on the behavior of these algorithms for general context sequences and in the presence of abrupt changes. These results take the form of high probability upper bounds for the dynamic regret that are of order d^2/3 G^1/3 T^2/3 , where d, T and G are respectively the dimension of the unknown parameter, the number of rounds and the number of breakpoints up to time T. The empirical performance of the algorithms is illustrated in simulated environments.

Mar. 2020
pp 138--147
In this paper we propose a perfect simulation algorithm for the Exponential Random Graph Model, based on the Coupling From The Past method of Propp & Wilson (1996). We use a Glauber dynamics to... more
Apr. 2020

We analyze the sample complexity of the thresholding bandit problem, with and without the assumption that the mean values of the arms are increasing. In each case, we provide a lower bound valid... more

May. 2019
In this paper, we study sequential testing problems with overlapping hypotheses. We first focus on the simple problem of assessing if the mean µ of a Gaussian distribution is ≥ε−or≤ε; if µ∈(−ε,ε),... more
Julia code for best-arm identification
JMLR: Workshop and Conference Proceedings
Nov. 2019

We propose and analyze StoROO, an algorithm for risk optimization on stochastic black-box functions derived from StoOO. Motivated by risk-averse decision making fields like agriculture, medicine,... more

Vol. 13 (55)
Dec. 2019 (first version in Feb. 2016)
This paper is devoted to the estimation of conditional quantile, more precisely the quantile of the output of a real stochastic code whose inputs are in R d. In this purpose, we introduce a... more
We report on an empirical study of the main strategies for conditional quantile estimation in the context of stochastic computer experiments. To ensure adequate diversity, six metamodels are... more
Dec. 2018
Associant données massives (big data) et algorithmes d'apprentissage automatique (machine learning), la puissance des outils de décision automatique suscite autant d'espoir que de craintes... more
Dec. 2018

(poster) Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-task in planning, game tree search... more

Nov. 2018

Originally motivated by default risk management applications, this paper
investigates a novel problem, referred to as the profitable bandit problem
here. At each step, an agent chooses... more

May 2019
We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties... more
n°2 018
Aug. 2018

In the last decade digital media: web or app publishers, generalised the use of real-time ad auctions to sell their ad-spaces. Multiple auction platforms, also called Supply Side Platforms (SSP)... more

May. 2018
In the context of K-armed stochastic bandits with distribution only assumed to be supported by [0,1], we introduce a new algorithm, KL-UCB-switch, and prove that is enjoys simultaneously a... more
Dec. 2017
Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, partly because of applications including online content optimization. This... more
Oct. 2017

We propose the kl-UCB ++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the... more

n°2 017
Sep. 2017

This paper is devoted to the study of the max K-armed bandit problem, which consists in sequentially allocating resources in order to detect extreme values. Our contribution is twofold.
We... more

Mar. 2017
Quel est le point commun entre un site de vente sur Internet, un système de navigation GPS, un essai clinique et un moteur de publicité en ligne ? Petit article de 2 pages pour présenter le domaine... more
Mar. 2017
Recommender systems (RS) aim at recommending either one or several items simultaneously : the former type of RS is named single play RS, while the latter is named multiple-play RS. The... more
Dec. 2016

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration... more

Jul. 2016

Un nombre croissant de systèmes numériques font appel à des algortihmes de bandits pour combiner efficacement exploration de l'environnement et exploitation de l'information accumulée. Les modèles... more

Jun. 2016
pp.1 028-1 050

We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample... more

Mar. 2016
In this work, we extend some parameters built on a probability distribution introduced before to the case where the proximity between real numbers is measured by using a Bregman divergence. This... more
Jan. 2016
The stochastic multi-armed bandit model is a simple abstraction that has proven useful in many different contexts in statistics and machine learning. Whereas the achievable limit in terms of regret... more
Oct. 2015
Depuis 1996, les Journées du groupe Modélisation Aléatoire et Statistique (MAS) de la SMAI réunissent tous les deux ans une large part des communautés probabilistes et statistiques françaises. L... more
n°2 015
Jun. 2015

Les systèmes de recommandation à très grande échelle sont aujourd'hui omniprésents sur internet : ouvrages conseillés à l'achat dans les librairies en ligne, articles recommandés sur les... more

May. 2015

For several web tasks such as ad placement or e-commerce, recommender systems must recommend multiple items to their users---such problems can be modeled as bandits with multiple plays. State-of-... more

Mar. 2015

The multiple-play recommender systems (RS) are RS which recommend several items to the users. RS are based on learning models in order to choose the items to recommend. Among these models, the... more

Mar. 2015
We describe a new algorithm for the perfect simulation of variable length Markov chains and random systems with perfect connections. This algorithm generalizes Propp and Wilson's simulation... more
Feb. 2015
Les systèmes de recommandation (SR) à tirages multiples font référence aux SR recommandant plusieurs objets en même temps aux utilisateurs. La plupart des SR s'appuient sur des modèles d'... more
n°2 014
Jun. 2014

Après obtention de leur DUT, de nombreux étudiants des IUT STID souhaitent poursuivre leurs études pour obtenir un Master qui leur donne la possibilité de travailler comme ingénieur-expert ou chef... more

Jun. 2014

A/B testing refers to the task of determining the best option among two alternatives that yield random outcomes. We provide distribution-dependent lower bounds for the performance of A/B testing... more

Mar. 2014
The rapid evolution of information systems managing more and more voluminous data has caused profound paradigm shifts in the job of statistician, becoming successively data miner, bioinformatician... more
Sep. 2013

We present deviation bounds for self-normalized averages and applications to estimation with a random number of observations.
The results rely on a peeling argument in exponential martingale... more

Jun. 2013
pp.1 516-1 541
We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper... more
Jun. 2013
We study a problem of model selection for data produced by two different context tree sources. Motivated by linguistic questions, we consider the case where the probabilistic context trees... more
Feb. 2013
We consider an original problem that arises from the issue of security analysis of a power system and that we name optimal discovery with probabilistic expert advice. We address it with an algorithm... more
Dec. 2012
pp.6 808-6 812

Motivated by issues of security analysis for power systems, we analyze a new problem, called optimal discovery with probabilistic expert advice. We address it with an algorithm based on the... more

Apr. 2012

Stochastic bandit problems have been analyzed from two different perspectives: a frequentist view, where the parameter is a deterministic unknown quantity, and a Bayesian approach, where the... more

Dec. 2011
pp.2 109-2 145
Computing smoothing distributions, the distributions of one or more states conditional on past, present, and future observations is a recurring problem when operating on general hidden Markov models... more
Nov. 2011
pp.2 488-2 506
Context tree models have been introduced by Rissanen as a parsimonious generalization of Markov models. Since then, they have been widely used in applied probability and statistics. The present paper... more
Oct. 2011

Many problems, such as cognitive radio, parameter control of a scanning tunnelling microscope or internet advertisement, can be modelled as non-stationary bandit problems where the distributions... more

Jul. 2011

This paper presents a finite-time analysis of the KL-UCB algorithm, an online, horizon-free index policy for stochastic bandit problems. We prove two distinct results: first, for arbitrary bounded... more

Feb. 2011
We consider the task of optimally sensing a two-state Markovian channel with an observation cost and without any prior information regarding the channel's transition probabilities. This task is... more
Dec. 2010

We consider structured multi-armed bandit problems based on the Generalized Linear Model (GLM) framework of statistics. For these bandits, we propose a new algorithm, called GLM-UCB. We derive... more

Sep. 2010

We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out... more

Jun. 2010

Nous considérons des problèmes de bandits multibras structurés dans lesquels l’agent est guidé par une connaissance a priori sur la structure de la récompense qui peut être exploitée de manière à... more

Nov. 2009
pp.4 247-4 259
We discuss approximate maximum-likelihood methods for blind identification and deconvolution. These algorithms are based on particle approximation versions of the expectation-maximization (EM)... more
Oct. 2009
We show that the maximin average redundancy in pattern coding is eventually larger than 1.84 (n/log n)1/3 for messages of length n. This improves recent results on pattern redundancy, although it... more
Aug. 2009

This paper is devoted to extend the regenerative block-bootstrap (RBB) proposed for regenerative Markov chains to Hidden Markov Models {(Xn, Yn)}. In the HMM setup, regeneration times of the... more

Mar. 2009
We address the issue of order identification for hidden Markov models with Poisson and Gaussian emissions. We prove information-theoretic BIC-like mixture inequalities in the spirit of Finesso [1991... more
Jan. 2009
This paper describes universal lossless coding strategies for compressing sources on countably infinite alphabets. Classes of memoryless sources defined by an envelope condition on the marginal... more
Jun. 2008

We discuss approximate maximum likelihood methods for blind identification and deconvolution. These algorithms are based on particle approximation versions of the EM algorithm. We consider two... more

Jun. 2008

Particle filtering has been successfully used to approximate the fixed-lag or fixed-interval smoothing distributions in digital communication and to perform approximate maximum likelihood... more

Dec. 2006
pp.5 579-5 586
The Context Tree Weighting method (CTW) is shown to be almost adaptive on the classes of renewal and Markov renewal processes. Up to logarithmic factor, ctw achieves the minimax pointwise redundancy... more
Oct. 2006
pp.4 630-4 635
The Bayesian information criterion (BIC) and Krichevsky- Trofimov (KT) version of minimum description length (MDL) principle are popular in the study of model selection. For order estimation of... more
Rapport de DEA (Master's dissertation)
Sep. 2003
Codage universel sans perte par transformée grammaticale.
Apr. 2002

Events in self-timed rings can propagate evenly spaced or as bursts. By studying these phenomena, we obtain a better understanding of the underlying dynamics of self-timed pipelines, which is a... more

We introduce a general approach to prove oracle properties in context tree selection. The results derive from a concentration condition that is verified, for example, by mixing processes. Moreover,... more