List of Publications

Some BibTeX entries of my papers (until mid-2019) are available on this page.
Google Scholar page.

Aug. 2024
Keywords:
Johnson-Lindenstrauss, Random Projections, Sparsity
We provide a simple proof of the Johnson-Lindenstrauss lemma for sub-Gaussian variables. We extend the analysis to identify how sparse projections can be, and what the cost of sparsity is on the target dimension. The Johnson-Lindenstrauss lemma is the theoretical core of the dimensionality reduction methods based on random projections. While its original formulation involves matrices with Gaussian entries, the computational cost of random projections can be drastically reduced by the use of simpler variables, especially if they vanish with a high probability. In this paper, we propose a simple and elementary analysis of random projections under classical assumptions that emphasizes the key role of sub-Gaussianity. Furthermore, we show how to extend it to sparse projections, emphasizing the limits induced by the sparsity of the data itself.
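For illustration, here is a minimal numpy sketch of the kind of sparse sub-Gaussian projection discussed above (an Achlioptas-style matrix; the sparsity parameter s and the target dimension k are illustrative choices, and this is not the paper's code):

    import numpy as np
    from scipy.spatial.distance import pdist

    rng = np.random.default_rng(0)

    def sparse_projection(d, k, s=3.0):
        """Sparse sub-Gaussian projection matrix (k x d): each entry is 0 with
        probability 1 - 1/s, and +/- sqrt(s/k) with probability 1/(2s) each."""
        signs = rng.choice([-1.0, 1.0], size=(k, d))
        mask = rng.random((k, d)) < 1.0 / s
        return np.sqrt(s / k) * signs * mask

    # Distortion of pairwise distances after projecting n points from R^d to R^k.
    n, d, k = 50, 10_000, 500
    X = rng.standard_normal((n, d))
    Y = X @ sparse_projection(d, k).T
    ratios = pdist(Y) / pdist(X)
    print(f"distance ratios in [{ratios.min():.3f}, {ratios.max():.3f}]")
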
Elise Crépon, Aurélien Garivier, Wouter M Koolen
Apr. 2024
Keywords:
Bandit models, Best-arm identification, Multi-objective

We study the problem of sequential learning of the Pareto front in multi-objective multi-armed bandits. An agent is faced with K possible arms to pull. At each turn she picks one and receives a vector-valued reward. When she thinks she has enough information to identify the Pareto front of the different arm means, she stops the game and gives an answer. We are interested in designing algorithms such that the answer given is correct with probability at least 1-δ. Our main contribution is an efficient implementation of an algorithm achieving the optimal sample complexity when the risk δ is small. With K arms in d dimensions, p of which are in the Pareto set, the algorithm runs in time O(Kp^d) per round.
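For reference, the answer set in question, the Pareto front of the K mean vectors, can be computed as in the sketch below (a generic illustration of the non-dominated set, not the paper's sampling algorithm):

    import numpy as np

    def pareto_front(means):
        """Indices of arms whose mean vector is not dominated, i.e. such that no
        other arm is >= in every coordinate and > in at least one coordinate."""
        means = np.asarray(means, dtype=float)
        K = len(means)
        front = []
        for i in range(K):
            dominated = any(
                np.all(means[j] >= means[i]) and np.any(means[j] > means[i])
                for j in range(K) if j != i
            )
            if not dominated:
                front.append(i)
        return front

    # Example with K = 4 arms and d = 2 objectives.
    mu = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.4, 0.4]]
    print(pareto_front(mu))   # [0, 1, 2]: the last arm is dominated by [0.5, 0.5]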

Apr. 2024
Keywords:
Optimization, Auctions
Motivated by programmatic advertising optimization, we consider the task of sequentially allocating budget across a set of resources. At every time step, a feasible allocation is chosen and only a corresponding random return is observed. The goal is to maximize the cumulative expected sum of returns. This is a realistic model for budget allocation across subdivisions of marketing campaigns, with the objective of maximizing the number of conversions. We study direct search (also known as pattern search) methods for linearly constrained and derivative-free optimization in the presence of noise, which apply in particular to sequential budget allocation. These algorithms, which do not rely on hierarchical partitioning of the resource space, are easy to implement; they respect the operational constraints of resource allocation by avoiding evaluation outside of the feasible domain; and they are compatible with warm starts, being (approximate) descent algorithms. However, they have not yet been analyzed from the perspective of cumulative regret. We show that direct search methods achieve finite regret in the deterministic and unconstrained case. In the presence of evaluation noise and linear constraints, we propose a simple extension of direct search that achieves a regret upper bound of order T^(2/3). We also propose an accelerated version of the algorithm, relying on repeated sequential testing, that significantly improves the practical behavior of the approach.
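A minimal sketch of the direct-search idea, under illustrative assumptions: feasible moves transfer a fixed step of budget from one resource to another (so the budget and nonnegativity constraints stay satisfied), and the step shrinks when no move improves the noisy objective. This is not the paper's exact algorithm, nor its accelerated variant based on sequential testing.

    import numpy as np

    def direct_search(f, x0, step=0.1, shrink=0.5, n_iter=200):
        """Pattern search on the budget simplex {x >= 0, sum(x) = B}: candidate
        moves transfer `step` units of budget between two resources; the step
        is multiplied by `shrink` when no candidate improves the objective f."""
        x = np.asarray(x0, dtype=float)
        best = f(x)
        for _ in range(n_iter):
            improved = False
            for i in range(len(x)):              # take budget from resource i...
                for j in range(len(x)):          # ...and give it to resource j
                    if i == j or x[i] < step:
                        continue
                    y = x.copy()
                    y[i] -= step
                    y[j] += step
                    val = f(y)
                    if val > best:
                        x, best, improved = y, val, True
            if not improved:
                step *= shrink                   # refine the mesh
        return x

    # Toy example: unknown concave return over 3 resources, total budget 1.
    rng = np.random.default_rng(0)
    target = np.array([0.5, 0.3, 0.2])
    noisy_return = lambda x: -np.sum((x - target) ** 2) + 1e-3 * rng.normal()
    print(np.round(direct_search(noisy_return, [1/3, 1/3, 1/3]), 2))
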
2023
Aug. 2023
Keywords:
Reinforcement learning, MDP, risk-aware optimization

What are the functionals of the reward that can be computed and optimized exactly in Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic Programming (DP) can only handle these operations efficiently for certain classes of statistics. We summarize the characterization of these classes for policy evaluation, and give a new answer for the planning problem. Interestingly, we prove that only generalized means can be optimized exactly, even in the more general framework of Distributional Reinforcement Learning (DistRL). DistRL permits, however, to evaluate other functionals approximately. We provide error bounds on the resulting estimators, and discuss the potential of this approach as well as its limitations. These results contribute to advancing the theory of Markov Decision Processes by examining overall characteristics of the return, and particularly risk-conscious strategies.
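For concreteness, and assuming the standard (quasi-arithmetic) definition, the generalized means referred to here are functionals of the return Z of the form

    M_\varphi(Z) = \varphi^{-1}\big( \mathbb{E}[\varphi(Z)] \big), \qquad \varphi \text{ increasing};

taking \varphi(x) = x recovers the usual expectation, while \varphi(x) = e^{\lambda x} yields the exponential-utility (entropic risk) criterion.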

Aadil Oufkir, Omar Fawzi, Nicolas Flammarion, Aurélien Garivier
Sep. 2023
Keywords:
sequential statistics, quantum channels

Can adaptive strategies outperform non-adaptive ones for quantum hypothesis selection? We exhibit problems where adaptive strategies provably reduce the number of required samples by a factor of four in the worst case, and by even more when the actual difficulty of the problem allows it. In addition, we exhibit specific hypothesis classes for which there is a provable polynomial separation between adaptive and non-adaptive strategies -- a specificity of the quantum framework that does not appear in classical testing.

Aug. 2023
Keywords:
Differential Privacy, Non-parametrics

We study non-parametric density estimation for densities in Lipschitz and Sobolev spaces, and under global privacy. In particular, we investigate regimes where the privacy budget is not assumed to be constant. We consider the classical definition of global differential privacy, but also the more recent notion of global concentrated differential privacy. We recover the result of Barber & Duchi (2014) stating that histogram estimators are optimal against Lipschitz distributions for the L2 risk under regular differential privacy, and we extend it to other norms and notions of privacy. Then, we investigate higher degrees of smoothness, drawing two conclusions: First, and contrary to what happens with a constant privacy budget (Wasserman & Zhou, 2010), there are regimes where imposing privacy degrades the regular minimax risk of estimation on Sobolev densities. Second, so-called projection estimators are near-optimal against the same classes of densities in this new setup with pure differential privacy, but, contrary to the constant privacy budget case, this comes at the cost of a relaxation. With zero-concentrated differential privacy, there is no need for relaxation, and we prove that the estimation is optimal.
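As a point of reference, a standard ε-differentially private histogram density estimator of the kind discussed for the Lipschitz case can be sketched as follows (a generic construction; the bin number and privacy budget below are illustrative, and this is not necessarily the exact estimator analyzed in the paper):

    import numpy as np

    def dp_histogram_density(x, eps, n_bins, lo=0.0, hi=1.0, rng=None):
        """eps-DP histogram density estimator on [lo, hi]. Changing one sample
        moves at most two bin counts by 1, so adding Laplace noise of scale
        2/eps to each count yields eps-differential privacy."""
        rng = rng or np.random.default_rng(0)
        n = len(x)
        counts, edges = np.histogram(x, bins=n_bins, range=(lo, hi))
        noisy = counts + rng.laplace(scale=2.0 / eps, size=n_bins)
        noisy = np.clip(noisy, 0, None)            # keep the estimate nonnegative
        width = (hi - lo) / n_bins
        return noisy / (n * width), edges          # piecewise-constant density

    rng = np.random.default_rng(1)
    samples = rng.beta(2, 5, size=10_000)
    density, edges = dp_histogram_density(samples, eps=1.0, n_bins=20, rng=rng)
    print(np.round(density[:5], 2))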

Jul. 2023
Keywords:
Differential Privacy, Inverse Sensitivity

This work studies the estimation of many statistical quantiles under differential privacy. More precisely, given a distribution and access to i.i.d. samples from it, we study the estimation of the inverse of its cumulative distribution function (the quantile function) at specific points. For instance, this task is of key importance in private data generation. We present two different approaches. The first one consists in privately estimating the empirical quantiles of the samples and using this result as an estimator of the quantiles of the distribution. In particular, we study the statistical properties of the algorithm recently introduced by Kaplan et al. (2022) that privately estimates the quantiles recursively. The second approach is to use techniques of density estimation in order to uniformly estimate the quantile function on an interval. In particular, we show that there is a tradeoff between the two methods. When we want to estimate many quantiles, it is better to estimate the density than to estimate the quantile function at specific points.
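For background, a classical exponential-mechanism estimator of a single quantile, of the kind the inverse sensitivity and JointExp approaches build upon, can be sketched as follows. This is an illustration only (not the recursive algorithm of Kaplan et al. (2022)), and it assumes known bounds lo and hi on the data:

    import numpy as np

    def dp_quantile(x, q, eps, lo, hi, rng=None):
        """eps-DP estimate of the q-quantile of x via the exponential mechanism:
        the utility of an interval between consecutive order statistics is minus
        the number of ranks by which it misses q*n (sensitivity 1)."""
        rng = rng or np.random.default_rng(0)
        z = np.concatenate(([lo], np.clip(np.sort(x), lo, hi), [hi]))
        lengths = np.diff(z)                         # n+1 candidate intervals
        target = q * (len(z) - 2)                    # ideal rank q * n
        utility = -np.abs(np.arange(len(lengths)) - target)
        # Sample an interval with prob. proportional to length * exp(eps*utility/2),
        # then return a uniform point inside it.
        logw = np.log(np.maximum(lengths, 1e-12)) + eps * utility / 2.0
        logw -= logw.max()
        p = np.exp(logw)
        p /= p.sum()
        i = rng.choice(len(p), p=p)
        return rng.uniform(z[i], z[i + 1])

    rng = np.random.default_rng(2)
    data = rng.normal(size=5_000)
    print(round(dp_quantile(data, 0.5, eps=1.0, lo=-10, hi=10, rng=rng), 3))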

Aadil Oufkir, Omar Fawzi, Nicolas Flammarion, Aurélien Garivier
n° 36
COLT
Jul. 2023
Keywords:
Quantum Physics, Sequential statistics, Information Theory

In the problem of quantum channel certification, we have black box access to a quantum process and would like to decide if this process matches some predefined specification or is ε-far from this specification. The objective is to achieve this task while minimizing the number of times the black box is used. Here, we focus on optimal incoherent strategies for two relevant extreme cases of channel certification. The first one is when the predefined specification is a unitary channel, e.g., a gate in a quantum circuit. In this case, we show that testing whether the black box is described by a fixed unitary operator in dimension d or ε-far from it in the trace norm requires Θ(d/ε^2) uses of the black box. The second setting we consider is when the predefined specification is a completely depolarizing channel with input dimension d_in and output dimension d_out. In this case, we prove that, in the non-adaptive setting, Θ(d_in^2 d_out^1.5 / ε^2) uses of the channel are necessary and sufficient to verify whether it is equal to the depolarizing channel or ε-far from it in the diamond norm. Finally, we prove a lower bound of Ω(d_in^2 d_out / ε^2) for this problem in the adaptive setting. Note that the special case d_in=1 corresponds to the well-studied quantum state certification problem.

Apr. 2023
Keywords:
Differential Privacy, Inverse Sensitivity

Producing statistics that respect the privacy of the samples while still maintaining their accuracy is an important topic of research. We study minimax lower bounds when the class of estimators is restricted to the differentially private ones. In particular, we show that characterizing the power of a distributional test under differential privacy can be done by solving a transport problem. With specific coupling constructions, this observation allows us to derive Le Cam-type and Fano-type inequalities for both regular definitions of differential privacy and for divergence-based ones (based on Rényi divergence). We then proceed to illustrate our results on three simple, fully worked-out examples. In particular, we show that the problem class has a major impact on the provable degradation of utility due to privacy. For some problems, privacy leads to a provable degradation only when the rate of the privacy parameters is small enough, whereas for other problems the degradation systematically occurs under much looser hypotheses on the privacy parameters. Finally, we show that the known privacy guarantees of DP-SGLD, a private convex solver, when used to perform maximum likelihood estimation, lead to an algorithm that is near-minimax optimal in both the sample size and the privacy tuning parameters of the problem for a broad class of parametric estimation procedures that includes exponential families.

accepted
Jul. 2023
Keywords:
Differential Privacy, Quantile Estimation, Inverse Sensitivity

We address the differentially private estimation of multiple quantiles (MQ) of a dataset, a key building block in modern data analysis. We apply the recent non-smoothed Inverse Sensitivity (IS) mechanism to this specific problem and establish that the resulting method is closely related to the current state of the art, the JointExp algorithm, sharing in particular the same computational complexity and a similar efficiency. However, we demonstrate both theoretically and empirically that (non-smoothed) JointExp suffers from a severe lack of performance in the case of peaked distributions, with a potentially catastrophic impact in the presence of atoms. While its smoothed version would make it possible to leverage the performance guarantees of IS, it remains an open challenge to implement. As a proxy to fix the problem, we propose a simple and numerically efficient method called Heuristically Smoothed JointExp (HSJointExp), which is endowed with performance guarantees for a broad class of distributions and achieves results that are orders of magnitude better on problematic datasets.

Feb. 2023
Keywords:
Bandit models, Best-arm identification, fixed budget

We lay the foundations of a non-parametric theory of best-arm identification in multi-armed bandits with a fixed budget T. We consider general, possibly non-parametric, models D for distributions over the arms; an overarching example is the model D = P(0,1) of all probability distributions over [0,1]. We propose upper bounds on the average log-probability of misidentifying the optimal arm based on information-theoretic quantities that correspond to infima over Kullback-Leibler divergences between some distributions in D and a given distribution. This is made possible by a refined analysis of the successive-rejects strategy of Audibert, Bubeck, and Munos (2010). We finally provide lower bounds on the same average log-probability, also in terms of the same new information-theoretic quantities; these lower bounds are larger when the (natural) assumptions on the considered strategies are stronger. All these new upper and lower bounds generalize existing bounds based, e.g., on gaps between distributions.
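For reference, the successive-rejects strategy of Audibert, Bubeck, and Munos (2010) whose analysis is refined here can be sketched as follows (a textbook version; `pull` is an abstract sampling oracle and the Bernoulli arms are purely illustrative):

    import numpy as np

    def successive_rejects(pull, K, T):
        """Successive Rejects with budget T: K-1 phases, eliminating the
        empirically worst arm at the end of each phase. `pull(a)` returns one
        random reward from arm a in {0, ..., K-1}."""
        log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))

        def n(k):
            return 0 if k == 0 else int(np.ceil((T - K) / (log_bar * (K + 1 - k))))

        active = list(range(K))
        sums = np.zeros(K)
        counts = np.zeros(K)
        for k in range(1, K):
            for a in active:
                for _ in range(n(k) - n(k - 1)):
                    sums[a] += pull(a)
                    counts[a] += 1
            worst = min(active, key=lambda a: sums[a] / max(counts[a], 1))
            active.remove(worst)
        return active[0]                 # recommended arm

    rng = np.random.default_rng(3)
    means = [0.3, 0.5, 0.45, 0.7]
    print(successive_rejects(lambda a: rng.binomial(1, means[a]), K=4, T=2_000))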

Sep. 2022
Keywords:
Bandit models, Best-arm identification,

We consider the problem introduced by [MJTN20] of identifying all the ε-optimal arms in a finite stochastic multi-armed bandit with Gaussian rewards. In the fixed confidence setting, we give a lower bound on the number of samples required by any algorithm that returns the set of ε-good arms with a failure probability less than some risk level δ. This bound takes the form T*_ε(µ) log(1/δ), where T*_ε(µ) is a characteristic time that depends on the vector of mean rewards µ and the accuracy parameter ε. We also provide an efficient numerical method to solve the convex max-min program that defines the characteristic time. Our method is based on a complete characterization of the alternative bandit instances that the optimal sampling strategy needs to rule out, thus making our bound tighter than the one provided by [MJTN20]. Using this method, we propose a Track-and-Stop algorithm that identifies the set of ε-good arms with high probability and enjoys asymptotic optimality (when δ goes to zero) in terms of the expected sample complexity. Finally, using numerical simulations, we demonstrate our algorithm's advantage over state-of-the-art methods, even for moderate values of the risk parameter.

May 2022
Keywords:
Realistic CDRs, Cellular networks, Simulation

Because of the confidentiality of the data contained in cellular CDR traces (Call Data Records), obtaining them is subject to strict laws that limit their accessibility to researchers, and their usability even when they are available. To remedy this lack, in this article we focus on the production of realistic CDR traces based on a set of characterizations of real CDRs obtained from an operator.

Mar. 2022
Keywords:

We propose a new strategy for best-arm identification with fixed confidence of Gaussian variables with bounded means and unit variance. This strategy, called Exploration-Biased Sampling, is not only asymptotically optimal: we also prove non-asymptotic bounds that hold with high probability. To the best of our knowledge, this is the first strategy with such guarantees. But the main advantage over other algorithms such as Track-and-Stop is an improved behavior regarding exploration: Exploration-Biased Sampling is slightly biased in favor of exploration in a subtle but natural way that makes it more stable and interpretable. These improvements are allowed by a new analysis of the sample complexity optimization problem, which yields a faster numerical resolution scheme and several quantitative regularity results that we believe to be of independent interest.

Yoan Russac, Christina Katsimerou, Dennis Bohle, Olivier Cappé, Aurélien Garivier, Wouter M Koolen
Dec. 2021
Keywords:
Bandit models, Best-arm identification, A/B testing

Motivated by A/B/n testing applications, we consider a finite set of distributions (called arms), one of which is treated as a control. We assume that the population is stratified into homogeneous subpopulations. At every time step, a subpopulation is sampled and an arm is chosen: the resulting observation is an independent draw from the arm conditioned on the subpopulation. The quality of each arm is assessed through a weighted combination of its subpopulation means. We propose a strategy for sequentially choosing one arm per time step so as to discover as fast as possible which arms, if any, have higher weighted expectation than the control. This strategy is shown to be asymptotically optimal in the following sense: if one considers the first time at which the strategy ensures that it is able to output the correct answer with probability at least 1-δ, then its expectation grows linearly with log(1/δ) at the exact optimal rate. This rate is identified in the paper in three different settings: (1) when the experimenter does not observe the subpopulation information, (2) when the subpopulation of each sample is observed but not chosen, and (3) when the experimenter can select the subpopulation from which each response is sampled. We illustrate the efficiency of the proposed strategy with numerical simulations on synthetic and real data collected from an A/B/n experiment.

Aymen Al Marjani, Aurélien Garivier, Alexandre Proutiere
Dec. 2021
Keywords:
Reinforcement learning, Best policy identification, Markov Decision Processes
We investigate the classical active pure exploration problem in Markov Decision Processes, where the agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible. We propose an information-theoretic lower bound on the average number of steps required before a correct answer can be given with probability at least 1-δ. This lower bound involves a non-convex optimization problem, for which we propose a convex relaxation. We further provide an algorithm whose sample complexity matches the relaxed lower bound up to a multiplicative factor. This algorithm addresses general communicating MDPs; we propose a variant with reduced exploration rate (and hence faster convergence) under an additional ergodicity assumption. This work extends previous results relative to the generative setting (Al Marjani and Proutiere, 2020), where the agent could at each step observe the random outcome of any (state, action) pair. In contrast, we show here how to deal with the navigation constraints and comment on the gap between the generative and online settings. Our analysis relies on an ergodic theorem for non-homogeneous Markov chains which we consider to be of wide interest in the analysis of Markov Decision Processes.
Aadil Oufkir, Omar Fawzi, Nicolas Flammarion, Aurélien Garivier
spotlight
Dec. 2021
Keywords:

What advantage do sequential procedures provide over batch algorithms for testing properties of unknown distributions? Focusing on the problem of testing whether two distributions $\mathcal{D}_1$ and $\mathcal{D}_2$ on $\{1,\dots, n\}$ are equal or $\varepsilon$-far, we give several answers to this question. We show that for a small alphabet size $n$, there is a sequential algorithm that outperforms any batch algorithm by a factor of at least $4$ in terms of sample complexity. For a general alphabet size $n$, we give a sequential algorithm that uses no more samples than its batch counterpart, and possibly fewer if the actual distance $\mathrm{TV}(\mathcal{D}_1, \mathcal{D}_2)$ between $\mathcal{D}_1$ and $\mathcal{D}_2$ is larger than $\varepsilon$. As a corollary, letting $\varepsilon$ go to $0$, we obtain a sequential algorithm for testing closeness when no a priori bound on $\mathrm{TV}(\mathcal{D}_1, \mathcal{D}_2)$ is given that has a sample complexity $\tilde{\mathcal{O}}(\frac{n^{2/3}}{\mathrm{TV}(\mathcal{D}_1, \mathcal{D}_2)^{4/3}})$: this improves over the $\tilde{\mathcal{O}}(\frac{n/\log n}{\mathrm{TV}(\mathcal{D}_1, \mathcal{D}_2)^{2}})$ tester of Daskalakis and Kawase (2017) and is optimal up to multiplicative constants. We also establish limitations of sequential algorithms for the problem of testing closeness: they can improve the worst case number of samples by at most a constant factor.

n°13
Nov. 2021
Keywords:
Bandits, Auctions
First-price auctions have largely replaced traditional bidding approaches based on Vickrey auctions in programmatic advertising. As far as learning is concerned, first-price auctions are more challenging because the optimal bidding strategy does not only depend on the value of the item but also requires some knowledge of the other bids. They have already given rise to several works in sequential learning, many of which consider models for which the value of the buyer or the opponents' maximal bid is chosen in an adversarial manner. Even in the simplest settings, this gives rise to algorithms whose regret grows as $\sqrt{T}$ with respect to the time horizon $T$. Focusing on the case where the buyer plays against a stationary stochastic environment, we show how to achieve significantly lower regret: when the opponents' maximal bid distribution is known, we provide an algorithm whose regret can be as low as $\log^2(T)$; in the case where the distribution must be learnt sequentially, a generalization of this algorithm can achieve $T^{1/3+ \epsilon}$ regret, for any $\epsilon>0$. To obtain these results, we introduce two novel ideas that can be of interest in their own right. First, by transposing results obtained in the posted price setting, we provide conditions under which the first-price bidding utility is locally quadratic around its optimum. Second, we leverage the observation that, on small sub-intervals, the concentration of the variations of the empirical distribution function may be controlled more accurately than by using the classical Dvoretzky-Kiefer-Wolfowitz inequality. Numerical simulations confirm that our algorithms converge much faster than alternatives proposed in the literature for various bid distributions, including for bids collected on an actual programmatic advertising platform.
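For reference, the Dvoretzky-Kiefer-Wolfowitz inequality mentioned above yields the uniform confidence band on the empirical distribution function sketched below (a generic illustration with a toy bid distribution, not the refined local control developed in the paper):

    import numpy as np

    def dkw_band(samples, delta):
        """Uniform confidence band for the CDF: with probability at least 1-delta,
        sup_x |F_hat(x) - F(x)| <= sqrt(log(2/delta) / (2n))  (DKW inequality)."""
        x = np.sort(samples)
        n = len(x)
        F_hat = np.arange(1, n + 1) / n
        margin = np.sqrt(np.log(2.0 / delta) / (2.0 * n))
        return x, np.clip(F_hat - margin, 0, 1), np.clip(F_hat + margin, 0, 1)

    rng = np.random.default_rng(4)
    bids = rng.uniform(0, 1, size=2_000)   # toy stand-in for opponents' maximal bids
    x, lower, upper = dkw_band(bids, delta=0.05)
    print(round(upper[len(x) // 2] - lower[len(x) // 2], 3))   # width of the band
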
n°30
Aug. 2021
Keywords:
Bandits, Graphs, Best-arm identification
We propose an analysis of PAC identification of an $\epsilon$-best arm in graph bandit models with Gaussian distributions. We consider finite but potentially very large bandit models where the set of arms is endowed with a graph structure, and we assume that the arms' expectations $\mu$ are smooth with respect to this graph. Our goal is to identify an arm whose expectation is at most $\epsilon$ below the largest of all means. We focus on the fixed-confidence setting: given a risk parameter $\delta$, we consider sequential strategies that yield an $\epsilon$-optimal arm with probability at least $1-\delta$. All such strategies use at least $T^*_{R,\epsilon}(\mu)\log(1/\delta)$ samples, where $R$ is the smoothness parameter. We identify the complexity term $T^*_{R,\epsilon}(\mu)$ as the solution of a min-max problem for which we give a game-theoretic analysis and an approximation procedure. This procedure is the key element required by the asymptotically optimal Track-and-Stop strategy.
n°32
Mar. 2021
Keywords:

Developing efficient sequential bidding strategies for repeated auctions is an important practical challenge in various marketing tasks. In this setting, the bidding agent obtains information, on both the value of the item at sale and the behavior of the other bidders, only when she wins the auction. Standard bandit theory does not apply to this problem due to the presence of action-dependent censoring. In this work, we consider second-price auctions and propose novel, efficient UCB-like algorithms for this task. These algorithms are analyzed in the stochastic setting, assuming regularity of the distribution of the opponents' bids. We provide regret upper bounds that quantify the improvement over the baseline algorithm proposed in the literature. The improvement is particularly significant in cases when the value of the auctioned item is low, yielding a spectacular reduction in the order of the worst-case regret. We further provide the first parametric lower bound for this problem that applies to generic UCB-like strategies. As an alternative, we propose more explainable strategies which are reminiscent of the Explore Then Commit bandit algorithm. We provide a critical analysis of this class of strategies, showing both important advantages and limitations. In particular, we provide a minimax lower bound and propose a nearly minimax-optimal instance of this class.

Apr. 2021
Keywords:
Non-stationary Bandits
Contextual sequential decision problems with categorical or numerical observations are ubiquitous and Generalized Linear Bandits (GLB) offer a solid theoretical framework to address them. In contrast to the case of linear bandits, existing algorithms for GLB have two drawbacks undermining their applicability. First, they rely on excessively pessimistic concentration bounds due to the non-linear nature of the model. Second, they require either non-convex projection steps or burn-in phases to enforce boundedness of the estimators. Both of these issues are worsened when considering non-stationary models, in which the GLB parameter may vary with time. In this work, we focus on self-concordant GLB (which include logistic and Poisson regression) with forgetting achieved either by the use of a sliding window or exponential weights. We propose a novel confidence-based algorithm for the maximum-likelihood estimator with forgetting and analyze its performance in abruptly changing environments. These results as well as the accompanying numerical simulations highlight the potential of the proposed approach to address non-stationarity in GLB.
Aug. 2020
206 (pp. 1-22)
Keywords:

We propose a novel algorithm for unsupervised graph representation learning with attributed graphs. It combines three advantages addressing some current limitations of the literature: i) The model is inductive: it can embed new graphs without re-training in the presence of new data; ii) The method takes into account both micro-structures and macro-structures by looking at the attributed graphs at different scales; iii) The model is end-to-end differentiable: it is a building block that can be plugged into deep learning pipelines and allows for back-propagation. We show that combining a coarsening method having strong theoretical guarantees with mutual information maximization suffices to produce high-quality embeddings. We evaluate them on classification tasks with common benchmarks of the literature. We show that our algorithm is competitive with the state of the art among unsupervised graph representation learning methods.

n°29
Jul. 2020
pp.2200-2226
Keywords:
Bandits, Graphs, Best-arm identification
We study best-arm identification with fixed confidence in bandit models with graph smoothness constraint. We provide and analyze an efficient gradient ascent algorithm to compute the sample complexity of this problem as a solution of a non-smooth max-min problem (providing in passing a simplified analysis for the unconstrained case). Building on this algorithm, we propose an asymptotically optimal strategy. We furthermore illustrate by numerical experiments both the strategy's efficiency and the impact of the smoothness constraint on the sample complexity. Best Arm Identification (BAI) is an important challenge in many applications ranging from parameter tuning to clinical trials. It is now very well understood in vanilla bandit models, but real-world problems typically involve some dependency between arms that requires more involved models. Assuming a graph structure on the arms is an elegant practical way to encompass this phenomenon, but this had been done so far only for regret minimization. Addressing BAI with graph constraints involves delicate optimization problems for which the present paper offers a solution.
Jan. 2020
Keywords:

The statistical framework of Generalized Linear Models (GLM) can be applied to sequential problems involving categorical or ordinal rewards associated, for instance, with clicks, likes or ratings. In the example of binary rewards, logistic regression is well-known to be preferable to the use of standard linear modeling. Previous works have shown how to deal with GLMs in contextual online learning with bandit feedback when the environment is assumed to be stationary. In this paper, we relax this latter assumption and propose two upper confidence bound based algorithms that make use of either a sliding window or a discounted maximum-likelihood estimator. We provide theoretical guarantees on the behavior of these algorithms for general context sequences and in the presence of abrupt changes. These results take the form of high probability upper bounds for the dynamic regret that are of order d^{2/3} G^{1/3} T^{2/3}, where d, T and G are respectively the dimension of the unknown parameter, the number of rounds and the number of breakpoints up to time T. The empirical performance of the algorithms is illustrated in simulated environments.
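The discounted maximum-likelihood estimator mentioned above can be sketched as a weighted logistic regression in which the observation made at round s out of t receives weight γ^(t-s); the sketch below is a minimal illustration under that assumption, solved by a few Newton steps, without the confidence-bound machinery of the algorithms themselves.

    import numpy as np

    def discounted_logistic_mle(X, y, gamma=0.99, n_newton=25, reg=1e-3):
        """Ridge-regularized weighted logistic MLE where older observations are
        geometrically discounted (the most recent observation has weight 1)."""
        X, y = np.asarray(X, float), np.asarray(y, float)
        t, d = X.shape
        w = gamma ** np.arange(t - 1, -1, -1)
        theta = np.zeros(d)
        for _ in range(n_newton):                       # Newton ascent
            p = 1.0 / (1.0 + np.exp(-X @ theta))
            grad = X.T @ (w * (y - p)) - reg * theta
            H = (X * (w * p * (1 - p))[:, None]).T @ X + reg * np.eye(d)
            theta += np.linalg.solve(H, grad)
        return theta

    rng = np.random.default_rng(5)
    theta_star = np.array([1.0, -0.5])
    X = rng.normal(size=(500, 2))
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ theta_star)))
    print(np.round(discounted_logistic_mle(X, y), 2))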

n°24
Mar. 2020
pp 138--147
Keywords:
In this paper we propose a perfect simulation algorithm for the Exponential Random Graph Model, based on the Coupling From The Past method of Propp & Wilson (1996). We use a Glauber dynamics to... more
(online)
Aug. 2020
pp.38--61
Keywords:

We analyze the sample complexity of the thresholding bandit problem, with and without the assumption that the mean values of the arms are increasing. In each case, we provide a lower bound valid... more

vol. 40
Mar. 2021
pp. 61-96, 18th Wald Prize
Keywords:
In this paper, we study sequential testing problems with overlapping hypotheses. We first focus on the simple problem of assessing if the mean µ of a Gaussian distribution is ≥ −ε or ≤ ε; if µ ∈ (−ε, ε),... more
Julia code for best-arm identification
n°11
JMLR: Workshop and Conference Proceedings
101
Nov. 2019
252-267
Keywords:

We propose and analyze StoROO, an algorithm for risk optimization on stochastic black-box functions derived from StoOO. Motivated by risk-averse decision making fields like agriculture, medicine,... more

Vol. 13 (55)
Dec. 2019 (first version in Feb. 2016)
pp.1-40
This paper is devoted to the estimation of a conditional quantile, more precisely the quantile of the output of a real-valued stochastic code whose inputs are in R^d. For this purpose, we introduce a... more
201(C)
2020
Keywords:
We report on an empirical study of the main strategies for conditional quantile estimation in the context of stochastic computer experiments. To ensure adequate diversity, six metamodels are... more
vol.
6
(3)
Dec. 2018
pp.9-31
Keywords:
Combining massive data (big data) and machine learning algorithms, the power of automatic decision tools raises as much hope as fear... more
n°32
Dec. 2018
Keywords:

(poster) Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-task in planning, game tree search... more

n°10
Nov. 2018
Keywords:

Originally motivated by default risk management applications, this paper investigates a novel problem, referred to as the profitable bandit problem here. At each step, an agent chooses... more

May 2019
Keywords:
We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties... more
n°2018
Aug. 2018

In the last decade, digital media (web and app publishers) have generalized the use of real-time ad auctions to sell their ad spaces. Multiple auction platforms, also called Supply Side Platforms (SSP)... more

23 (179)
Aug. 2022
pp.1-66
In the context of K-armed stochastic bandits with distributions only assumed to be supported on [0,1], we introduce a new algorithm, KL-UCB-switch, and prove that it enjoys simultaneously a... more
vol.
60
Dec. 2017
pp.114-131
Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, partly because of applications including online content optimization. This... more
n°28
Oct. 2017
pp.223-237
Keywords:

We propose the kl-UCB++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the... more

n°2017
Sep. 2017
Keywords:

This paper is devoted to the study of the max K-armed bandit problem, which consists in sequentially allocating resources in order to detect extreme values. Our contribution is twofold. We... more

Mar. 2017
What do an online sales website, a GPS navigation system, a clinical trial and an online advertising engine have in common? A short two-page article introducing the field... more
Mar. 2017
Recommender systems (RS) aim at recommending either one or several items simultaneously: the former type of RS is named single-play RS, while the latter is named multiple-play RS. The... more
n°30
Dec. 2016

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration... more

n°18
Jul. 2016
Keywords:

A growing number of digital systems rely on bandit algorithms to efficiently combine exploration of the environment with exploitation of the accumulated information. The models... more

n°29
Jun. 2016
pp.1028-1050

We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample... more

vol.
4
(1)
Mar. 2016
pp.76-108
Keywords:
In this work, we extend some parameters built on a probability distribution introduced before to the case where the proximity between real numbers is measured by using a Bregman divergence. This... more
vol.
17
(1)
Jan. 2016
pp.1-42
The stochastic multi-armed bandit model is a simple abstraction that has proven useful in many different contexts in statistics and machine learning. Whereas the achievable limit in terms of regret... more
vol.
51
Oct. 2015
Since 1996, the Journées of the MAS (Modélisation Aléatoire et Statistique) group of SMAI have gathered every two years a large part of the French probability and statistics communities. ... more
n°2015
Jun. 2015
Keywords:

Very large scale recommender systems are now ubiquitous on the internet: books suggested for purchase in online bookstores, articles recommended on... more

n°28
May. 2015
pp.67-72

For several web tasks such as ad placement or e-commerce, recommender systems must recommend multiple items to their users---such problems can be modeled as bandits with multiple plays. State-of-... more

n°12
Mar. 2015
pp.73-88

Multiple-play recommender systems (RS) recommend several items to their users. RS are based on learning models in order to choose the items to recommend. Among these models, the... more

vol.
46
(2)
Mar. 2015
pp.300-319
We describe a new algorithm for the perfect simulation of variable length Markov chains and random systems with perfect connections. This algorithm generalizes Propp and Wilson's simulation... more
vol.
18
Feb. 2015
pp.59-79
Keywords:
Multiple-play recommender systems (RS) are RS that recommend several items at the same time to their users. Most RS rely on models of... more
n°2014
Jun. 2014
Keywords:

After obtaining their DUT, many students of IUT STID programs wish to continue their studies to obtain a Master's degree, which gives them the possibility to work as an expert engineer or as a... more

n°27
Jun. 2014
pp.461-481

A/B testing refers to the task of determining the best option among two alternatives that yield random outcomes. We provide distribution-dependent lower bounds for the performance of A/B testing... more

vol.
19
(3)
Mar. 2014
pp.93-105
Keywords:
The rapid evolution of information systems managing more and more voluminous data has caused profound paradigm shifts in the job of the statistician, who has successively become data miner, bioinformatician... more
Sep. 2013
pp.489-493

We present deviation bounds for self-normalized averages and applications to estimation with a random number of observations. The results rely on a peeling argument in exponential martingale... more

vol.
41
(3)
Jun. 2013
pp.1516-1541
We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper... more
vol.
40
(2)
Jun. 2013
pp.344-362
We study a problem of model selection for data produced by two different context tree sources. Motivated by linguistic questions, we consider the case where the probabilistic context trees... more
vol.
14
Feb. 2013
pp.601-623
Keywords:
We consider an original problem that arises from the issue of security analysis of a power system and that we name optimal discovery with probabilistic expert advice. We address it with an algorithm... more
n°51
Dec. 2012
pp.6808-6812
Keywords:

Motivated by issues of security analysis for power systems, we analyze a new problem, called optimal discovery with probabilistic expert advice. We address it with an algorithm based on the... more

n°15
Apr. 2012
pp.592-600

Stochastic bandit problems have been analyzed from two different perspectives: a frequentist view, where the parameter is a deterministic unknown quantity, and a Bayesian approach, where the... more

vol.
21
(6)
Dec. 2011
pp.2109-2145
Computing smoothing distributions, the distributions of one or more states conditional on past, present, and future observations, is a recurring problem when operating on general hidden Markov models... more
vol.
121
(11)
Nov. 2011
pp.2488-2506
Context tree models have been introduced by Rissanen as a parsimonious generalization of Markov models. Since then, they have been widely used in applied probability and statistics. The present paper... more
n°22
Oct. 2011
pp.174-188

Many problems, such as cognitive radio, parameter control of a scanning tunnelling microscope or internet advertisement, can be modelled as non-stationary bandit problems where the distributions... more

n°24
Jul. 2011
pp.359-376

This paper presents a finite-time analysis of the KL-UCB algorithm, an online, horizon-free index policy for stochastic bandit problems. We prove two distinct results: first, for arbitrary bounded... more

vol.
5
(1)
Feb. 2011
pp.68-76
We consider the task of optimally sensing a two-state Markovian channel with an observation cost and without any prior information regarding the channel's transition probabilities. This task is... more
n°24
Dec. 2010

We consider structured multi-armed bandit problems based on the Generalized Linear Model (GLM) framework of statistics. For these bandits, we propose a new algorithm, called GLM-UCB. We derive... more

n°48
Sep. 2010
pp.115-122

We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out... more

Jun. 2010
Keywords:

We consider structured multi-armed bandit problems in which the agent is guided by prior knowledge about the structure of the reward, which can be exploited so as to... more

vol.
57
(11)
Nov. 2009
pp.4247-4259
We discuss approximate maximum-likelihood methods for blind identification and deconvolution. These algorithms are based on particle approximation versions of the expectation-maximization (EM)... more
vol.
11
(4)
Oct. 2009
pp.634-642
Keywords:
We show that the maximin average redundancy in pattern coding is eventually larger than 1.84 (n/log n)^{1/3} for messages of length n. This improves recent results on pattern redundancy, although it... more
n°15
Aug. 2009
pp.465-468
Keywords:

This paper is devoted to extending the regenerative block-bootstrap (RBB) proposed for regenerative Markov chains to Hidden Markov Models {(X_n, Y_n)}. In the HMM setup, regeneration times of the... more

vol.
139
(3)
Mar. 2009
pp.962-977
We address the issue of order identification for hidden Markov models with Poisson and Gaussian emissions. We prove information-theoretic BIC-like mixture inequalities in the spirit of Finesso [1991... more
vol.
55
(1)
Jan. 2009
pp.358-373
This paper describes universal lossless coding strategies for compressing sources on countably infinite alphabets. Classes of memoryless sources defined by an envelope condition on the marginal... more
n°9
Jun. 2008
pp.639-643

We discuss approximate maximum likelihood methods for blind identification and deconvolution. These algorithms are based on particle approximation versions of the EM algorithm. We consider two... more

n°9
Jun. 2008
pp.634-638

Particle filtering has been successfully used to approximate the fixed-lag or fixed-interval smoothing distributions in digital communication and to perform approximate maximum likelihood... more

vol.
52
(12)
Dec. 2006
pp.5579-5586
The Context Tree Weighting method (CTW) is shown to be almost adaptive on the classes of renewal and Markov renewal processes. Up to a logarithmic factor, CTW achieves the minimax pointwise redundancy... more
vol.
52
(10)
Oct. 2006
pp.4630-4635
The Bayesian information criterion (BIC) and the Krichevsky-Trofimov (KT) version of the minimum description length (MDL) principle are popular in the study of model selection. For order estimation of... more
Rapport de DEA (Master's dissertation)
Sep. 2003
Keywords:
Universal lossless coding via grammar-based transforms.
n°8
Apr. 2002
pp.47-56

Events in self-timed rings can propagate evenly spaced or as bursts. By studying these phenomena, we obtain a better understanding of the underlying dynamics of self-timed pipelines, which is a... more

preprint
We introduce a general approach to prove oracle properties in context tree selection. The results derive from a concentration condition that is verified, for example, by mixing processes. Moreover,... more