Profitable Bandits

Edition Number: 10
Date: November 2018
Place: Beijing, China
Abstract: 

Originally motivated by default risk management applications, this paper
investigates a novel problem, referred to as the profitable bandit problem
here. At each step, an agent chooses a subset of the K possible actions. For
each action chosen, she then receives the sum of a random number of rewards.
Her objective is to maximize her cumulative earnings. To this end, we adapt
and study three well-known strategies that have proved most efficient in
other settings: kl-UCB, Bayes-UCB and Thompson Sampling. For each of them, we
prove a finite-time regret bound which, together with a lower bound we also
obtain, establishes asymptotic optimality. Our goal is also to compare these
three strategies from both a theoretical and an empirical perspective. We
give simple, self-contained proofs that emphasize their similarities as well
as their differences. While both Bayesian strategies automatically adapt to
the geometry of information, the numerical experiments carried out show a
slight advantage for Thompson Sampling in practice.
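To make the setting concrete, here is a minimal sketch of Thompson Sampling in a simplified profitable-bandit model. It assumes (for illustration only; the paper's exact model may differ) that each arm k pays a Bernoulli(means[k]) reward minus a known per-play cost costs[k], and that at each round the agent may play any subset of arms. The function name and signature are hypothetical, not from the paper.

```python
import numpy as np

def thompson_profitable(means, costs, horizon=5000, seed=0):
    """Thompson Sampling sketch for a simplified profitable-bandit model:
    each round, every arm may be played or skipped; playing arm k yields a
    Bernoulli(means[k]) reward minus a known cost costs[k] (an illustrative
    assumption, not necessarily the paper's exact setting)."""
    rng = np.random.default_rng(seed)
    K = len(means)
    a = np.ones(K)  # Beta posterior parameters (successes + 1)
    b = np.ones(K)  # Beta posterior parameters (failures + 1)
    profit = 0.0
    for _ in range(horizon):
        theta = rng.beta(a, b)           # one posterior sample per arm
        play = theta > costs             # play the arms believed profitable
        rewards = rng.random(K) < means  # Bernoulli reward draws
        for k in np.flatnonzero(play):
            profit += rewards[k] - costs[k]
            a[k] += rewards[k]           # update posterior of played arms
            b[k] += 1 - rewards[k]
    return profit
```

With two arms of means 0.9 and 0.1 and a common cost of 0.5, the sampler quickly learns to keep playing the first arm and to drop the second, so the cumulative profit grows roughly like 0.4 per round.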

in Proceedings of Machine Learning Research vol. 95 http://proceedings.mlr.press/v95/

Arxiv Number: 1805.02908