Maximin Action Identification: A New Bandit Framework for Games

ConferenceName:

Conference On Learning Theory

url:

Conference On Learning Theory

Edition Number:

Date:

June, 2016

Place:

New York, USA

PageStart:

1028

PageEnd:

1050

Authors:

Aurélien Garivier

Emilie Kaufmann

Wouter M. Koolen

Abstract:

We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper-confidence bounds; and Maximin-Racing, which operates by successively eliminating the sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower bound analysis, and possible connections to an optimal algorithm.

Emilie's presentation at COLT on YouTube
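
The identification problem described in the abstract can be illustrated with a small sketch. It assumes the "best action" is the maximin one, as the title suggests: the row i maximizing the worst-case mean over the opponent's replies j. The code below is a naive uniform-sampling baseline with Bernoulli outcomes, not the paper's Maximin-LUCB or Maximin-Racing strategies; the function names, the 2x2 game, and the sampling budget are all hypothetical choices made for illustration.

```python
import random


def maximin_action(means):
    """Return the index i that maximizes min_j means[i][j]."""
    worst_case = [min(row) for row in means]
    return max(range(len(means)), key=lambda i: worst_case[i])


def uniform_sampling_estimate(true_means, samples_per_pair, rng):
    """Naive baseline (an assumption, not the paper's method): sample every
    (i, j) pair equally often with Bernoulli outcomes, then recommend the
    maximin action of the empirical means."""
    estimates = []
    for row in true_means:
        est_row = []
        for mu in row:
            wins = sum(rng.random() < mu for _ in range(samples_per_pair))
            est_row.append(wins / samples_per_pair)
        estimates.append(est_row)
    return maximin_action(estimates)


# Hypothetical 2x2 game: entry [i][j] is the mean outcome when the player
# picks action i and the opponent replies with action j.
true_means = [[0.8, 0.3], [0.6, 0.5]]
best = maximin_action(true_means)
guess = uniform_sampling_estimate(true_means, 2000, random.Random(0))
```

Uniform sampling spends the same budget on every pair of actions. The adaptive strategies named in the abstract are motivated by allocating samples unevenly across pairs instead.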

Direct link:

In: Proceedings of Machine Learning Research, vol. 49

Arxiv Number:

1602.04676

Hal Number:

01273842
