Maximin Action Identification: A New Bandit Framework for Games

Submitted by zenno on Mon, 08/22/2016 - 11:33

We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. The goal is to identify the best action in a game when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper-confidence bounds, and Maximin-Racing, which successively eliminates sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We also sketch a lower-bound analysis and possible connections to an optimal algorithm.
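To make the problem statement concrete, here is a minimal sketch of what "identifying the best action" can mean under a maximin criterion: the player's action whose worst-case mean outcome over the opponent's replies is largest. The Bernoulli outcomes, the matrix `true_means`, and the helper names are assumptions made for illustration. The uniform-sampling estimator is a naive baseline, not Maximin-LUCB or Maximin-Racing.

```python
import random


def maximin_action(means):
    """Return the row index i that maximizes min_j means[i][j]."""
    return max(range(len(means)), key=lambda i: min(means[i]))


def uniform_sampling_estimate(means, draws_per_pair, seed=0):
    """Estimate every pair's mean from the same number of samples.

    Assumption: each pair (i, j) yields a Bernoulli outcome with
    success probability means[i][j]. This is a naive baseline only.
    """
    rng = random.Random(seed)
    return [
        [
            sum(rng.random() < mu for _ in range(draws_per_pair)) / draws_per_pair
            for mu in row
        ]
        for row in means
    ]


# Hypothetical game: 3 actions for the player, 2 replies for the opponent.
true_means = [
    [0.60, 0.50],  # worst case 0.50
    [0.90, 0.20],  # worst case 0.20
    [0.40, 0.45],  # worst case 0.40
]

best = maximin_action(true_means)
estimates = uniform_sampling_estimate(true_means, draws_per_pair=2000)
estimated_best = maximin_action(estimates)
```

Uniform sampling spends the same budget on every pair, including pairs that are clearly irrelevant to the maximin value. A fixed-confidence strategy would instead choose pairs adaptively and stop once the answer is reliable enough.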

Bibliographic reference:

Conference on Learning Theory, no. 29, Jun. 2016, arXiv:1602.04676, hal-01273842

Authors:

Aurélien Garivier, Emilie Kaufmann, Wouter M. Koolen
