Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling

Conference Name:

Neural Information Processing Systems

Edition Number:

Date:

December, 2018

Authors:

Emilie Kaufmann

Wouter M. Koolen

Aurélien Garivier

Abstract:

(poster) Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-task in planning, game tree search and reinforcement learning. We formalize this learning task as the problem of sequentially testing how the minimum mean among a finite set of distributions compares to a given threshold. We develop refined non-asymptotic lower bounds, which show that optimality mandates very different sampling behavior for a low vs. high true minimum. We show that Thompson Sampling and the intuitive Lower Confidence Bounds policy each nail only one of these cases. We develop a novel approach that we call Murphy Sampling (MS). Even though it entertains exclusively low true minima, we prove that MS is optimal for both possibilities. We then design advanced self-normalized deviation inequalities, fueling more aggressive stopping rules. We complement our theoretical guarantees by experiments showing that MS works best in practice.

Direct link:

Available on the NeurIPS website

Arxiv Number:

1806.00973

Hal Number:

01804581
