CMU-CS-06-110
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-06-110

An Asymptotically Optimal Algorithm for the
Max k-Armed Bandit Problem

Matthew J. Streeter, Stephen F. Smith

February 2006

CMU-CS-06-110.ps
CMU-CS-06-110.pdf


Keywords: Multi-armed bandit problem, PAC bounds


We present an asymptotically optimal algorithm for the max variant of the k-armed bandit problem. Given a set of k slot machines, each yielding payoff from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the expected maximum payoff received over a series of n trials. Subject to certain distributional assumptions, we show that O(ln(1⁄δ) ln(n)2⁄∈2) trials are sufficient to identify, with probability at least 1 − δ , a machine whose expected maximum payoff is within ∈ of optimal. This result leads to a strategy for solving the problem that is asymptotically optimal in the following sense: the gap between the expected maximum payoff obtained by using our strategy for n trials and that obtained by pulling the single best arm for all n trials approaches zero as n → ∞.

18 pages


Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by [email protected]