|
CMU-CS-06-110
Computer Science Department
School of Computer Science, Carnegie Mellon University
CMU-CS-06-110
An Asymptotically Optimal Algorithm for the
Max k-Armed Bandit Problem
Matthew J. Streeter, Stephen F. Smith
February 2006
CMU-CS-06-110.ps
CMU-CS-06-110.pdf
Keywords: Multi-armed bandit problem, PAC bounds
We present an asymptotically optimal algorithm for the max variant
of the k-armed bandit problem. Given a set of k slot
machines, each yielding payoff from a fixed (but unknown) distribution,
we wish to allocate trials to the machines so as to maximize the
expected maximum payoff received over a series of n trials.
Subject to certain distributional assumptions, we show that
O(ln(1⁄δ) ln(n)2⁄∈2) trials
are sufficient to identify, with
probability at least 1 − δ , a machine whose expected
maximum payoff is within ∈ of optimal. This result leads
to a strategy for solving the problem that is asymptotically
optimal in the following sense: the gap between the expected maximum
payoff obtained by using our strategy for n trials and that obtained
by pulling the single best arm for all n trials approaches
zero as n → ∞.
18 pages
|