Multi-armed Bandits with Side Constraints
Kamesh Munagala
Computer Science, Duke University
Thursday, September 24, 2009
4:00 PM, 3310 CS
The stochastic multi-armed bandit problem is one of the most
well-studied optimization problems in decision theory. It models the
fundamental trade-off between exploration, or learning about the
performance of various options (or arms), and exploitation, or
utilizing the current best knowledge of the arms. For such problems,
it is desirable to construct "index" policies that assign a numeric
value (or index) to each arm and explore the arm with highest index at
any point in time. For the multi-armed bandit problem, it is well
known (and quite surprising) that such an index policy, termed the
Gittins index policy, is in fact optimal. However, it is also
well-known that such policies become increasingly sub-optimal in the
presence of side-constraints such as costs of switching between arms,
time-varying information, exploration of varying duration, etc. These
side constraints arise frequently in practice, and few good techniques
are available to handle them.
In this talk, we will present a novel, simple, and general algorithmic
technique that yields policies with provable guarantees for bandit
problem with such side-constraints (in particular, switching costs).
Our algorithms are computationally just as efficient as index policies
(and in fact, closely related to such policies), and have the
advantage of yielding approximation guarantees where none existed
before. The talk will be self-contained.
Joint work with Sudipto Guha, UPenn and Peng Shi, Duke.
Speaker Bio: Kamesh Munagala is an Assistant Professor of Computer
Science at Duke University since 2004. He obtained his Ph.D. from
Stanford University in 2003 and B.Tech. degree from IIT Bombay in
1998, both in Computer Science. He spent a year (2003-04) mining gene
expression data as a post-doc at the Department of Biochemistry,
Stanford University School of Medicine. He is a recipient of the NSF
CAREER Award in 2008 and the Alfred P. Sloan Research Fellowship in
2009. He is interested in sequential decision theory, approximation
algorithms, mechanism design, and query processing in database
systems and sensor networks.
Additional CS events.
|