Computer Sciences Dept.

Multi-armed Bandits with Side Constraints

Kamesh Munagala
Computer Science, Duke University
Thursday, September 24, 2009
4:00 PM, 3310 CS

The stochastic multi-armed bandit problem is one of the most well-studied optimization problems in decision theory. It models the fundamental trade-off between exploration, or learning about the performance of various options (or arms), and exploitation, or utilizing the current best knowledge of the arms. For such problems, it is desirable to construct "index" policies that assign a numeric value (or index) to each arm and explore the arm with highest index at any point in time. For the multi-armed bandit problem, it is well known (and quite surprising) that such an index policy, termed the Gittins index policy, is in fact optimal. However, it is also well-known that such policies become increasingly sub-optimal in the presence of side-constraints such as costs of switching between arms, time-varying information, exploration of varying duration, etc. These side constraints arise frequently in practice, and few good techniques are available to handle them.

In this talk, we will present a novel, simple, and general algorithmic technique that yields policies with provable guarantees for bandit problem with such side-constraints (in particular, switching costs). Our algorithms are computationally just as efficient as index policies (and in fact, closely related to such policies), and have the advantage of yielding approximation guarantees where none existed before. The talk will be self-contained.

Joint work with Sudipto Guha, UPenn and Peng Shi, Duke.

Speaker Bio: Kamesh Munagala is an Assistant Professor of Computer Science at Duke University since 2004. He obtained his Ph.D. from Stanford University in 2003 and B.Tech. degree from IIT Bombay in 1998, both in Computer Science. He spent a year (2003-04) mining gene expression data as a post-doc at the Department of Biochemistry, Stanford University School of Medicine. He is a recipient of the NSF CAREER Award in 2008 and the Alfred P. Sloan Research Fellowship in 2009. He is interested in sequential decision theory, approximation algorithms, mechanism design, and query processing in database systems and sensor networks.



Additional CS events.

 
Computer Science | UW Home