ICML-98 Submission #131
Near-Optimal Reinforcement Learning in Polynomial Time
Authors: Michael Kearns
AT&T Labs
180 Park Avenue, Room A230
Florham Park, NJ 07932
mkearns@research.att.com
Satinder Singh
Department of Computer Science
University of Colorado
Boulder, CO 80309-0430
baveja@cs.colorado.edu
Abstract
We present new algorithms for reinforcement learning and prove that
they have polynomial bounds on the resources required to achieve
near-optimal return in general Markov decision processes. After
observing that the number of actions required to approach the optimal
return is lower bounded by the mixing time T of the optimal policy (in
the undiscounted case) or by the horizon time T (in the discounted
case), we then give algorithms requiring a number of actions and total
computation time that are only polynomial in T and the number of
states, for both the undiscounted and discounted cases. An interesting
aspect of our algorithms is their explicit handling of the
Exploration-Exploitation trade-off.
These are the first results in the reinforcement learning literature
giving algorithms that provably converge to near-optimal performance
in polynomial time for general Markov decision processes.
Keywords: Reinforcement learning, Polynomial-time algorithm,
Markov decision processes,
Contact Author's E-mail: mkearns@research.att.com
Contact Author's phone: (973) 360-8322