ICML-98 Submission #131

Near-Optimal Reinforcement Learning in Polynomial Time

Authors: Michael Kearns
	 AT&T Labs
	 180 Park Avenue, Room A230
	 Florham Park, NJ 07932
	 mkearns@research.att.com

    	 Satinder Singh
	 Department of Computer Science
	 University of Colorado
	 Boulder, CO 80309-0430
	 baveja@cs.colorado.edu

Abstract

We present new algorithms for reinforcement learning and prove that
they have polynomial bounds on the resources required to achieve
near-optimal return in general Markov decision processes. After
observing that the number of actions required to approach the optimal
return is lower bounded by the mixing time T of the optimal policy (in
the undiscounted case) or by the horizon time T (in the discounted
case), we then give algorithms requiring a number of actions and total
computation time that are only polynomial in T and the number of
states, for both the undiscounted and discounted cases. An interesting
aspect of our algorithms is their explicit handling of the
Exploration-Exploitation trade-off.

These are the first results in the reinforcement learning literature
giving algorithms that provably converge to near-optimal performance
in polynomial time for general Markov decision processes.

Keywords:  Reinforcement learning, Polynomial-time algorithm, 
	   Markov decision processes, 

Contact Author's E-mail: mkearns@research.att.com

Contact Author's phone:  (973) 360-8322