ICML-98 Submission #162

Intra-Option Learning about Temporally Abstract Actions

	Richard S. Sutton
	Department of Computer Science
	University of Massachusetts,
	Amherst, MA 01003-4610
	rich@cs.umass.edu

	Doina Precup
	Department of Computer Science
	University of Massachusetts,
	Amherst, MA 01003-4610
	dprecup@cs.umass.edu

	Satinder Singh
	Department of Computer Science
	University of Colorado,
	Boulder, CO 80309-0430
	baveja@cs.colorado.edu

Abstract:

Several researchers have proposed modeling temporally abstract actions
in reinforcement learning by the combination of a policy and a
termination condition, which we refer to as an "option".  Value
functions over options and models of options can be learned using
methods designed for semi-Markov decision processes (SMDPs).  However,
these methods all require an option to be executed to termination.  In
this paper we explore methods that learn about an option from small
fragments of experience consistent with that option, even if the
option itself is not executed.  We call these methods "intra-option"
learning methods because they learn from experience within an option.
Intra-option methods are sometimes much more efficient than SMDP
methods because they can use off-policy temporal-difference mechanisms
to learn simultaneously about all the options consistent with an
experience, not just the few that were actually executed.  We
describe intra-option methods for learning value functions over
options and for learning multi-step models of the consequences of
options.  We present computational examples in which
these new methods learn much faster than SMDP methods and learn
effectively when SMDP methods cannot learn at all.  We also sketch a
convergence proof for intra-option value learning.
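
To make the idea of learning simultaneously about all options
consistent with an experience concrete, the following is a minimal
sketch of a one-step, off-policy temporal-difference update over
options.  It rests on assumptions not stated in the abstract: a
tabular value function Q[s][o], options with deterministic policies,
every option being available in every state, and a termination
probability of 1.0 for states an option does not list.  The names
Option, intra_option_update, alpha, and gamma are hypothetical and
chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Option:
    """Hypothetical container: a deterministic policy plus a termination condition."""
    policy: Dict[int, int]          # state -> action chosen by the option
    termination: Dict[int, float]   # state -> probability of terminating there


def intra_option_update(
    Q: List[List[float]],
    options: List[Option],
    s: int,
    a: int,
    r: float,
    s_next: int,
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> List[int]:
    """Update Q[s][o] for every option o whose policy picks action a in state s.

    Returns the indices of the options that were updated.
    """
    # Value of terminating in s_next and choosing the best option there
    # (assumes all options are available in s_next).
    best_next = max(Q[s_next])
    updated = []
    for o, opt in enumerate(options):
        if opt.policy.get(s) != a:
            continue  # this experience is not consistent with option o
        beta = opt.termination.get(s_next, 1.0)  # assumed default: terminate
        # Expected value on arrival: continue with o, or terminate and re-choose.
        u = (1.0 - beta) * Q[s_next][o] + beta * best_next
        Q[s][o] += alpha * (r + gamma * u - Q[s][o])
        updated.append(o)
    return updated
```

A single transition (s, a, r, s_next) can thus update several options
at once, whether or not any of them was the option being executed,
which is the source of the efficiency described above.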


Keywords:

	Reinforcement Learning, Temporal Abstraction, Hierarchical Learning,
	Semi-Markov Decision Processes, Model Learning

Email address of contact author:

	rich@cs.umass.edu

Phone number of contact author: 978-897-6174