ICML-98 Submission #162
Intra-Option Learning about Temporally Abstract Actions
Richard S. Sutton
Department of Computer Science
University of Massachusetts,
Amherst, MA 01003-4610
rich@cs.umass.edu
Doina Precup
Department of Computer Science
University of Massachusetts,
Amherst, MA 01003-4610
dprecup@cs.umass.edu
Satinder Singh
Department of Computer Science
University of Colorado
Boulder, CO 80309-0430
baveja@cs.colorado.edu
Abstract:
Several researchers have proposed modeling temporally abstract actions
in reinforcement learning as the combination of a policy and a
termination condition, which we refer to as an "option". Value
functions over options and models of options can be learned using
methods designed for semi-Markov decision processes (SMDPs). However,
these methods all require an option to be executed to termination. In
this paper we explore methods that learn about an option from small
fragments of experience consistent with that option, even if the
option itself is not executed. We call these methods "intra-option"
learning methods because they learn from experience within an option.
Intra-option methods are sometimes much more efficient than SMDP
methods because they can use off-policy temporal-difference mechanisms
to learn simultaneously about all the options consistent with an
experience, not just the few that were actually executed. We present
intra-option methods for learning value functions over options and for
learning multi-step models of the consequences of options. We give
computational examples in which
these new methods learn much faster than SMDP methods and learn
effectively when SMDP methods cannot learn at all. We also sketch a
convergence proof for intra-option value learning.
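
The idea of learning about every option consistent with an experience
can be illustrated with a minimal sketch. It assumes a tabular setting,
Markov options with deterministic policies, and a one-step
temporal-difference update; the names Option and intra_option_q_update,
the dictionary-based Q table, and the default step size and discount are
hypothetical choices made for illustration, not necessarily the exact
algorithm evaluated in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Option:
    """Hypothetical container: a policy plus a termination condition."""
    name: str
    policy: Callable[[int], int]    # deterministic: state -> primitive action
    beta: Callable[[int], float]    # state -> probability of terminating there


def intra_option_q_update(
    Q: Dict[Tuple[int, str], float],
    options: List[Option],
    s: int,
    a: int,
    r: float,
    s_next: int,
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> List[str]:
    """Apply a one-step update to every option consistent with (s, a).

    An option is treated as consistent if its policy would have chosen
    action a in state s, whether or not that option was being executed.
    Returns the names of the options that were updated.
    """
    # Value of switching to the best option in the next state.
    best_next = max(Q.get((s_next, o.name), 0.0) for o in options)

    # Compute all targets first so the updates do not affect one another.
    targets: Dict[str, float] = {}
    for o in options:
        if o.policy(s) != a:
            continue  # experience is not consistent with this option
        b = o.beta(s_next)
        continue_value = Q.get((s_next, o.name), 0.0)
        # Continue with probability 1 - b, otherwise terminate and re-choose.
        u = (1.0 - b) * continue_value + b * best_next
        targets[o.name] = r + gamma * u

    for name, target in targets.items():
        q = Q.get((s, name), 0.0)
        Q[(s, name)] = q + alpha * (target - q)
    return list(targets)
```

In this sketch a single transition updates every option whose policy
agrees with the action taken, whereas an SMDP-style update would touch
only the option actually executed, and only once it has terminated.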
Keywords:
Reinforcement Learning, Temporal Abstraction, Hierarchical Learning,
Semi-Markov Decision Processes, Model Learning
Email address of contact author:
rich@cs.umass.edu
Phone number of contact author: 978-897-6174