ICML-98 Submission #81
The MAXQ Method for Hierarchical Reinforcement Learning
Thomas G. Dietterich
Department of Computer Science
Oregon State University
Corvallis, OR 97331
Abstract
This paper presents a new approach to hierarchical
reinforcement learning based on the MAXQ decomposition of the value
function. The MAXQ decomposition has both a procedural semantics---as
a subroutine hierarchy---and a declarative semantics---as a
representation of the value function of a hierarchical policy. MAXQ
unifies and extends previous work on hierarchical reinforcement
learning by Singh, Kaelbling, and Dayan and Hinton. Conditions under
which the MAXQ decomposition can represent the optimal value function
are derived. The paper defines a hierarchical Q learning algorithm
and shows experimentally that it can learn much faster than ordinary
``flat'' Q learning. Finally, the paper discusses some interesting
issues that arise in hierarchical reinforcement learning including the
hierarchical credit assignment problem and non-hierarchical execution
of the MAXQ hierarchy.
Keywords:
Reinforcement learning, Hierarchical reinforcement learning,
Q learning
Email address of contact author: tgd@cs.orst.edu
Phone number of contact author: 541-737-5559