ICML-98 Submission #81

The MAXQ Method for Hierarchical Reinforcement Learning

        Thomas G. Dietterich
        Department of Computer Science
        Oregon State University
        Corvallis, OR 97331

Abstract

    This paper presents a new approach to hierarchical
    reinforcement learning based on the MAXQ decomposition of the value
    function.  The MAXQ decomposition has both a procedural semantics---as
    a subroutine hierarchy---and a declarative semantics---as a
    representation of the value function of a hierarchical policy.  MAXQ
    unifies and extends previous work on hierarchical reinforcement
    learning by Singh, Kaelbling, and Dayan and Hinton.  Conditions under
    which the MAXQ decomposition can represent the optimal value function
    are derived.  The paper defines a hierarchical Q learning algorithm
    and shows experimentally that it can learn much faster than ordinary
    ``flat'' Q learning.  Finally, the paper discusses some interesting
    issues that arise in hierarchical reinforcement learning including the
    hierarchical credit assignment problem and non-hierarchical execution
    of the MAXQ hierarchy.

Keywords:
        Reinforcement learning, Hierarchical reinforcement learning,
        Q learning

Email address of contact author:  tgd@cs.orst.edu
Phone number of contact author:   541-737-5559