ICML-98 Submission #23
Title:
An Analysis of Actor/Critic Algorithms using Eligibility Traces:
Reinforcement Learning with Imperfect Value Functions
Authors with addresses:
Hajime Kimura, Shigenobu Kobayashi
Tokyo Institute of Technology
Department of Computational Intelligence and Systems Science,
Interdisciplinary Graduate School of Science and Engineering,
4259, Nagatsuta, Midori-ku, Yokohama, 226-8502 JAPAN
Abstract:
We present an analysis of an actor/critic algorithm, in which
the actor updates its policy using eligibility traces of the
policy parameters. Most theoretical results on eligibility
traces concern only the critic's value iteration algorithms.
This paper investigates what the actor's eligibility trace
does. The results show that the algorithm is an extension of
Williams' REINFORCE algorithms to infinite-horizon
reinforcement tasks, and that the critic serves to provide an
appropriate reinforcement baseline for the actor.
Thanks to the actor's eligibility trace, the actor improves
its policy using the gradient of the actual cumulative
discounted reward in the training sequence, rather than the
gradient of the critic's estimated value function. This
enables the agent to learn a fairly good policy even when the
critic's approximate value function is too inaccurate for
conventional actor/critic algorithms, which lack the actor's
eligibility traces. Moreover, when the critic estimates an
accurate value function, the actor's learning is
dramatically accelerated in our test cases. The behavior of
the algorithm is demonstrated through simulations of a linear
quadratic control problem and a pole balancing problem.
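
The mechanism described above can be illustrated with a minimal
sketch. It is not the authors' implementation: the two-state toy
task, the function name train, the parameter names (actor_lr,
critic_lr, gamma, beta), and all hyperparameter values are
assumptions chosen for illustration. The actor accumulates a
decaying trace of the policy's log-likelihood gradient and scales
it by the critic's TD error. Setting critic_lr=0 freezes the
critic at zero, standing in for a hopelessly inaccurate value
function.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(steps=20000, actor_lr=0.1, critic_lr=0.0, gamma=0.9,
          beta=0.5, seed=0):
    """Return P(action = 1) in state 0 after training.

    Toy task (an assumption for illustration): states 0 and 1
    alternate. The action chosen in state 0 is paid out one step
    later, on leaving state 1, so the reward is always delayed.
    """
    rng = random.Random(seed)
    theta = 0.0          # single policy parameter, used in state 0
    value = [0.0, 0.0]   # critic's tabular value estimates
    trace = 0.0          # actor's eligibility trace
    state, pending = 0, 0.0
    for _ in range(steps):
        if state == 0:
            p = sigmoid(theta)
            action = 1 if rng.random() < p else 0
            eligibility = action - p   # d/dtheta of ln pi(action)
            reward, pending = 0.0, float(action)
        else:
            eligibility = 0.0          # no choice in state 1
            reward, pending = pending, 0.0
        next_state = 1 - state
        # TD error: the critic's values act as a reward baseline.
        td_error = reward + gamma * value[next_state] - value[state]
        # Decaying sum of past eligibilities (decay rate beta).
        trace = eligibility + beta * trace
        theta += actor_lr * td_error * trace
        value[state] += critic_lr * td_error
        state = next_state
    return sigmoid(theta)


if __name__ == "__main__":
    print("no actor trace, frozen critic:", train(beta=0.0))
    print("actor trace,    frozen critic:", train(beta=0.5))
```

With the critic frozen and beta=0, the TD error is zero whenever
the eligibility is nonzero and vice versa, so theta never moves
and the policy stays at 0.5. With beta > 0 the trace carries the
eligibility forward to the step where the reward arrives, so the
actor can still improve using the actual rewards. Passing a
positive critic_lr lets the critic learn, in which case its
values serve as a baseline inside the TD error.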
Keywords: reinforcement learning, actor/critic architecture,
inaccurate value function, discounted reward,
eligibility trace, reinforcement baseline,
stochastic policy, gradient ascent
Email address of contact author: gen@fe.dis.titech.ac.jp
Email address of contact author: kobayasi@dis.titech.ac.jp
Phone number of contact author: +81-45-924-5544