ICML-98 Submission #23

Title: An Analysis of Actor/Critic Algorithms using Eligibility Traces: Reinforcement Learning with Imperfect Value Functions

Authors with addresses:

Hajime Kimura, Shigenobu Kobayashi
Tokyo Institute of Technology
Department of Computational Intelligence and Systems Science,
Interdisciplinary Graduate School of Science and Engineering,
4259, Nagatsuta, Midori-ku, Yokohama, 226-8502 JAPAN

Abstract:

    We present an analysis of an actor/critic algorithm in which the actor updates its policy using eligibility traces of the policy parameters. Most theoretical results on eligibility traces concern only the critic's value iteration algorithms; this paper investigates what the actor's eligibility trace does. The analysis shows that the algorithm is an extension of Williams' REINFORCE algorithms to infinite-horizon reinforcement learning tasks, and that the critic serves to provide an appropriate reinforcement baseline for the actor. Thanks to the actor's eligibility trace, the actor improves its policy using a gradient of the actual cumulative discounted reward in the training sequence, not a gradient of the value function estimated by the critic. This enables the agent to learn a fairly good policy even when the critic's approximated value function is hopelessly inaccurate for conventional actor/critic algorithms, which lack the actor's eligibility trace. Moreover, when the critic estimates an accurate value function, the actor's learning is dramatically accelerated in our test cases. The behavior of the algorithm is demonstrated through simulations of a linear quadratic control problem and a pole-balancing problem.
Keywords: reinforcement learning, actor/critic architecture, inaccurate value function, discounted reward, eligibility trace, reinforcement baseline, stochastic policy, gradient ascent
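
The update described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tabular softmax policy, the TD(0) critic, the function name actor_critic_step, and the parameters gamma, beta, alpha_actor and alpha_critic are all assumptions made for illustration. The actor accumulates a decaying trace of the gradient of log pi(a|s) and moves its parameters along that trace scaled by the critic's TD error, so the critic's value estimate acts only as a baseline.

```python
import math


def softmax(prefs):
    """Action probabilities from a list of action preferences."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def actor_critic_step(theta, trace, values, s, a, r, s_next,
                      gamma=0.9, beta=0.9,
                      alpha_actor=0.1, alpha_critic=0.1):
    """Apply one in-place update after observing (s, a, r, s_next).

    theta[s][a]  : actor's softmax preferences (hypothetical tabular policy)
    trace[s][a]  : actor's eligibility trace, same shape as theta
    values[s]    : critic's state-value estimate
    All hyperparameter names and defaults are illustrative assumptions.
    """
    # TD error: the reward signal measured against the critic's baseline.
    td_error = r + gamma * values[s_next] - values[s]

    # Decay the actor's trace for every policy parameter.
    for row in trace:
        for j in range(len(row)):
            row[j] *= beta

    # Add the gradient of log pi(a|s); for a softmax policy this is
    # (indicator of the chosen action) minus (action probability).
    probs = softmax(theta[s])
    for j, p in enumerate(probs):
        trace[s][j] += (1.0 if j == a else 0.0) - p

    # Actor: step along the trace, scaled by the TD error.
    for i, row in enumerate(trace):
        for j, d in enumerate(row):
            theta[i][j] += alpha_actor * td_error * d

    # Critic: plain TD(0) update (a simplification chosen for this sketch).
    values[s] += alpha_critic * td_error
    return td_error
```

In this sketch, if the value estimates are held fixed and beta is set equal to gamma, the TD errors accumulated along the trace telescope: each visited state's log-policy gradient ends up weighted by the actual discounted reward that followed it, minus the critic's estimate for that state. That is one way to see how the actor can follow the observed return even when the critic's values are poor.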

Email address of contact author: gen@fe.dis.titech.ac.jp
Email address of contact author: kobayasi@dis.titech.ac.jp
Phone number of contact author: +81-45-924-5544