ICML-98 Submission #192
An Analysis of Direct Reinforcement Learning in Non-Markovian Domains
Authors: Mark D. Pendrith
Daimler-Benz Research and Technology
1510 Page Mill Rd
Palo Alto, CA 94304
and
Michael J. McGarity
School of Electrical Engineering
The University of New South Wales
Sydney 2052 Australia
Abstract
It is well known that for Markov Decision Processes, the policies stable
under policy-iteration and the standard reinforcement learning methods
are exactly the optimal policies. In this paper, we investigate the
conditions for policy stability in the more general situation when the
Markov property cannot be assumed. We show that for a general class of
non-Markov decision processes, if actual return (Monte Carlo) credit
assignment is used with undiscounted returns, we are still guaranteed that
the optimal observation-based policies will be equilibrium points in
the policy space when using the standard ``direct'' reinforcement
learning approaches. However, if either discounted rewards or a
temporal-difference style of credit assignment is used, this guarantee
no longer holds.
Keywords: Reinforcement learning, non-Markov Decision Processes,
theoretical analysis
Contact: pendrith@rtna.daimlerbenz.com, (650) 845-2534