ICML-98 Submission #116
A Learning Rate Analysis of Reinforcement Learning Algorithms in
Finite-Horizon
Frédérick Garcia and Seydina M. Ndiaye
INRA/BIA, Auzeville
BP 27, 31326 Castanet Tolosan cedex
France
Abstract
Reinforcement learning algorithms are adaptive methods for solving
Markovian decision problems when no model is available. For the
infinite-horizon case, several reinforcement learning algorithms, such
as Q-Learning and R-Learning, have been developed. In this article we
consider the particular framework of non-stationary finite-horizon
Markov Decision Processes. We first prove that the total reward
criterion and the average-reward criterion are equivalent in the
finite-horizon setting, and we define QH-Learning and RH-Learning for
finite-horizon MDPs. Then we introduce the Ordinary Differential
Equation (ODE) method to conduct a learning rate analysis of
QH-Learning and RH-Learning. RH-Learning appears to be a version of
QH-Learning with matrix-valued stepsizes, the corresponding gain
matrix being very close to the optimal matrix which results from the
ODE analysis. Experimental results confirm the performance ranking
that this analysis suggests.
Keywords: reinforcement learning, Markov decision processes, finite horizon,
ODE method, learning rate analysis.
Contact author: Frédérick Garcia
email: fgarcia@toulouse.inra.fr
tel: 33 5 61 28 52 83