ICML-98 Submission #151

Multi-Criteria Reinforcement Learning

Csaba Szepesvari
Zsolt Kalmar
Zoltan Gabor

all authors are currently with Associative Computing, Ltd.,
Budapest 1121, Konkoly Th. M. u. 29-33.
HUNGARY

Abstract

We consider multi-criteria sequential decision making problems where
the criteria are ordered according to their importance. Conditions for
the optimality of stationary policies and the Bellman optimality
equation are derived.  The analysis requires special care as the
topology introduced by pointwise convergence and the order-topology
introduced by the preference order are in general
incompatible. Reinforcement learning algorithms are then proposed and
analyzed. Preliminary computer experiments with detailed analysis
confirm the validity of the derived algorithms. It is observed that in
the medium-term multi-criteria RL often converges to better solutions
(measured by the first criterion) than their single-criterion
counterparts.  These type of multi-criteria problems are most useful
when there are several optimal solutions to a problem and one wants to
choose one among these in a way that is optimal according to another
fixed criterion. Example applications include alternating games, when
in addition to maximizing the probability of win, the decision maker
also minimizes the expected number of steps to win in states when it
is possible to win, while in states when it is not possible to win it
tries to play for time. Another application comes from specifying
desired behaviors for robots in terms of a list of ordered sub-goals,
e.g., a football playing robot's primary goal could be to shoot goals,
while it's subordinate goal could be to keep clear of opponents as
much as possible.

Keywords:
reinforcement learning, multi-criteria problems, Markovian
Decision Problems, alternating games, ordered subgoal
decomposition

Email address of contact author:
szepes@mindmaker.kfkipark.hu

Phone number of contact author:
+36 1 395 9220/1205 (dial extension continuously)