ICML-98 Submission #153

Title:   Using Eligibility Traces to Find the Best Memoryless Policy in
         Partially Observable Markov Decision Processes

Authors: John Loch and Satinder Singh
Address: Department of Computer Science
	 University of Colorado
	 Boulder, CO 80309-0430
	 
Abstract:	

	Recent research on hidden-state RL problems has concentrated
on overcoming partial observability by using memory to estimate
state. However, such methods are computationally extremely expensive
and thus have very limited applicability. This emphasis on state
estimation has come about because it has been widely observed that the
presence of hidden state or partial observability renders popular RL
methods such as Q-learning and Sarsa useless. However, this
observation is misleading in two ways: first, the theoretical results
supporting it only apply to RL algorithms that do not use eligibility
traces, and second that these results are worst-case results which
leaves open the possibility that there may be large classes of
hidden-state problems in which RL algorithms work well without any
state estimation.

	In this paper we show empirically that Sarsa(lambda), a well
known family of RL algorithms that use eligibility traces, can work
very well on hidden state problems that have good memoryless policies,
i.e., on RL problems in which there may well be very poor
observability but there also exists a mapping from immediate
observations to actions that yields near-optimal return. We apply
conventional Sarsa(lambda) to four test problems taken from the recent
work of Littman, Parr and Russell, and Chrisman, and in each case show
that it is able to find the best memoryless policy without any of the
computational expense of state estimation.


Keywords: Reinforcement learning, POMDP, hidden state, 
	  eligbility traces.

E-mail address of contact author: baveja@cs.colorado.edu

Phone number of contact author: (303) 492-5172