ICML-98 Submission #157
Learning Sorting and Classification with POMDPs
Blai Bonet and Hector Geffner
Departamento de Computacion
Universidad Simon Bolivar
Aptdo. 89000, Caracas 1080-A, Venezuela
Abstract
POMDPs are general models of sequential decisions in which both
actions and observations can be probabilistic. Many problems of
interest, including learning decision trees from data, can be
formulated as POMDPs, yet the use of POMDPs have been limited by the
lack of effective algorithms. Recently this has began to change and a
number of problems such robot navigation and planning problems are
beginning to be formulated and solved as POMDPs. The advantage of the
POMDP approach is its clean semantics and the possibility of
principled solutions that naturally integrate physical and information
gathering actions. In this paper we pursue this approach in the
context of two learning tasks: learning to sort a vector of numbers
and learning decision trees from data. Both problems are formulated as
POMDPS and solved by a general POMDP algorithm based on the ideas of
Real Time Dynamic Programming. The main lessons and results obtained
are the following:
1. the use of suitable heuristics and
representations allows us to deal
with sorting and classification problems of realistic size
2. the quality of the results appear to be competitive with the
best handcrafted algorithms tailored for each of the two tasks
3. the application of these methods to very large problems
require generalization methods that are different from the ones
used in model-free RTDP algorithms such as Q-learning
Keywords: POMDPs, Real Time Dynamic Programming, Decision Trees
Email address of contact author: hector@usb.ve
Phone number of contact author: (011 582) 906-3257