Title:
KnightCap: A chess program that learns by combining TD(lambda) with game-tree search
Authors:
Jonathan Baxter
Department of Systems Engineering
Australian National University
Canberra 0200, Australia
Andrew Tridgell
Department of Computer Science
Australian National University
Canberra 0200, Australia
Lex Weaver
Department of Computer Science
Australian National University
Canberra 0200, Australia
Abstract:
In this paper we present TDLeaf($\lambda$), a variation on the
TD($\lambda$) algorithm that enables it to be used in conjunction with
game-tree search. We present some experiments in which our chess
program ``KnightCap'' used TDLeaf($\lambda$) to learn its evaluation
function while playing on the Free Internet Chess Server (FICS, {\tt
fics.onenet.net}). The main success we report is that KnightCap
improved from a 1650 rating to a 2150 rating in just 308 games and 3
days of play. For reference, a rating of 1650 corresponds roughly to
level B human play (on a scale from E (1000) to A (1800)), while 2150
is human master level. We discuss some of the reasons for this success,
principal among them being the use of on-line play, rather than self-play.
Keywords: Reinforcement Learning, Temporal Difference Learning, Chess
E-mail Contact Author: jon@syseng.anu.edu.au
Phone Contact Author: +61 262798678