ICML-98 Submission #132
A Fast, Bottom-Up Decision Tree Pruning Algorithm with Near-Optimal Generalization
Authors: Michael Kearns
AT&T Labs
180 Park Avenue, Room A230
Florham Park, NJ 07932
mkearns@research.att.com
Yishay Mansour
Department of Computer Science
Tel Aviv University
Tel Aviv, Israel
mansour@math.tau.ac.il
Abstract
In this work, we present a new bottom-up algorithm for decision
tree pruning that is very efficient (requiring only a single pass
through the given tree), and prove a strong performance guarantee
for the generalization error of the resulting pruned tree. We work
in the typical setting in which the given tree $T$ may have been
derived from the given training sample $S$, and thus may badly overfit
$S$. In this setting, we give bounds on the amount of additional
generalization error that our pruning suffers compared to the
{\em optimal\/} pruning of $T$. More generally, our results show
that if there is a pruning of $T$ with small error, and whose size
is small compared to $|S|$, then our algorithm will find a pruning
whose error is not much larger. This style of result has been called
an {\em index of resolvability\/} result by Barron and Cover in the
context of density estimation.
A novel feature of our algorithm is its {\em locality\/} --- the
decision to prune a subtree is based entirely on properties of that
subtree and the sample reaching it. To analyze our algorithm, we
develop tools of {\em local uniform convergence\/}, a generalization
of the standard notion that may prove useful in other settings.
Keywords: Decision Trees, Pruning, Theoretical Analysis, Model
Selection, Uniform Convergence
Contact Author's E-mail: mkearns@research.att.com
Contact Author's phone: (973) 360-8322