ICML-98 Submission #132

A Fast, Bottom-Up Decision Tree Pruning Algorithm with Near-Optimal Generalization

Authors: Michael Kearns
	 AT&T Labs
	 180 Park Avenue, Room A230
	 Florham Park, NJ 07932
	 mkearns@research.att.com

	 Yishay Mansour
	 Department of Computer Science
	 Tel Aviv University
	 Tel Aviv, Israel
	 mansour@math.tau.ac.il
    	 
Abstract

In this work, we present a new bottom-up algorithm for decision
tree pruning that is very efficient (requiring only a single pass
through the given tree), and prove a strong performance guarantee
for the generalization error of the resulting pruned tree. We work
in the typical setting in which the given tree $T$ may have been
derived from the given training sample $S$, and thus may badly overfit
$S$. In this setting, we give bounds on the amount of additional
generalization error that our pruning suffers compared to the
{\em optimal\/} pruning of $T$. More generally, our results show
that if there is a pruning of $T$ with small error, and whose size
is small compared to $|S|$, then our algorithm will find a pruning
whose error is not much larger. This style of result has been called
an {\em index of resolvability\/} result by Barron and Cover in the
context of density estimation.

A novel feature of our algorithm is its {\em locality\/} --- the
decision to prune a subtree is based entirely on properties of that
subtree and the sample reaching it. To analyze our algorithm, we
develop tools of {\em local uniform convergence\/}, a generalization
of the standard notion that may prove useful in other settings.


Keywords: Decision Trees, Pruning, Theoretical Analysis, Model
	Selection, Uniform Convergence

Contact Author's E-mail: mkearns@research.att.com

Contact Author's phone:  (973) 360-8322