ICML-98 Submission #41

Title: Using a Permutation Test for Attribute Selection in Decision Trees

Eibe Frank and Ian H. Witten
Department of Computer Science
University of Waikato
Hamilton, New Zealand
{eibe,ihw}@cs.waikato.ac.nz

Abstract

Most techniques for attribute selection in decision trees are biased
towards attributes with many values, and several ad hoc solutions to
this problem have appeared in the machine learning literature.  Statistical
tests for the existence of an association, applied at a prespecified
significance level, provide a well-founded basis for addressing the problem.
However,
many statistical tests derive their significance levels from a chi-squared
distribution, which is a valid approximation to the actual distribution only
in the large-sample case, and that condition patently does not hold near the
leaves of a decision tree.  An exception is the class of permutation tests.  We
describe how permutation tests can be applied to this problem.  We choose
one such test for further exploration, and give a novel two-stage method
for applying it to select attributes in a decision tree.  Results on
practical datasets compare favorably with other methods that also adopt a
pre-pruning strategy.

Keywords: Permutation tests, attribute selection, pre-pruning.

Email address of contact author: eibe@cs.waikato.ac.nz

Phone number of contact author: 0064 856 2889