ICML-98 Submission #2

Title: The Problem with Noise and Small Disjuncts

Authors:

    Gary M. Weiss
    AT&T Labs
    480 Red Hill Road, Rm. 2H-096
    Middletown, NJ 07748

    Haym Hirsh
    Department of Computer Science
    Core 317
    Rutgers University
    Piscataway, NJ 08855

Abstract:
    Many systems that learn from examples express the learned concept as a disjunction. Disjuncts that cover only a few examples are referred to as small disjuncts. The problem with small disjuncts is that they have a much higher error rate than large disjuncts but are necessary to achieve a high level of predictive accuracy. This paper extends previous work by considering the effect of noise on small disjuncts. In particular, we show that when noise is added to two real-world domains, a significant and disproportionate share of the total errors is contributed by the small disjuncts. Thus, when noise is added to these domains, it is the small disjuncts that are primarily responsible for the poor predictive accuracy of the learned concept.

Keywords: Decision Trees

Email: gary.m.weiss@att.com
Phone: (732) 615-4698