ICML-98 Submission #2
Title: The Problem with Noise and Small Disjuncts
Authors:
Gary M. Weiss
AT&T Labs
480 Red Hill Road, Rm. 2H-096
Middletown, NJ 07748
Haym Hirsh
Department of Computer Science
Core 317
Rutgers University
Piscataway, NJ 08855
Abstract:
Many systems that learn from examples express the learned
concept as a disjunction. Those disjuncts that cover only a few
examples are referred to as small disjuncts. The problem with
small disjuncts is that they have a much higher error rate than
large disjuncts but are necessary to achieve a high level of
predictive accuracy. This paper extends previous work by
considering the effect of noise on small disjuncts. In particular,
we show that when noise is added to two real-world domains, a
significant and disproportionate share of the total errors is
contributed by the small disjuncts. Thus, we show that
when noise is added to these domains, it is the small disjuncts
that are primarily responsible for the poor predictive accuracy of
the learned concept.
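
As an illustration only, the notion of a "disproportionate" error
contribution can be sketched as follows. The sketch is not the
procedure used in the paper. It assumes each disjunct is summarized
by a hypothetical triple (size, test cases covered, test errors),
where size is the number of training examples the disjunct covers,
and it treats a disjunct as small when its size is at most an
assumed threshold max_size. The numbers in the usage example are
made up and are not results from the paper.

```python
from typing import List, Tuple

# Hypothetical summary of one disjunct (an assumption for this sketch):
# (size, test_cases, test_errors), where size is the number of training
# examples the disjunct covers.
Disjunct = Tuple[int, int, int]


def small_disjunct_shares(disjuncts: List[Disjunct],
                          max_size: int) -> Tuple[float, float]:
    """Return (error_share, coverage_share) for the small disjuncts.

    error_share    = fraction of all test errors made by small disjuncts
    coverage_share = fraction of all test cases covered by small disjuncts
    A disjunct counts as small when its size is <= max_size.
    """
    total_cases = sum(cases for _, cases, _ in disjuncts)
    total_errors = sum(errors for _, _, errors in disjuncts)

    small = [d for d in disjuncts if d[0] <= max_size]
    small_cases = sum(cases for _, cases, _ in small)
    small_errors = sum(errors for _, _, errors in small)

    # Guard against empty input or a concept that makes no errors.
    error_share = small_errors / total_errors if total_errors else 0.0
    coverage_share = small_cases / total_cases if total_cases else 0.0
    return error_share, coverage_share


# Made-up data for illustration: two small and two large disjuncts.
example = [(2, 3, 2), (4, 5, 2), (60, 50, 3), (80, 42, 1)]
error_share, coverage_share = small_disjunct_shares(example, max_size=5)
print(f"error share: {error_share:.2f}, coverage share: {coverage_share:.2f}")
```

In this made-up example the small disjuncts cover a small fraction
of the test cases but account for a much larger fraction of the
errors, which is the sense in which an error contribution would be
called disproportionate.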
Keywords: Decision Trees
Email: gary.m.weiss@att.com
Phone: (732) 615-4698