ICML-98 Submission #160

Bayesian Network Classification with Continuous Features:
Getting the Best of Both Discretization and Parametric Fitting  

  Nir Friedman
  Computer Science Division, 387 Soda Hall
  University of California, Berkeley CA 94720
  nir@cs.berkeley.edu

  Moises Goldszmidt
  SRI International,
  333 Ravenswood Ave., Menlo Park, CA 94025
  moises@erg.sri.com

  Thomas J. Lee
  SRI International,
  333 Ravenswood Ave., Menlo Park, CA 94025
  tomlee@erg.sri.com


Abstract (250 word maximum):


In a recent paper, Friedman, Geiger, and Goldszmidt introduced a
classifier based on Bayesian networks called Tree Augmented Naive
Bayes (TAN) that outperforms naive Bayes and performs competitively
with C4.5 and other state-of-the-art methods.  This classifier has
several advantages, including robustness and polynomial computational
complexity.  One limitation of the TAN classifier is that it applies
only to discrete attributes. Thus, features must be prediscretized
before applying this classifier.  In this work we extend TAN to deal
with continuous attributes directly using parametric (e.g., Gaussians)
and semi-parametric (e.g., mixture of Gaussians) conditional
probabilities.  The result is a classifier that can represent and
combine both discrete and continuous features.  In addition, we
propose a new method that takes advantage of the modeling language of
Bayesian networks to represent each feature in both discrete and
continuous form simultaneously, and to use both versions in the
classification. This automates the decision as to which form of the
feature is most relevant to the classification task. It also avoids
the commitment to either discretized form or (semi)parametric form, as
different features may correlate better with one version or the other.
As our empirical results show, this latter method usually achieves
classification performance that is as good as or better than that of
both the purely discrete and purely continuous TAN models.  We also
discuss the implications of this method for density estimation tasks.

Keywords: Classification, Continuous Attributes, Discretization,
          Bayesian Networks

Email address of contact author: nir@cs.berkeley.edu
Phone number of contact author: 510-643-2779

Multiple submission statement (if applicable):

  This abstract was submitted to the "Machines that learn" (Snowbird)
  workshop, which does not have proceedings and/or publication
  records. It has not been submitted to any other conference or journal.