ICML-98 Submission #32

TITLE: A Process-Oriented Heuristic for Model Selection

AUTHOR:
Pedro Domingos
Sec. Sistemas, Dept. Eng. Mecanica
Instituto Superior Tecnico
Av. Rovisco Pais
Lisbon 1096, Portugal

ABSTRACT:
Current methods to avoid overfitting are either data-oriented (using separate data for validation) or representation-oriented (penalizing complexity in the model). This paper proposes process-oriented evaluation, in which a model's expected generalization error is computed as a function of the search process that led to it. The paper develops the necessary theoretical framework and applies it to one type of learning: rule induction. A process-oriented version of the CN2 rule learner is empirically compared with the default CN2. The process-oriented version is more accurate on a large majority of the datasets, with high significance, and also produces simpler models. Experiments in artificial domains suggest that process-oriented evaluation is particularly useful in high-dimensional domains.

KEYWORDS: Model selection, model evaluation, overfitting avoidance, error estimation, rule induction, classification, probabilistic learning

EMAIL: pedrod@gia.ist.utl.pt
TELEPHONE: +351-1-841-7479/7269