TITLE: A Process-Oriented Heuristic for Model Selection
AUTHOR:
Pedro Domingos
Sec. Sistemas, Dept. Eng. Mecanica
Instituto Superior Tecnico
Av. Rovisco Pais
Lisbon 1096, Portugal
ABSTRACT:
Current methods for avoiding overfitting are either data-oriented (using
separate data for validation) or representation-oriented (penalizing
model complexity). This paper proposes process-oriented evaluation,
where a model's expected generalization error is computed as a function of
the search process that led to it. The paper develops the necessary
theoretical framework, and applies it to one type of learning: rule
induction. A process-oriented version of the CN2 rule learner is empirically
compared with the default CN2. The process-oriented version is more accurate
on a large majority of the datasets, with high statistical significance, and
also produces simpler models. Experiments in artificial domains suggest that
process-oriented evaluation is particularly useful in high-dimensional
domains.
KEYWORDS: Model selection, model evaluation, overfitting avoidance, error
estimation, rule induction, classification, probabilistic learning
EMAIL: pedrod@gia.ist.utl.pt
TELEPHONE: +351-1-841-7479/7269