ICML-98 Submission #76

On Feature Selection: Learning with 
    Exponentially many Irrelevant Features as Training Examples

Authors with addresses: Andrew Y. Ng
                        MIT AI Lab
                        550 Memorial Drv #18E1
                        Cambridge MA 02139

Abstract (250 word maximum):

We consider feature selection in the ``wrapper'' model of feature
selection. This typically involves an NP-hard optimization problem
that is approximated by heuristic search for a ``good'' feature
subset. First considering the idealization where this optimization is
performed exactly, we give a rigorous bound for generalization error
under feature selection. The search heuristics typically used are then
immediately seen as trying to achieve the error given in our bounds,
and succeeding to the extent that they succeed in solving the
optimization. The bound suggests that, in the presence of many
``irrelevant'' features, the main source of error in wrapper model
feature selection is from ``overfitting'' hold-out or cross-validation
data. This motivates a new algorithm that, again under the
idealization of performing search exactly, has sample complexity (and
error) that grows {\em logarithmically} in the number of
``irrelevant'' features -- which means it can tolerate having a number
of ``irrelevant'' features {\em exponential} in the number of training
examples -- and search heuristics are again seen to be directly trying
to reach this bound. Experimental results on a problem using simulated
data show the new algorithm having much higher tolerance to irrelevant
features than the standard wrapper model. Lastly, we also discuss
ramifications that sample complexity logarithmic in the number of
irrelevant features might have for feature design in actual
applications of learning.

Keywords: Feature selection, wrapper model, classification.

Email address of contact author: ayn@ai.mit.edu

Phone number of contact author: (617)253-8793