ICML-98 Submission #104
Query Learning Strategies using Boosting and Bagging
Naoki Abe and Hiroshi Mamitsuka
Theory NEC Research Laboratories,
Real World Computing Partnership
c/o NEC C&C Media Research Laboratories
4-1-1 Miyazaki, Miyamae-ku, Kawasaki 216-8555 JAPAN
Abstract
We propose new query learning strategies that combine the idea of
query by committee with those of boosting and bagging. Query by
committee is a query learning strategy that uses a randomized agent
learning algorithm and queries the function value at the point where
the predictions made by many copies of the agent algorithm are
maximally spread. Because query by committee requires the agent
algorithm to be an ideal randomized algorithm, it is hard to apply in
practice when only a moderately performing deterministic algorithm is
available. To address this issue, we borrow the ideas of boosting and
bagging, both of which enhance the performance of an existing learning
algorithm by running it many times on re-sampled data and combining
the output hypotheses to make a prediction by (weighted) majority
voting. We propose two query learning methods, query by bagging and
query by boosting, which select as the next query point the point on
which the (weighted) majority vote of the obtained hypotheses has the
least margin. We empirically evaluate these methods on a wide range
of real-world data. Our experiments show that, when C4.5 is used as
the agent learning algorithm and the methods are run on data sets from
the Irvine ML repository, both query learning methods significantly
improve data efficiency compared with both C4.5 itself and boosting
applied to C4.5. The typical increase in data efficiency was 2- to
5-fold.
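The least-margin selection rule described above can be illustrated with a minimal Python sketch; it is not the authors' implementation. Assumptions: binary labels 0/1, a finite pool of unlabeled candidate points, a normalized margin defined as the difference between the vote totals of the top two labels divided by the total weight, and a hypothetical one-dimensional threshold learner (`learn_stump`) standing in for C4.5. All function names are hypothetical. For query by bagging every hypothesis gets weight 1; for query by boosting the weights would instead come from the boosting algorithm (for example AdaBoost's log((1 - eps)/eps)), which this sketch leaves out.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

Example = Tuple[float, int]          # (feature value, 0/1 label); assumed 1-D setting
Hypothesis = Callable[[float], Hashable]


def vote_margin(hyps: Sequence[Hypothesis], weights: Sequence[float], x: float) -> float:
    """Normalized margin of the (weighted) majority vote at x:
    (weight on the top label - weight on the runner-up) / total weight."""
    votes = defaultdict(float)
    for h, w in zip(hyps, weights):
        votes[h(x)] += w
    ranked = sorted(votes.values(), reverse=True)
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    return (ranked[0] - runner_up) / sum(weights)


def select_query(hyps, weights, candidates):
    """Return the candidate on which the committee's vote has the least margin."""
    return min(candidates, key=lambda x: vote_margin(hyps, weights, x))


def make_stump(threshold: float, label_above: int) -> Hypothesis:
    """Threshold classifier for 0/1 labels (hypothetical stand-in hypothesis)."""
    def h(x: float) -> int:
        return label_above if x >= threshold else 1 - label_above
    return h


def learn_stump(data: List[Example]) -> Hypothesis:
    """Hypothetical stand-in for C4.5: the stump with the fewest training errors."""
    best_errors, best_h = None, None
    for t, _ in data:
        for label_above in (0, 1):
            h = make_stump(t, label_above)
            errors = sum(h(x) != y for x, y in data)
            if best_errors is None or errors < best_errors:
                best_errors, best_h = errors, h
    return best_h


def bagging_committee(learn, data: List[Example], k: int, rng: random.Random):
    """Train k hypotheses on bootstrap resamples; bagging gives equal weights."""
    hyps = [learn([rng.choice(data) for _ in data]) for _ in range(k)]
    return hyps, [1.0] * k


def query_by_bagging(oracle, pool, initial, rounds: int, k: int = 5, seed: int = 0):
    """Repeatedly query the label of the least-margin point in the pool."""
    rng = random.Random(seed)
    labeled = [(x, oracle(x)) for x in initial]
    pool = [x for x in pool if x not in initial]
    for _ in range(rounds):
        hyps, weights = bagging_committee(learn_stump, labeled, k, rng)
        x = select_query(hyps, weights, pool)
        pool.remove(x)
        labeled.append((x, oracle(x)))
    return labeled
```

For example, with three stumps at thresholds 0.3, 0.5 and 0.7, the points 0.1 and 0.9 receive unanimous votes (margin 1), while 0.4 and 0.6 split 2 to 1 (margin 1/3), so `select_query` returns 0.4, the first least-margin candidate. How candidate points are generated in the actual experiments is not stated in the abstract; drawing from a fixed pool is an assumption of this sketch.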
Keywords:
Query learning, query by committee, boosting, bagging,
data efficiency, C4.5, weighted majority algorithm,
concept learning, relation learning
Email: {abe,mami}@ccm.cl.nec.co.jp
Tel: +81-44-856-2143