Kwang-Sung Jun: Learning from Human-Generated Lists
** ICML Practice Talk **
* Abstract: Human-generated lists are a form of non-iid data with important applications in machine learning and cognitive psychology. We propose a generative model for such lists, called sampling with reduced replacement (SWIRL). We discuss SWIRL's relation to standard sampling paradigms, provide the maximum likelihood estimate for learning, and demonstrate its value with two real-world applications: (i) In a "feature volunteering" task, where non-experts spontaneously generate feature=>label pairs for text classification, SWIRL improves the accuracy of state-of-the-art feature-learning frameworks. (ii) In a "verbal fluency" task, where brain-damaged patients generate word lists when prompted with a category, SWIRL parameters align well with existing psychological theories, and our model can classify healthy people vs. patients from the lists they generate.
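
As a rough illustration of what "reduced replacement" could mean, the sketch below assumes that each item starts with a positive weight and that, once an item is drawn, its weight is multiplied by a discount factor alpha in [0, 1]. Under that assumption, alpha = 1 behaves like sampling with replacement and alpha = 0 like sampling without replacement. The function name, its parameters, and the example weights are hypothetical and are not taken from the talk; the list length is simply passed in rather than modeled.

```python
import random


def sample_swirl(sizes, length, alpha, rng=None):
    """Draw one list under an ASSUMED reduced-replacement scheme.

    sizes  : dict mapping item -> initial positive weight (hypothetical input)
    length : number of draws to attempt
    alpha  : discount applied to an item's weight each time it is drawn
             (assumption: 1.0 = with replacement, 0.0 = without replacement)
    """
    rng = rng or random.Random()
    weights = dict(sizes)  # copy so the caller's weights are left untouched
    drawn = []
    for _ in range(length):
        # Only items with remaining positive weight can still be drawn.
        candidates = [item for item, w in weights.items() if w > 0]
        if not candidates:
            break  # all weights are zero, so nothing is left to draw
        candidate_weights = [weights[item] for item in candidates]
        item = rng.choices(candidates, weights=candidate_weights, k=1)[0]
        drawn.append(item)
        weights[item] *= alpha  # reduce the chance of repeating this item
    return drawn


if __name__ == "__main__":
    # Made-up category "animals" with made-up weights, for illustration only.
    animals = {"dog": 5.0, "cat": 4.0, "lion": 2.0, "zebra": 1.0}
    print(sample_swirl(animals, length=6, alpha=0.3, rng=random.Random(0)))
```

Intermediate values of alpha allow repeats but make them less likely, which is the behavior one would expect of lists in which people occasionally repeat themselves.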
