Faculty Candidate Talk: New Algorithms for High-Dimensional Data

Thursday, February 23, 2017 -
4:00pm to 5:00pm
CS 1240

Speaker Name: 

Ilya Razenshteyn

Speaker Institution: 

MIT CSAIL

Cookies Location: 

CS 1240


ABSTRACT: A popular approach in data analysis is to represent a dataset in a high-dimensional feature space, and reduce a given task to a geometric computational problem. However, most of the classic geometric algorithms scale poorly as the dimension grows and are typically not applicable to the high-dimensional regime. This necessitates the development of new algorithmic approaches that overcome this "curse of dimensionality". In this talk I will give an overview of my work in this area.

* I will describe new algorithms for the high-dimensional Nearest Neighbor Search (NNS) problem, where the goal is to preprocess a dataset so that, given a query object, one can quickly retrieve one or several closest objects from the dataset. Our algorithms improve, for the first time, upon the popular Locality-Sensitive Hashing (LSH) approach.

* Next, I will show how to make several algorithmic ideas underlying the above theoretical results practical. This yields an implementation that is competitive with state-of-the-art heuristics for the NNS problem. The implementation is part of FALCONN, a new open-source library for similarity search.

* Finally, I will talk about the limits of distance-preserving sketching, i.e., compression methods that approximately preserve the distances between high-dimensional vectors. In particular, I will show that for the well-studied Earth Mover Distance (EMD) such efficient compression is impossible.

The common theme that unifies these and other algorithms for high-dimensional data is the use of "efficient data representations": randomized hashing, sketching, dimensionality reduction, metric embeddings, and others. One goal of the talk is to describe a holistic view of all of these techniques.
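
As a rough illustration of the "randomized hashing" idea named above, here is a minimal sketch of classic random-hyperplane LSH for cosine similarity. It is not the algorithm presented in the talk and it is not FALCONN's API; every function name below is hypothetical, and a single hash table is used only to keep the example short.

```python
import math
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_vector(vec, hyperplanes):
    """One bit per hyperplane: which side of the hyperplane the vector lies on."""
    return tuple(
        sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0.0 for h in hyperplanes
    )


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_index(dataset, hyperplanes):
    """Preprocessing step: group dataset indices by their hash code."""
    buckets = {}
    for idx, vec in enumerate(dataset):
        buckets.setdefault(hash_vector(vec, hyperplanes), []).append(idx)
    return buckets


def query(q, dataset, buckets, hyperplanes):
    """Scan only the bucket the query hashes to; return the best index, or None."""
    candidates = buckets.get(hash_vector(q, hyperplanes), [])
    if not candidates:
        return None
    return max(candidates, key=lambda i: cosine(q, dataset[i]))
```

Vectors separated by a small angle are likely to agree on every bit and so share a bucket, which is why a query can inspect one bucket instead of the whole dataset. Practical schemes typically use several tables, since a single table can miss near neighbors that fall just across a hyperplane.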

BIO: Ilya Razenshteyn is a graduate student in the Theory of Computation group at MIT CSAIL, advised by Piotr Indyk. He is broadly interested in the theory of algorithms for massive data, with a bias towards algorithms that have the potential to be useful in applications. More specific interests include similarity search, sketching, metric embeddings, high-dimensional geometry, streaming algorithms, and compressed sensing. Ilya graduated with an M.Sc. in Mathematics from Moscow State University in 2012. His awards include the Akamai Presidential Fellowship and the Simons Foundation Junior Fellowship.