Documentation

UW Connect

Paul Valiant: The Challenge of Data Efficiency: Vignettes from Statistics and Evolution

Room: 
1240
Speaker Name: 
Paul Valiant
Cookies: 
Yes
Cookies Location: 
1240
The Challenge of Data Efficiency: Vignettes from Statistics and Evolution

As the amount of data that can be brought to bear on a task grows, so too
does the difficulty and importance of making the best use of our data. In
this talk we will consider two settings of this problem, one from
statistics and one from the natural world.

A basic question in statistics is to determine how much data is needed to
estimate various properties of one or more probability distributions. We
consider the fundamental properties of support size, entropy, and the
distance between two distributions. We present a unified new approach to
these problems which yields the first explicit sublinear algorithms: given
one or more probability distributions on a domain of size n, we show how to
estimate each of these properties using n/log n samples, where n
corresponds to the support size of the distribution. Our approach leverages
the techniques of linear programming to yield a single algorithm to
estimate all three properties, and, indeed, all properties from a wider
class that contains them. Further, we construct a new discrete version of
the multivariate central limit theorem to show that n/log n samples are in
fact needed to estimate these properties and thus show that our algorithm
uses the optimal number of samples, to within constant factor.

In the second half of the talk, we take our inspiration from biological
evolution -- a natural process that, from vast amounts of data and
complexity yields surprisingly effective results. In this talk we formulate
a notion of "evolvability" for functions with domain and range that are
real-valued vectors, an appropriate way of expressing many natural
biological processes such as protein expression regulation. We show that
linear and fixed-degree polynomial functions are evolvable in the following
dually robust sense: There is a single evolution algorithm that for all
convex loss functions converges for all distributions. We further examine
the scope of the evolvability model and discuss future directions towards a
computational understanding of evolution.

Bio: Paul Valiant attended Stanford from 2000-2004, earning a BS in Mathematics,
a BS in Physics, and an MS in Computer Science. From 2004 – 2008 he studied
at MIT, earning a PhD in Computer Science. After that he has been at UC Berkeley
on an NSF Mathematical Sciences Postdoctoral Research Fellowship.  His awards
include the Machtey Award (Best Student Paper) at FOCS 2005 (co-winner), Best 
Student Paper Award at the Theory of Cryptography Conference 2008, Stanford 
Mathematics Department Research Award for Undergraduate Honors Thesis 
on “General Relativity”. He was a Gold Medalist at the International Mathematical 
Olympiad in 1999. A three-time member of the US International Mathematical 
Olympiad team, he taught last summer at the US Math Olympiad Summer Program
training the US team. His research interests span the areas of Statistics and 
the quest for data efficiency, Security and game theory, Fluid dynaimics-the 
Navier-Stokes problem, and Protein folding and evolution.
Event Date:
Monday, March 19, 2012 - 4:00pm - 5:00pm (ended)