Noisy and missing data are prevalent in many real-world statistical estimation problems. Popular techniques for handling non-idealities in data, such as imputation and expectation-maximization, are often difficult to analyze theoretically and/or terminate in local optima of non-convex functions -- these problems are only exacerbated in high-dimensional settings. We present new methods for obtaining high-dimensional regression estimators in the presence of corrupted data, and provide theoretical guarantees for the statistical consistency of our methods.
We live in a software-driven world. Software helps us communicate and collaborate; create art and music; and make discoveries in biological, physical, and social sciences. Yet the growing demand for new software, to solve new kinds of problems, remains largely unmet. Because programming is still hard, developer productivity is limited, and so is end-users' ability to program on their own.
GPGPUs are gaining track as a vehicle for power-efficient, high performance computing. Nevertheless, their von-Neumann-based design suggests they are amenable the model’s key inefficiencies: the processor must fetch and decode each dynamic instruction instance, and all intermediate values of the computation must be transferred back-and-forth between the functional units and the register file.
Efficient memory sharing between CPU and GPU threads can greatly expand the effective set of GPGPU workloads. For increased programmability, this memory should be uniformly virtualized, necessitating compatible address translation support for GPU memory references. However, even a modest GPU might need 100s of translations per cycle (6 CUs * 64 lanes/CU) with memory access patterns designed for throughput more than locality.
New computing platforms have greatly increased the demand for programmers, but learning to program remains a big challenge. Program synthesis has the potential to revolutionize programming by making it more accessible. My work has focused on two goals: making programming more intuitive through the use of new interfaces, and using automated feedback to help students learn programming. In this talk, I will present my work on three systems that work towards these goals.
The growth of computer science enrollments, especially in introductory courses, raises the possibility of increasing levels of plagiarism. Fortunately, good software plagiarism detectors have been available since the mid-90's, and these help detect suspicious assignment submissions. However, many popular plagiarism detectors were designed when computer memories were two to three orders of magnitude smaller than today's systems, so these systems necessarily made a number of assumptions to bound their running times and keep their memory consumption low.
This talk gives an introduction to a recently established link between the geometry of numbers and mixed integer linear optimization. The main focus is to provide a review of families of lattice-free polyhedra and their use in the context of deriving and explaining cutting planes for mixed integer programs. This approach is not only mathematically interesting, but it leads to some fundamental new discoveries, such as an understanding under which conditions cutting planes algorithms converge finitely.
Multicore is here to stay. Software developers are moving to concurrent programming. However, this move is slow and challenging due to the exponential complexity in reasoning about concurrency. In particular, Heisenbugs such as data races, which are non-deterministic concurrency errors, pervasively infect concurrent software, making concurrent program debugging notoriously difficult.
Commitee: Stephen Wright (advisor), James Luedtke (advisor), Jeffrey Linderoth (advisor), Thomas Rutherford, Christopher Re (Stanford University)
In this dissertation we study several nonconvex and combinatorial optimization problems with applications in production planning, machine learning, advertising, statistics, and computer vision. The common theme is the use of algorithmic and modelling techniques from mixed-integer program- ming (MIP) which include formulation strengthening, decomposition, and linear programming (LP) rounding.