Title: The Journey From Faster to More Predictable: Using Statistics to Build Better Data-Intensive Systems
Abstract: Much of the database research over the past four decades has focused on improving the overall performance of data-intensive applications, in terms of throughput and mean latency. This race for “faster” has neglected an important aspect of performance: predictability. Modern data management systems have turned into massive codebases of sophisticated algorithms and data structures, rendering them highly unpredictable in their performance. This has not only made their provisioning, tuning, and diagnosis increasingly challenging, but has also raised their total cost of ownership. For example, to cope with tail latencies, highly qualified personnel are constantly occupied with tuning the performance of their database applications, overprovisioning their hardware resources, and even resorting to expensive redundancies in data and computation.
In this discussion, we begin our journey by vetting various notions of predictability that affect data-intensive applications, including variance and high-percentile latencies, graceful degradation, regression error, and user-specified deadlines. We then identify the internal sources of performance unpredictability. In particular, we propose a new type of performance profiler, called VProfiler. Unlike traditional profilers that focus on mean latency, VProfiler can pinpoint the root causes of performance variance in a complex codebase. Our findings lead us to develop a new scheduling algorithm, called VATS, which has now become the default policy in MySQL distributions. We then turn our attention to external sources of unpredictability, where the theory of Robust Optimization presents itself as a viable approach. Surprisingly, our journey from faster to more predictable comes full circle, as we discover that speed and predictability are not mutually exclusive; predictable performance is often faster too! We conclude by introducing a practical tool that uses past statistics to extract “workload intelligence” and predict future performance.
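To make the distinction between mean latency and high-percentile (tail) latency concrete, the following sketch uses invented latency samples (all values are hypothetical, chosen only for illustration) to show how a handful of slow outliers leaves the median untouched while dominating the mean and the 99th percentile:

```python
import statistics

# Hypothetical latency samples in ms: mostly ~10 ms, with two slow outliers.
latencies = [10, 11, 9, 12, 10, 11, 10, 250, 9, 10,
             11, 10, 12, 300, 10, 11, 9, 10, 12, 10]

def percentile(samples, p):
    """Nearest-rank percentile: smallest value >= p% of the samples."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

mean = statistics.mean(latencies)       # ~36.9 ms: inflated by the outliers
p50 = percentile(latencies, 50)         # 10 ms: the "typical" request
p99 = percentile(latencies, 99)         # 300 ms: what unlucky users experience
var = statistics.pvariance(latencies)   # large variance signals unpredictability
```

A system tuned only for the mean can still look fine here while one request in a hundred is 30x slower than the median, which is precisely the kind of unpredictability the abstract targets.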
Bio: Barzan Mozafari is a Morris Wellman Assistant Professor of Computer Science and Engineering at the University of Michigan, Ann Arbor, where he leads a research group designing the next generation of scalable databases using advanced statistical models. Prior to that, he was a Postdoctoral Associate at MIT. He earned his Ph.D. in Computer Science from UCLA in 2011. His research career has led to several open-source projects, including DBSeer (an automated database diagnosis tool), BlinkDB (a massively parallel approximate query engine), and SnappyData (an HTAP engine that empowers Apache Spark with transactions and real-time analytics). He has won the National Science Foundation CAREER award, as well as several best paper awards in ACM SIGMOD and EuroSys. He is the founder of Michigan Software Experts, and a strategic advisor to SnappyData, a company that commercializes the ideas introduced by BlinkDB.