Arch Seminar: Killer Microseconds and the Tail at Scale

Tuesday, August 18, 2015 - 4:00pm
CS 1240

Speaker Name: Thomas Wenisch

Speaker Institution: University of Michigan

Cookies Location: CS 1240


Online Data-Intensive (OLDI) applications, which process terabytes of data with sub-second latencies, are the cornerstone of modern internet services like web search and social media. In this talk, I discuss two system design challenges that make it very difficult to build efficient OLDI applications and speculate on possible solution directions.

(1) Killer Microseconds: today's CPUs are highly effective at hiding the nanosecond-scale latency of memory accesses, and operating systems are highly effective at hiding the millisecond-scale latency of disks. However, modern high-performance networking and flash I/O frequently lead to situations where data are a few microseconds away. Neither hardware nor software offers effective mechanisms to hide these microsecond-scale stalls.

(2) The Tail at Scale: OLDI services typically rely on a strategy of sharding their data sets over hundreds or even thousands of servers to meet latency objectives. However, this strategy mandates that fully processing a request requires waiting for the slowest straggler among these servers. As a result, exceedingly rare events, such as transient network congestion, interrupts, OS background activity, or CPU power-state changes, which have negligible impact on the throughput of a single server, nevertheless come to dominate the latency distribution of the OLDI service. At 1000-node scale, the fifth nine (99.999th percentile) of an individual server's latency distribution becomes the 99th-percentile latency tail of the entire request.

These two challenges cause OLDI operators to execute their workloads inefficiently at low utilization to avoid compounding stalls and tails with queueing delays. There is a pressing need for architects to find ways to hide microsecond-scale stalls, and to track down and address the rare triggers of 99.999% tail performance anomalies that destroy application-level latency objectives.
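The tail-amplification claim above follows from simple probability, assuming server latencies are independent: a request that fans out to N servers and waits for all of them meets a latency target only if every server does. A minimal sketch of that arithmetic (the function name is illustrative, not from the talk):

```python
def fraction_meeting_target(p_single, n_servers):
    """Probability that all n_servers respond within the latency target,
    assuming each server meets it independently with probability p_single."""
    return p_single ** n_servers

# A "five nines" server: 99.999% of its responses are under the target.
p = 0.99999

# Fanned out over 1000 servers, only ~99.0% of whole requests meet the
# target -- the server's 99.999th percentile has become roughly the
# request's 99th percentile.
print(fraction_meeting_target(p, 1000))
```

This is why per-server tail anomalies that are invisible in single-machine throughput measurements dominate end-to-end latency at scale.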

Special acknowledgements to Luiz Barroso, Partha Ranganathan, and Mike Marty for many of the ideas in this talk.


Thomas Wenisch is an Associate Professor of Computer Science and Engineering at the University of Michigan, specializing in computer architecture. His prior research includes memory streaming for commercial server applications, multiprocessor memory systems, memory disaggregation, and rigorous sampling-based performance evaluation methodologies. His ongoing work focuses on computational sprinting, server and data center architectures, programming models for byte-addressable NVRAM, and architectures to enable hand-held 3D ultrasound. Wenisch received the NSF CAREER award in 2009. Prior to his academic career, Wenisch was a software developer at American Power Conversion, where he worked on data center thermal topology estimation. He received his Ph.D. in Electrical and Computer Engineering from Carnegie Mellon University.