Mending the Application-Network Gap in Big Data Analytics

Tuesday, March 17, 2015, 4:00pm to 5:00pm
1240 Computer Sciences

Speaker Name: 

Mosharaf Chowdhury

Speaker Institution: 

UC Berkeley

Cookies: 

Yes

Cookies Location: 

1240 Computer Sciences

Description: 

With the rapid rise of cloud computing, scale-out applications running on large clusters are becoming the norm. While the diversity of applications and the capacity of datacenters are continuously growing, application- and network-level goals are moving further apart. For example, the duration of a shuffle—the communication stage of a MapReduce application—is determined by the completion time of its last flow. This means that one can improve the shuffle completion time by slowing down the smaller flows and allocating the extra bandwidth to speed up the larger flows. However, today’s application-agnostic networks treat each flow independently, resulting in suboptimal application-level performance.
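
As a rough illustration of the shuffle example above (not taken from the talk), the following Python sketch compares per-flow fair sharing with a shuffle-aware allocation on a toy topology. Everything in it is an assumption made for illustration: the six flows and their sizes, the unit-capacity sender and receiver ports, the fluid (non-packet) model, and the function names. The shuffle-aware rule shown, which gives each flow a rate proportional to its size, is one simple way to make all flows finish together; it is not presented as the scheduler used by Varys.

```python
from fractions import Fraction

# Hypothetical shuffle: (sender, receiver, size in data units).
# Sender s3 holds twice as much data for each receiver as the others.
FLOWS = [
    ("s1", "r1", 1), ("s2", "r1", 1), ("s3", "r1", 2),
    ("s3", "r2", 2), ("s4", "r2", 1), ("s5", "r2", 1),
]


def ports(flow):
    """Each flow uses its sender's egress port and its receiver's ingress port."""
    return (("out", flow[0]), ("in", flow[1]))


def max_min_rates(active, capacity):
    """Max-min fair rates via progressive filling (application-agnostic)."""
    cap = {p: Fraction(capacity) for f in active.values() for p in ports(f)}
    unfrozen, rates = set(active), {}
    while unfrozen:
        # Equal share each port could still offer to its unfrozen flows.
        share = {}
        for p, c in cap.items():
            n = sum(1 for i in unfrozen if p in ports(active[i]))
            if n:
                share[p] = c / n
        tightest = min(share, key=share.get)
        # Freeze every flow crossing the tightest port at that share.
        for i in [i for i in unfrozen if tightest in ports(active[i])]:
            rates[i] = share[tightest]
            for p in ports(active[i]):
                cap[p] -= rates[i]
            unfrozen.discard(i)
    return rates


def fair_sharing_completion(flows, capacity=1):
    """Time at which the last flow finishes under per-flow fair sharing."""
    remaining = {i: Fraction(f[2]) for i, f in enumerate(flows)}
    now = Fraction(0)
    while remaining:
        rates = max_min_rates({i: flows[i] for i in remaining}, capacity)
        # Advance to the next flow completion, then recompute the rates.
        step = min(remaining[i] / rates[i] for i in remaining)
        now += step
        remaining = {i: left - rates[i] * step
                     for i, left in remaining.items()
                     if left - rates[i] * step > 0}
    return now


def shuffle_aware_completion(flows, capacity=1):
    """Give each flow a rate proportional to its size so all finish together."""
    load = {}
    for f in flows:
        for p in ports(f):
            load[p] = load.get(p, 0) + f[2]
    # The most heavily loaded port sets the best possible completion time.
    finish = Fraction(max(load.values())) / capacity
    rates = [Fraction(f[2]) / finish for f in flows]
    return finish, rates


if __name__ == "__main__":
    print(fair_sharing_completion(FLOWS))      # prints 5
    print(shuffle_aware_completion(FLOWS)[0])  # prints 4
```

In this toy case the small flows run at 1/4 instead of 1/3 of a port, the two large flows from s3 run at 1/2, and the shuffle ends after 4 time units rather than 5. The absolute numbers depend entirely on the assumed topology and sizes.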

In this talk, I will present the coflow abstraction that bridges this gap by exposing the performance goals of data-parallel applications to the network. For example, a coflow can capture the semantics of a shuffle. By leveraging application-level semantics, coflows allow us to improve the communication performance of individual applications, across multiple applications, and in the presence of dynamic events like task failures and speculative executions. I will also describe the design decisions behind Varys, a system that enables applications to take advantage of coflow scheduling without any changes to user jobs or the network. By consolidating communication optimizations, Varys allows faster development and relieves users from parameter tuning. Deployments on Amazon EC2 and simulations using Facebook production traces show that Varys improves application-level communication performance by 2X to 6X over traditional, application-agnostic techniques.

Bio: Mosharaf Chowdhury is a doctoral candidate in the AMPLab at UC Berkeley, working with Ion Stoica on topics in networked systems, datacenter networking, and cloud computing. He is also a committer on Apache Spark. Mosharaf holds a master’s degree from the University of Waterloo in Canada and a bachelor’s degree from Bangladesh University of Engineering and Technology. http://mosharaf.com