Update: The camera-ready version of the paper should be available on the publications page very soon!
Our paper “Managing Data Transfers in Computer Clusters with Orchestra” has been accepted at SIGCOMM’2011. This is joint work with Matei, Justin, and professors Mike Jordan and Ion Stoica. The project started as part of Spark and is now quickly expanding to stand on its own and support other data-intensive frameworks (e.g., Hadoop and Dryad). We also believe that interfacing Orchestra with Mesos will enable better network sharing between concurrently running frameworks in data centers.
Cluster computing applications like MapReduce and Dryad transfer massive amounts of data between their computation stages. These transfers can have a significant impact on job performance, accounting for more than 50% of job completion times. Despite this impact, there has been relatively little work on optimizing the performance of these data transfers. In this paper, we propose a global management architecture and a set of algorithms that improve the transfer times of common communication patterns, such as broadcast and shuffle, and allow one to prioritize a transfer over other transfers belonging to the same application or different applications. Using a prototype implementation, we show that our solution improves broadcast completion times by up to 4.5x compared to the status quo implemented by Hadoop. Furthermore, we show that transfer-level scheduling can reduce the completion time of high-priority transfers by 1.7x.
The paper has been well received so far, and we got great feedback from the anonymous reviewers that will help us strengthen it further. Hopefully, you will like it too :)
For those interested in stats: this year SIGCOMM accepted 32 out of 223 submissions (an acceptance rate of about 14%).
Anyway, it’s Friday and we so excited!