Tag Archives: Datacenter Networking

FairCloud has been accepted at SIGCOMM’2012

Update: Camera-ready is now available online!

This is kinda old news now, but still as exciting as it was a few days ago. Our paper “FairCloud: Sharing the Network in Cloud Computing” has been accepted for publication at this year’s SIGCOMM. We explore the design space of sharing networks, identify tradeoffs, and categorize different strategies based on their characteristics. In case you are following our Orchestra work, FairCloud sits in the Inter-Transfer Controller (ITC) of the Orchestra hierarchy.

The network, similar to CPU and memory, is a critical and shared resource in the cloud. However, unlike other resources, it is neither shared proportionally to payment, nor do cloud providers offer minimum guarantees on network bandwidth. The reason is that networks are more difficult to share, since the network allocation of a VM X depends not only on the VMs running on the same machine with X, but also on the other VMs that X communicates with, as well as on the cross-traffic on each link used by X. In this paper, we start from the above requirements—payment proportionality and minimum guarantees—and show that the network-specific challenges lead to fundamental tradeoffs when sharing datacenter networks. We then propose a set of properties to explicitly express these tradeoffs. Finally, we propose three allocation policies that allow us to navigate the tradeoff space. We evaluate their characteristics through simulation and testbed experiments, showing that they are able to provide minimum guarantees and achieve better proportionality with the per-VM payment than known allocations.
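To get a feel for why a VM’s allocation depends on its communication pattern, here is a toy sketch (my simplified illustration, not any of the paper’s actual policies): a link’s capacity is split among flows, where each flow’s weight combines both endpoint VMs’ weights divided by their number of communicating peers. All names and numbers below are made up for the example.

```python
def allocate_link(capacity, flows, vm_weight, vm_degree):
    """Split one link's capacity among flows (src, dst VM pairs).

    A flow's weight is the sum of each endpoint's weight divided by that
    endpoint's number of communicating peers, so a VM's total weight is
    spread across all of its flows. This is why VM X's share depends on
    whom X talks to, not just on X's own weight.
    """
    def flow_weight(src, dst):
        return vm_weight[src] / vm_degree[src] + vm_weight[dst] / vm_degree[dst]

    total = sum(flow_weight(s, d) for s, d in flows)
    return {(s, d): capacity * flow_weight(s, d) / total for s, d in flows}

# Example: VM A talks to B and C; B and C each talk only to A.
weights = {"A": 1.0, "B": 1.0, "C": 1.0}
degrees = {"A": 2, "B": 1, "C": 1}  # number of communicating peers
alloc = allocate_link(1000.0, [("A", "B"), ("A", "C")], weights, degrees)
# A's weight is split across its two flows, so each flow gets an equal share here.
```

Even in this tiny example you can see the tension the paper explores: A pays for one VM but its weight is diluted across two flows, while B and C each concentrate their full weight on a single flow.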

This year 32 out of 235 papers have been accepted at SIGCOMM. In other news, Berkeley has seven papers in this SIGCOMM!!!

Presented Orchestra at SIGCOMM’2011

I’m attending my second SIGCOMM and had the privilege of giving my first talk at the flagship networking conference. I presented Orchestra, which happened to be very well attended even though it was the last talk of the day at 6PM. I’d like to thank everyone for showing up and also for the lively Q/A session at the end of my talk. Now that the talk is over, I can enjoy the rest of the conference in a more relaxed fashion.

The slides for the talk are available here.

Presented Orchestra at LBNL

Today I presented Orchestra for the first time in front of a crowd outside our lab. Taghrid Samak kindly invited me to LBNL’s Computing Sciences Seminar after we caught up over lunch last week for the first time in a year. She is currently a post-doc fellow with the Advanced Computing for Science group.

Overall, the talk went very well with some interesting questions. We might even get into future extension/collaboration work regarding some pieces of Orchestra. Hot stuff!

Orchestra has been accepted at SIGCOMM’2011

Update: The camera-ready version of the paper will be available on the publications page very soon!

Our paper “Managing Data Transfers in Computer Clusters with Orchestra” has been accepted at SIGCOMM’2011. This is joint work with Matei, Justin, and professors Mike Jordan and Ion Stoica. The project started as part of Spark and is now quickly expanding to stand on its own and support other data-intensive frameworks (e.g., Hadoop, Dryad). We also believe that interfacing Orchestra with Mesos will enable better network sharing between concurrently running frameworks in data centers.

Cluster computing applications like MapReduce and Dryad transfer massive amounts of data between their computation stages. These transfers can have a significant impact on job performance, accounting for more than 50% of job completion times. Despite this impact, there has been relatively little work on optimizing the performance of these data transfers. In this paper, we propose a global management architecture and a set of algorithms that improve the transfer times of common communication patterns, such as broadcast and shuffle, and allow one to prioritize a transfer over other transfers belonging to the same application or different applications. Using a prototype implementation, we show that our solution improves broadcast completion times by up to 4.5x compared to the status quo implemented by Hadoop. Furthermore, we show that transfer-level scheduling can reduce the completion time of high-priority transfers by 1.7x.
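To illustrate the idea behind transfer-level scheduling, here is a minimal sketch (names and numbers are my own, not Orchestra’s API or its measured results): a shared bottleneck is divided among transfers in proportion to their weights, so boosting a transfer’s weight shortens its completion time at the expense of lower-priority transfers.

```python
def completion_times(bandwidth, transfers):
    """Estimate completion times for transfers sharing one bottleneck link.

    `transfers` maps name -> (bytes_remaining, weight). Bandwidth is split
    in proportion to weight; when a transfer finishes, its capacity is
    redistributed among the transfers still running.
    """
    remaining = {n: size for n, (size, _) in transfers.items()}
    weight = {n: w for n, (_, w) in transfers.items()}
    now, done = 0.0, {}
    while remaining:
        total_w = sum(weight[n] for n in remaining)
        rates = {n: bandwidth * weight[n] / total_w for n in remaining}
        # Advance time until the next transfer finishes at current rates.
        dt = min(remaining[n] / rates[n] for n in remaining)
        now += dt
        for n in list(remaining):
            remaining[n] -= rates[n] * dt
            if remaining[n] <= 1e-9:
                done[n] = now
                del remaining[n]
    return done

# Two equal-size transfers on a 100-unit link; the high-priority one
# gets 3x the weight and therefore finishes well ahead of the other.
times = completion_times(100.0, {"high": (1000.0, 3.0), "low": (1000.0, 1.0)})
```

With equal weights, both transfers would finish at t = 20; with a 3:1 split the high-priority one finishes at t ≈ 13.3 while the low-priority one still finishes at t = 20, since it inherits the freed capacity.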

So far, the paper has been well received, and we’ve gotten great feedback from the anonymous reviewers that will further strengthen it. Hopefully, you will like it too :)

For those interested in stats: this year, SIGCOMM accepted 32 out of 223 submissions.

Anyway, it’s Friday and we so excited!