Category Archives: Recent News

Aalo Accepted to Appear at SIGCOMM’2015

Update: Camera-ready version is available here now!

At the last SIGCOMM we introduced the coflow scheduling problem and presented Varys, which addressed its clairvoyant variation, i.e., when all the information about individual coflows is known a priori and there are no cluster or task scheduling dynamics. In many cases, these assumptions do not hold, leaving us with two primary challenges. First, how do we schedule coflows when individual flow sizes, the total number of flows in a coflow, or their endpoints are unknown? Second, how do we schedule coflows for jobs with more than one stage that form DAGs, where tasks can be scheduled in multiple waves? Aalo addresses both problems by providing the first solution to the non-clairvoyant coflow scheduling problem: it can efficiently schedule coflows without any prior knowledge of coflow characteristics while still approximating Varys’s performance very closely. This makes coflows practical in almost all scenarios we face in data-parallel communication.

Leveraging application-level requirements exposed through coflows has recently been shown to improve application-level communication performance in data-parallel clusters. However, existing efficient schedulers require a priori coflow information (e.g., flow size) and ignore cluster dynamics like pipelining, task failures, and speculative execution, which limits their applicability. Schedulers without prior knowledge compromise on performance to avoid head-of-line blocking. In this paper, we present Aalo, which strikes a balance and efficiently schedules coflows without prior knowledge.

Aalo employs Discretized Coflow-Aware Least-Attained Service (D-CLAS) to separate coflows into a small number of priority queues based on how much they have already sent across the cluster. By prioritizing across queues and scheduling coflows in FIFO order within each queue, Aalo’s non-clairvoyant scheduler can schedule diverse coflows and minimize their completion times. EC2 deployments and trace-driven simulations show that communication stages complete 1.93X faster on average and 3.59X faster at the 95th percentile using Aalo in comparison to per-flow mechanisms. Aalo’s performance is comparable to that of solutions using prior knowledge, and Aalo outperforms them in the presence of cluster dynamics.
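To make the D-CLAS idea concrete, here is a minimal sketch of the queue logic in Scala. The number of queues, the thresholds, and all names are illustrative assumptions, not Aalo’s actual code:

```scala
// Sketch of discretized least-attained-service scheduling for coflows.
// Thresholds grow exponentially; a coflow is demoted to a lower-priority
// queue as its total bytes sent cross each threshold.
case class CoflowState(id: String, arrivalTime: Long, var bytesSent: Long = 0L)

class DClasScheduler(numQueues: Int = 10,
                     firstThreshold: Long = 10L << 20,  // 10 MB (assumed)
                     growthFactor: Long = 10L) {

  // Queue 0 is the highest priority; each subsequent queue covers an
  // exponentially larger range of bytes already sent.
  private def queueOf(c: CoflowState): Int = {
    var k = 0
    var limit = firstThreshold
    while (k < numQueues - 1 && c.bytesSent >= limit) {
      k += 1
      limit *= growthFactor
    }
    k
  }

  // Strict priority across queues; FIFO (by arrival time) within a queue.
  def schedulingOrder(coflows: Seq[CoflowState]): Seq[CoflowState] =
    coflows.sortBy(c => (queueOf(c), c.arrivalTime))
}
```

Demoting a coflow only after it has already sent a lot of data is what lets short coflows finish quickly without requiring any size information up front.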

This is joint work with Ion, and it’s the first time we have written a conference paper with just the two of us!

This year the SIGCOMM PC accepted 40 papers out of 262 submissions with a 15.27% acceptance rate. This also happens to be my last SIGCOMM as a student. Glad to be lucky enough to end on a high note!

Orchestra is the Default Broadcast Mechanism in Apache Spark

With its recent release, Apache Spark has promoted Cornet—the BitTorrent-like broadcast mechanism proposed in Orchestra (SIGCOMM’11)—to become its default broadcast mechanism. It’s great to see our research make it into the real world! Many thanks to Reynold and others for making it happen.

MLlib, the machine learning library of Spark, will enjoy the biggest boost from this change because of the broadcast-heavy nature of many machine learning algorithms.
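For context, here is a minimal, self-contained example of Spark’s broadcast API in Scala; the data handed to sc.broadcast is what travels over the broadcast mechanism discussed above. The numbers and names are made up:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("broadcast-example").setMaster("local[*]"))

    // A toy lookup table that every task needs, standing in for the model
    // weights of a broadcast-heavy ML algorithm.
    val weights = Array.tabulate(1000)(i => i * 0.5)
    val bcWeights = sc.broadcast(weights)  // shipped once to each executor

    // Every task reads the executor-local copy via bcWeights.value.
    val total = sc.parallelize(0 until 1000)
      .map(i => bcWeights.value(i % bcWeights.value.length))
      .sum()

    println(s"sum = $total")
    sc.stop()
  }
}
```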

Presented Varys at SIGCOMM’2014

Update: Slides from my talk are online now!

Just got back from a vibrant SIGCOMM! I presented our recent work on application-aware network scheduling using the coflow abstraction (Varys). This is my fifth time attending and third time giving a talk. Great pleasure as always in meeting old friends and making new ones!

SIGCOMM was held in Chicago this year, and about 715 people attended.

Varys Developer Alpha Released!

We are glad to announce the first open-source release of Varys, an application-aware network scheduler for data-parallel clusters using the coflow abstraction. It’s a stripped-down dev-alpha release for the experts, so please be patient with it!

A quick overview of the system can be found at varys.net. Here is a 30-second summary:

Varys is an open source network manager/scheduler that aims to improve communication performance of Big Data applications. Its target applications/jobs include those written in Spark, Hadoop, YARN, BSP, and similar data-parallel frameworks.

Varys provides a simple API that allows data-parallel frameworks to express their communication requirements as coflows with minimal changes to the framework. Using coflows as the basic abstraction of network scheduling, Varys implements novel schedulers either to make applications faster or to make time-restricted applications complete within deadlines.
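As an illustration of what such an API can look like, here is a sketch of a shuffle expressed through a coflow-style client in Scala. The trait and method names (register, put, get, unregister) are assumptions made for this sketch; see the paper and the repository for Varys’s actual client interface:

```scala
// Hypothetical coflow client interface, for illustration only.
trait CoflowClient {
  def register(maxFlows: Int): String                                  // returns a coflow id
  def put(coflowId: String, dataId: String, data: Array[Byte]): Unit   // publish a piece of data
  def get(coflowId: String, dataId: String): Array[Byte]               // fetch a piece of data
  def unregister(coflowId: String): Unit                               // the coflow is done
}

object ShuffleSketch {
  def runShuffle(client: CoflowClient, numMappers: Int, numReducers: Int): Unit = {
    // The framework registers one coflow for the entire shuffle stage.
    val coflowId = client.register(numMappers * numReducers)

    // Each mapper publishes its partitions under the shared coflow id ...
    for (m <- 0 until numMappers; r <- 0 until numReducers)
      client.put(coflowId, s"map$m-part$r", Array.emptyByteArray)

    // ... and each reducer fetches them.
    for (r <- 0 until numReducers; m <- 0 until numMappers)
      client.get(coflowId, s"map$m-part$r")

    client.unregister(coflowId)
  }
}
```

The key point is that all of the shuffle’s flows share one coflow id, so the scheduler can allocate bandwidth for the stage as a whole rather than flow by flow.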

Primary features included in this initial release are:

  • Support for in-memory and on-disk coflows,
  • Efficient scheduling to minimize the average coflow completion time, and
  • In the deadline-sensitive mode, support for soft deadlines.

Here are some links if you want to check it out, contribute to making it better, or pass it along to someone else who can help us.

Project Website: http://varys.net
Git repository: https://github.com/coflow/varys
Relevant tools: https://github.com/coflow
Research papers
1. Efficient Coflow Scheduling with Varys
2. Coflow: A Networking Abstraction for Cluster Applications

The project is still in its early stages and can use all the help it can get to become successful. We appreciate your feedback.

Sinbad Merged with HDFS Trunk at Facebook!

Today marks the end of my stay at Facebook. Over the last eight months, on and off, I have re-implemented Sinbad (SIGCOMM’13), tested and evaluated it, reproduced the paper’s results, and merged it into Facebook’s internal HDFS trunk. Several of the future directions promised in the paper have been pursued in the process, with varying results. It has been quite a fruitful stint for me; hopefully, it’ll be useful for Facebook in the coming days as it rolls out to more production clusters. After all, who doesn’t like faster writes!

I’m really glad to have had this opportunity, and there are many people to thank. It all started with a talk I gave last July at the end of my Facebook Fellowship, which resulted in Hairong asking me to implement Sinbad at Facebook. Every single member of the HDFS team has been extremely helpful since I started last October. I’d especially like to mention Hairong, Pritam, Dikang, Weiyan, Justin, Peter, Tom, and Eddie for spending many hours getting me up to speed, reviewing code, fixing bugs, helping with deployment, troubleshooting, and generally bearing with me. I’d also like to thank Lisa from the Networking team for helping me add APIs to their infrastructure so that Sinbad can collect rack-level information.

The whole experience was different from my previous internships at research labs for several reasons. First, because it wasn’t an internship, I was on my own schedule and could get my research done while doing this on the side. Second, it was nice to sit with the actual team running some of the largest HDFS clusters on earth. I did learn a lot and found some new problems to explore in the future. Finally, of course, lots of free delicious food!

I’m glad I listened to Ion when I was asked by Hairong. Now, let’s hope Apache Hadoop/YARN picks it up as well.

Varys accepted at SIGCOMM’2014: Coflow is coming!

Update 2: Varys Developer Alpha released!

Update 1: The latest version is available here now!

We introduced the coflow abstraction back in 2012 at HotNets-XI, and now we have something solid to show what coflows are good for. Our paper on efficient and deadline-sensitive coflow scheduling has been accepted to appear at this year’s SIGCOMM. By exploiting a little bit of application-level information, coflows enable significantly better performance for distributed data-intensive jobs. In the process, we have discovered, characterized, and proposed heuristics for a novel scheduling problem: “Concurrent Open Shop Scheduling with Coupled Resources.” Yeah, it’s a mouthful; but hey, how often does one stumble upon a new scheduling problem?

Communication in data-parallel applications often involves a collection of parallel flows. Traditional techniques to optimize flow-level metrics do not perform well in optimizing such collections, because the network is largely agnostic to application-level requirements. The recently proposed coflow abstraction bridges this gap and creates new opportunities for network scheduling. In this paper, we address inter-coflow scheduling for two different objectives: decreasing communication time of data-intensive jobs and guaranteeing predictable communication time. We introduce the concurrent open shop scheduling with coupled resources problem, analyze its complexity, and propose effective heuristics to optimize either objective. We present Varys, a system that enables data-intensive frameworks to use coflows and the proposed algorithms while maintaining high network utilization and guaranteeing starvation freedom. EC2 deployments and trace-driven simulations show that communication stages complete up to 3.16X faster on average and up to 2X more coflows meet their deadlines using Varys in comparison to per-flow mechanisms. Moreover, Varys outperforms non-preemptive coflow schedulers by more than 5X.
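To give a flavor of the heuristics, here is a sketch of the bottleneck-first intuition: a coflow can finish no earlier than its most loaded sender or receiver port allows, so greedily ordering coflows by that bottleneck time is a natural starting point. The port model and types below are simplified assumptions for illustration, not the exact algorithm from the paper:

```scala
object BottleneckFirstSketch {
  // Per-coflow demand: bytes each sender port must send and each receiver
  // port must receive.
  case class CoflowDemand(id: String,
                          bytesOutPerPort: Map[String, Long],
                          bytesInPerPort: Map[String, Long])

  // Lower bound on completion time: the most loaded port divided by its rate.
  def bottleneckTime(c: CoflowDemand, portGbps: Double): Double = {
    val bytesPerSec = portGbps * 1e9 / 8
    val maxBytes = (c.bytesOutPerPort.values ++ c.bytesInPerPort.values).max
    maxBytes / bytesPerSec
  }

  // Greedy ordering: smallest bottleneck completion time first.
  def scheduleOrder(coflows: Seq[CoflowDemand], portGbps: Double = 1.0): Seq[CoflowDemand] =
    coflows.sortBy(bottleneckTime(_, portGbps))
}
```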

This is joint work with Yuan Zhong and Ion Stoica. But as always, it took a village! I’d like to thank Mohammad Alizadeh, Justine Sherry, Peter Bailis, Rachit Agarwal, Ganesh Ananthanarayanan, Tathagata Das, Ali Ghodsi, Gautam Kumar, David Zats, Matei Zaharia, the NSDI and SIGCOMM reviewers, and everyone else who patiently read many drafts or listened to my spiel over the years. It’s been in the oven for many years, and I’m glad that it was so well received.

I believe coflows are the future, and they are coming for the good of the realm, i.e., the network. Varys is only the tip of the iceberg, and there are way too many interesting problems to solve (both for network and theory folks!). We are in the process of open sourcing Varys over the coming weeks and look forward to seeing it used with major data-intensive frameworks. Personally, I’m happy with how my dissertation research is shaping up :)

This year the SIGCOMM PC accepted 45 papers out of 237 submissions with an 18.99% acceptance rate. This is the highest acceptance rate since 1995 and the highest number of papers ever accepted (tying with SIGCOMM’86)! After years of everyone complaining about the acceptance rate, this year’s PC has taken some action; probably a healthy step for the community.

Sinbad accepted to appear at SIGCOMM’2013

Update: Latest version is available here!

Our paper on leveraging flexibility in endpoint placement, aka Sinbad/Usher/Orchestrated File System (OFS), has been accepted for publication at this year’s SIGCOMM. We introduce constrained anycast in data-intensive clusters in the form of network-aware replica placement for cluster/distributed file systems. This is the second piece of my dissertation research, where the overarching goal is to push slightly more application semantics into the network to get disproportionately larger improvements. Sometimes things are unfair in a good way!

Many applications do not constrain the destinations of their network transfers. New opportunities emerge when such transfers contribute a large amount of network traffic. By choosing the endpoints to avoid congested links, completion times of these transfers, as well as those of others without similar flexibility, can be improved. In this paper, we focus on leveraging the flexibility in replica placement during writes to cluster file systems (CFSes), which account for almost half of the cross-rack traffic in data-intensive clusters. CFS writes require only that replicas be placed in multiple fault domains and that storage use remain balanced; the replicas themselves can be on any subset of machines in the cluster.

We study CFS interactions with the cluster network, analyze optimizations for replica placement, and propose Sinbad, a system that identifies imbalance and adapts replica destinations to navigate around congested links. Sinbad does so with little impact on long-term storage load balance. Experiments on EC2 (trace-driven simulations) show that block writes complete 1.3X (1.58X) faster as the network becomes more balanced. As a collateral benefit, the end-to-end completion times of data-intensive jobs improve by up to 1.26X. Trace-driven simulations also show that Sinbad’s improvement (1.58X) is close to a loose upper bound (1.89X) on the optimal.
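Here is a minimal sketch of the placement intuition: among racks that satisfy the fault-domain constraint, prefer the ones whose links are currently least congested. The rack model and constraint handling below are simplified assumptions, not Sinbad’s actual implementation:

```scala
object NetworkAwarePlacementSketch {
  // Downlink utilization ranges from 0.0 (idle) to 1.0 (saturated).
  case class Rack(id: String, downlinkUtilization: Double)

  // Pick replica racks for one block: exclude racks ruled out by the
  // fault-domain constraint (e.g., the writer's own rack, already used),
  // then prefer the least congested downlinks.
  def chooseReplicaRacks(racks: Seq[Rack],
                         excludedRacks: Set[String],
                         numReplicas: Int): Seq[Rack] =
    racks
      .filterNot(r => excludedRacks.contains(r.id))
      .sortBy(_.downlinkUtilization)   // navigate around congested links
      .take(numReplicas)
}
```

Because replica placement is re-evaluated per write, short-lived hotspots get avoided while long-term storage balance is largely unaffected.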

This is joint work with Srikanth Kandula and Ion Stoica. One cool fact is that we collaborated with Srikanth almost entirely over Skype for almost a year! I did take up a lot of his time away from his summer interns, but most of them had their papers in too. There are a LOT of people to thank, including Yuan Zhong, Matei Zaharia, Gautam Kumar, Dave Maltz, Ganesh Ananthanarayanan, Ali Ghodsi, Raj Jain, and everyone else I’m forgetting right now. It took quite some time to put our ideas crisply in a few pages, but in the end the reception from the PC was great. We hope it will be useful in practice too.

This year the SIGCOMM PC accepted 38 papers out of 240 submissions with a 15.83% acceptance rate. This is the highest acceptance rate since 1996 (before Ion Stoica had any SIGCOMM paper) and the highest number of papers accepted since 1987 (before Scott Shenker had any paper in SIGCOMM)!

I’m a PhD candidate now!

I finally took the qualifying exam this morning and presented my proposal in front of Professors Sylvia Ratnasamy, Ion Stoica, Scott Shenker, and Marti Hearst. It was good to hear them say I passed :)

My proposal was on using coflows to better manage the network for data-intensive cluster applications: how to manage each one individually, how to manage them together, and how to move everything else out of coflows’ way.

My talk improved considerably thanks to the practice sessions I had with Yuan, David, Ali, Ganesh, Panda, Sameer, Anand, and Arka. I’d also like to thank Matei, Gautam, and TD, who helped me even though they were travelling. This is one of the great things about the AMPLab; my lab mates are awesome!

Looking forward to getting some work done in the coming months.

Coflow accepted at HotNets’2012

Update: Coflow camera-ready is available online! Tell us what you think!

Our position paper addressing the lack of a networking abstraction for cluster applications, “Coflow: A Networking Abstraction for Cluster Applications,” has been accepted at this year’s workshop on hot topics in networking (HotNets). We make the observation that thinking in terms of flows is good, but we can do better by thinking in terms of collections of flows that carry application semantics when optimizing datacenter-scale applications.

Cluster computing applications — frameworks like MapReduce and user-facing applications like search platforms — have application-level requirements and higher-level abstractions to express them. However, there exists no networking abstraction that can take advantage of the rich semantics readily available from these data-parallel applications.

We propose coflow, a networking abstraction to express the communication requirements of prevalent data-parallel programming paradigms. Coflows make it easier for applications to convey their communication semantics to the network, which in turn enables the network to better optimize common communication patterns.
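For illustration, here is one way to picture the abstraction: a coflow is simply a named collection of flows whose collective completion is what the application cares about. The types and numbers below are made up for this sketch:

```scala
object CoflowAbstractionSketch {
  case class Flow(src: String, dst: String, bytes: Long)

  // A coflow only "completes" when all of its flows do; that is the
  // application-level semantic the network can now exploit.
  case class Coflow(id: String, flows: Seq[Flow]) {
    def totalBytes: Long = flows.map(_.bytes).sum
  }

  def main(args: Array[String]): Unit = {
    // A 2-mapper x 2-reducer shuffle expressed as a single coflow.
    val shuffle = Coflow("shuffle-0", for {
      m <- Seq("m1", "m2")
      r <- Seq("r1", "r2")
    } yield Flow(src = m, dst = r, bytes = 64L << 20))

    println(s"${shuffle.id} moves ${shuffle.totalBytes} bytes over ${shuffle.flows.size} flows")
  }
}
```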

This observation is the culmination of some of the work I’ve been fortunate to be part of over the last couple of years (especially Orchestra@SIGCOMM’11 and, more recently, Ahimsa@HotCloud’12 and FairCloud@SIGCOMM’12), as well as work done by others in the area (e.g., D3@SIGCOMM’11). Looking forward to constructive discussion in Redmond next month!

FTR, the HotNets PC this year accepted 23 papers out of 120 submissions.