Category Archives: Recent News

Carbyne Accepted to Appear at OSDI’2016

Update: Camera-ready version is available here now!

With the wide adoption of distributed data-parallel applications, large-scale resource scheduling has become a constant source of innovation in recent years. Tens of scheduling solutions try to optimize for objectives such as user-level fairness, application-level performance, and cluster-level efficiency. However, given the well-known tradeoffs between fairness, performance, and efficiency, these solutions have traditionally focused on one primary objective (e.g., fairness in the case of DRF) and treated the other objectives as best-effort, secondary goals.

In this paper, we revisit the tradeoff space and demonstrate that aggressively optimizing one primary objective while giving up the rest often brings little long-term benefit. Because a job cannot complete until all its tasks have completed, each job can altruistically yield some of its resources without hampering its own completion time. These altruistic resources can then be rescheduled among other jobs to significantly improve secondary objectives without hampering the primary one. The benefits of our approach are visible even for single-stage jobs, and they increase as jobs have more complex DAGs.
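To make the intuition concrete, here is a minimal Python sketch of why altruism can be free (illustrative only, with made-up names; this is not Carbyne’s actual algorithm): in a job’s DAG, any stage off the critical path has slack, and the resources it would otherwise hold on to can be yielded for that long without delaying the job.

```python
# A minimal sketch (not Carbyne's algorithm) of why altruism is "free":
# stages off a job's critical path can be deferred, and their resources
# yielded, without delaying the job's completion time.

def critical_path_slack(durations, deps):
    """durations: {stage: time}; deps: {stage: [parent stages]}.
    Returns {stage: slack}, i.e., how long each stage can be deferred
    without extending the job's completion time. Assumes stages are
    listed in topological order, for brevity."""
    stages = list(durations)
    earliest = {}  # earliest finish time of each stage (forward pass)
    for s in stages:
        start = max((earliest[p] for p in deps.get(s, [])), default=0)
        earliest[s] = start + durations[s]
    makespan = max(earliest.values())
    children = {s: [c for c in stages if s in deps.get(c, [])] for s in stages}
    latest = {}  # latest finish time that keeps the makespan (backward pass)
    for s in reversed(stages):
        latest[s] = min((latest[c] - durations[c] for c in children[s]),
                        default=makespan)
    return {s: latest[s] - earliest[s] for s in stages}

# Two parallel stages feeding a join: the shorter branch has 6 time units
# of slack, so its resources can be altruistically yielded in the meantime.
durations = {"map1": 10, "map2": 4, "join": 5}
deps = {"join": ["map1", "map2"]}
print(critical_path_slack(durations, deps))  # {'map1': 0, 'map2': 6, 'join': 0}
```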

Given the well-known tradeoffs between performance, fairness, and efficiency, modern cluster schedulers focus on aggressively optimizing a single objective, while ignoring the rest. However, short-term convergence to a selected objective often does not result in noticeable long-term benefits. Instead, we propose an altruistic, long-term approach, where jobs yield fractions of their allocated resources without impacting their own completion times.

We show that leftover resources collected via the altruism of many jobs can then be rescheduled, essentially introducing a third degree of freedom in cluster scheduling, in addition to inter- and intra-job scheduling. We leverage this newfound flexibility in Carbyne, a scheduler that combines three existing schedulers to simultaneously optimize cluster-wide performance, fairness, and efficiency. Deployments and large-scale simulations on industrial benchmarks and production traces show that Carbyne closely approximates the state-of-the-art solutions for each of the three objectives simultaneously. For example, Carbyne provides 1.26X better efficiency and 1.59X lower average completion time than DRF, while providing similar fairness guarantees.
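The other half of the idea, again as a hypothetical sketch rather than Carbyne’s implementation: once many jobs have altruistically yielded resources, the scheduler can redistribute the leftover toward a secondary objective, for example by granting it to the jobs closest to completion to shrink the average job completion time.

```python
# An illustrative sketch (names are hypothetical) of rescheduling leftover
# resources: fairness decides the shares, altruism frees up the leftover,
# and the leftover then goes to whichever jobs finish soonest.

def reschedule_leftover(leftover, remaining_work):
    """leftover: yielded resource units; remaining_work: {job: units left}.
    Greedily grants leftover units to the job with the least remaining work,
    mimicking shortest-remaining-first to cut average completion time."""
    grants = {job: 0 for job in remaining_work}
    for job, remaining in sorted(remaining_work.items(), key=lambda kv: kv[1]):
        if leftover <= 0:
            break
        grant = min(remaining, leftover)
        grants[job] = grant
        leftover -= grant
    return grants

# Six yielded units flow to the nearly-done jobs first.
print(reschedule_leftover(6, {"jobA": 2, "jobB": 10, "jobC": 3}))
# {'jobA': 2, 'jobB': 1, 'jobC': 3}
```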

Altruistic scheduling has many more use cases; e.g., we made a similar observation for coflow scheduling in Varys.

This work started as a collaboration with Robert Grandl and Aditya Akella toward the end of 2015. Ganesh Ananthanarayanan from MSR later joined us to take it to the next level. After CODA, this is another project related to a future work item (the seventh) from my dissertation; infer whatever you want from these two data points ;)

This year the OSDI PC accepted 47 out of 260 papers. This happens to be my first time submitting to OSDI. It’s also my first paper with Ganesh, even though it happened after we both graduated from Berkeley; we sat across from each other for four years back then! I also want to thank Aditya for letting me work closely with Robert; it’s been great collaborating with them.

EC-Cache Accepted to Appear at OSDI’2016

Update: Camera-ready version is available here now!

In-memory caching is the de facto solution for enabling low-latency analytics over large datasets. While caching objects, one must maximize the number of requests that can be served from memory in the presence of popularity skew, background load imbalance, and server failures. Traditional solutions use selective replication, i.e., adjusting the number of replicas based on object popularity, to address these issues. However, dynamically determining replication factors and reacting to hotspots and load imbalance quickly enough is often challenging.
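For concreteness, here is a minimal sketch of selective replication (the function and its knobs are hypothetical): replica counts grow with popularity under a total memory budget, which is precisely why the scheme is fragile, since the factors must be recomputed and replicas re-placed whenever popularity shifts.

```python
# A minimal, hypothetical sketch of selective replication: the number of
# replicas of an object grows with its popularity, within a memory budget.

def replication_factors(requests, budget, base=1):
    """requests: {obj: observed request count}; budget: total replicas we
    can afford in memory. Every object keeps at least `base` replica; the
    spare budget is split proportionally to popularity."""
    total = sum(requests.values())
    spare = budget - base * len(requests)
    return {obj: base + (spare * count) // total
            for obj, count in requests.items()}

# A skewed workload: the hot object soaks up most of the extra replicas.
# The catch: these factors must be recomputed, and replicas migrated, every
# time popularity shifts or a server becomes a hotspot.
print(replication_factors({"hot": 70, "warm": 20, "cold": 10}, budget=13))
# {'hot': 8, 'warm': 3, 'cold': 2}
```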

We apply erasure coding to address the same set of challenges. Erasure coding is widely used in modern clusters to minimize storage usage, but it has not been applied to object caching, nor has it been considered a viable solution for load balancing. We show both analytically and empirically that, given the coding-friendly instruction sets in modern CPUs, erasure coding can address the limitations of selective replication as long as we are willing to sacrifice some network bandwidth. Given the rise of high-capacity network topologies and networking technologies, we believe this to be a timely tradeoff.

Data-intensive clusters rely on in-memory object caching to maximize the number of requests that can be served from memory in the presence of popularity skew, background load imbalance, and server failures. To improve I/O latency and load balancing, these caches typically employ selective replication, where the number of cached replicas of an object is proportional to its popularity. Selective replication, however, often falls short in practice, because it requires careful selection of replication factors and dynamic, load-aware replica placement.

EC-Cache is a load-balanced, high-performance cluster cache that uses erasure coding in a novel way to overcome the limitations of selective replication. EC-Cache employs two techniques: (a) splitting and erasure coding individual objects during writes, and (b) late binding during reads, wherein any k out of the (k + r) splits of an object are sufficient to reconstruct it. Compared to selective replication, EC-Cache improves load balancing by a factor of 3.3X and reduces the median and tail read latencies by more than 2X, while using the same amount of memory. EC-Cache does so using 10% additional bandwidth and a small increase in the amount of stored metadata. The benefits offered by EC-Cache are further amplified in the presence of background network load imbalance.
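The read path is the easiest place to see why late binding helps. Below is a simplified sketch (latencies are simulated and the actual decoding step is elided; this is not EC-Cache’s code): reads are issued for all (k + r) splits, and the object is reconstructed from whichever k arrive first, so a straggling or overloaded server never sits on the critical path.

```python
# A simplified sketch of a late-binding read: request all (k + r) splits,
# wait only for the first k, and ignore the stragglers.

def read_with_late_binding(split_latencies, k):
    """split_latencies: simulated response time of each split's server.
    Returns the splits used for decoding and the effective read latency,
    which is the k-th fastest response rather than the slowest one."""
    arrivals = sorted(split_latencies.items(), key=lambda kv: kv[1])
    first_k = arrivals[:k]          # any k of the (k + r) splits suffice
    read_latency = first_k[-1][1]   # we only wait for the k-th arrival
    return [split for split, _ in first_k], read_latency

# k = 4 data splits, r = 2 parity splits; the two overloaded servers
# (s3 and p2) simply drop off the critical path.
latencies = {"s1": 3, "s2": 2, "s3": 40, "s4": 4, "p1": 5, "p2": 35}
print(read_with_late_binding(latencies, k=4))
# (['s2', 's1', 's4', 'p1'], 5)
```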

The genesis of this work traces back to 2013, when I was just wrapping up Sinbad and heard Rashmi Vinayak give a talk about her work at Facebook. Although we planned to collaborate many times, we were always busy with other things and never managed to work together until very recently. During this time, the project evolved from disk-based storage to a general object store to its current form: an in-memory object cache. The very energetic Jack Kosaian tracked me down last year wanting to work on something exciting, which also helped finally get the project going at full speed. I’ve been very fortunate with my collaborators.

This year the OSDI PC accepted 47 out of 260 papers. This happens to be my first time submitting to OSDI. It’s also my first paper with my own student. Jack has been a tremendous force who kept the project going and helped Rashmi proceed at a fast pace. I also want to thank Rashmi for mentoring Jack throughout the process. I love this advising business :) Go Blue!

Received SIGCOMM Doctoral Dissertation Award

About a week ago, Bruce Maggs, SIGCOMM’s awards chair, kindly informed me over the phone that my dissertation on coflows had been selected for the 2015 ACM SIGCOMM Doctoral Dissertation Award. The committee for the award included Ratul Mahajan, Dina Papagiannaki, Laurent Vanbever (chair), and Minlan Yu, and the citation reads:

Chowdhury’s dissertation provides novel and application-aware networking abstractions which significantly improve the performance of networked applications running in the cloud.

It’s a great honor to join the distinguished researchers who’ve won this award before me.

It goes without saying that this achievement wouldn’t have been possible without my great advisor Ion Stoica keeping me on course, my excellent collaborators over the years, and many others who paved the way for me. I am also grateful to some of my biggest supporters: Randy Katz, Mike Franklin, Srikanth Kandula, and Vyas Sekar. An equal amount of credit, if not more, goes to my family and friends, who cheered me on and supported me for so many years. I don’t say it often enough: thank you!

CODA Accepted to Appear at SIGCOMM’2016

Update: Camera-ready version is available here now!

Since introducing the coflow abstraction in 2012, we’ve been working hard to make it practical, one step at a time. Over the years, we’ve worked on efficient coflow scheduling, removed clairvoyance requirements from coflow scheduling, and enabled fair sharing among coexisting coflows. Throughout all these efforts, one requirement remained constant: every distributed data-parallel application had to be modified to correctly use the same coflow API. Unfortunately, this is difficult, if not impossible, given the sheer number of computing frameworks available today.

CODA makes a first attempt at addressing this problem head-on. The intuition is simple: if we can automatically identify coflows from within the network, applications need not be modified. There are at least two key challenges in achieving this goal. The first is identifying coflows: unlike traditional traffic classification, coflows cannot be identified using five-tuples; instead, we must use domain-specific insights to capture the logical relationship among multiple flows. The second is mitigating the impact of misclassification, because even the best classifier will sometimes be wrong, and we must be able to schedule coflows even when some flows have been placed in the wrong coflows. Our solution errs on the side of caution by performing error-tolerant coflow scheduling. Overall, CODA approximates Aalo very closely and significantly outperforms flow-based solutions.
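As a toy illustration of the identification step (CODA’s actual classifier uses richer attributes and an incremental algorithm, so treat this purely as a sketch), even something as crude as grouping flows by start-time proximity hints at how coflows might be recovered without touching applications:

```python
# A toy sketch of coflow identification: flows that start close together
# in time are tentatively grouped into the same coflow. CODA's real
# classifier is incremental and uses richer, domain-specific attributes.

def cluster_flows(flows, gap=1.0):
    """flows: [(flow_id, start_time)]. Starts a new tentative coflow
    whenever the gap to the previous flow's start time exceeds `gap`."""
    coflows, current, last_start = [], [], None
    for flow_id, start in sorted(flows, key=lambda f: f[1]):
        if last_start is not None and start - last_start > gap:
            coflows.append(current)
            current = []
        current.append(flow_id)
        last_start = start
    if current:
        coflows.append(current)
    return coflows

# Two shuffles, well separated in time, fall out as two coflows. A flow
# near a boundary could easily be misclassified, hence the need for an
# error-tolerant scheduler on top.
flows = [("f1", 0.0), ("f2", 0.3), ("f3", 0.5), ("f4", 5.0), ("f5", 5.2)]
print(cluster_flows(flows))  # [['f1', 'f2', 'f3'], ['f4', 'f5']]
```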

Leveraging application-level requirements using coflows has recently been shown to improve application-level communication performance in data-parallel clusters. However, existing coflow-based solutions rely on modifying applications to extract coflows, making them inapplicable to many practical scenarios. In this paper, we present CODA, a first attempt at automatically identifying and scheduling coflows without any application-level modifications.

We employ an incremental clustering algorithm to perform fast, application-transparent coflow identification and complement it by proposing an error-tolerant coflow scheduler to mitigate occasional identification errors. Testbed experiments and large-scale simulations with realistic production workloads show that CODA can identify coflows with more than 90% accuracy, and CODA’s scheduler is robust to inaccuracies, enabling communication stages to complete 2.4X (5.1X) faster on average (95th percentile) in comparison to per-flow mechanisms. CODA’s performance is comparable to that of solutions requiring application-level modifications.

This work is a tight collaboration with Kai Chen and his students at HKUST: Hong Zhang, Bairen Yi, and Li Chen. It started with Hong sending me a short note asking whether I thought the problem was worth addressing. Of course I did; it is the second future work item listed in my dissertation!

This year the SIGCOMM PC accepted 39 out of 225 submissions, a 17.33% acceptance rate. This also happens to be my first SIGCOMM in an advisory role and the first without having Ion onboard. I am thankful to Kai for giving me full access to his excellent, hard-working students. It’s been a gratifying experience working with them, and I look forward to further collaborations. Finally, I’m excited to start my life as an advisor on a high note. Go Blue!

HUG Accepted to Appear at NSDI’2016

Update: Camera-ready version is available here now!

With the advent of cloud computing and datacenter-scale applications, simultaneously dealing with multiple resources is the new norm. When multiple parties have multi-resource demands, dividing these resources fairly (for some notion of fairness) is a core challenge in the resource allocation literature. Dominant Resource Fairness (DRF), which appeared at NSDI’2011, was the first work to point out the ensuing challenges and to extend traditional, single-resource max-min fairness to the multi-resource environment.

While DRF guarantees many nice properties to ensure fairness, it gives up work conservation; we show that its utilization can be arbitrarily low in the worst case. Over the past five years, many software systems and research proposals have built upon DRF under the assumption that it can trivially be made work-conserving without losing any of its properties.
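To make DRF’s behavior concrete, here is a textbook rendition of its progressive-filling allocation on the well-known example from the DRF paper (9 CPUs and 18 GB of memory; the code is my sketch, not the authors’): both users end up with a dominant share of 2/3, and 4 GB of memory is left unallocated.

```python
# A textbook sketch of DRF progressive filling: repeatedly launch a task
# for the user with the smallest dominant share, as long as it fits.

def drf(capacity, demands):
    """capacity: per-resource totals; demands: {user: per-task demand vector}.
    Returns the task counts per user and the total resources consumed."""
    used = [0.0] * len(capacity)
    tasks = {user: 0 for user in demands}
    while True:
        # Dominant share = a user's largest fractional share across resources.
        dom = {user: max(tasks[user] * d[i] / capacity[i]
                         for i in range(len(capacity)))
               for user, d in demands.items()}
        for user in sorted(demands, key=lambda u: dom[u]):
            d = demands[user]
            if all(used[i] + d[i] <= capacity[i] for i in range(len(capacity))):
                tasks[user] += 1
                used = [used[i] + d[i] for i in range(len(capacity))]
                break
        else:
            return tasks, used  # no further task fits anywhere

# User A's tasks need <1 CPU, 4 GB>; user B's need <3 CPUs, 1 GB>.
tasks, used = drf([9, 18], {"A": [1, 4], "B": [3, 1]})
print(tasks, used)  # {'A': 3, 'B': 2} [9.0, 14.0], i.e., 4 GB sits idle
```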

After five years, we prove that not to be the case. We show that there is a fundamental tradeoff between guaranteed resources and work conservation, and that naively adding work conservation to DRF can actually decrease utilization instead of increasing it. We also characterize the maximum possible utilization while maintaining DRF’s optimal guarantees and propose a new algorithm, High Utilization with Guarantees (HUG), that generalizes single- and multi-resource max-min fairness, as well as the multi-tenant network sharing literature, under one unifying framework.

In this paper, we study how to optimally provide isolation guarantees in multi-resource environments, such as public clouds, where a tenant’s demands on different resources (links) are correlated in order for her to make progress. Unlike prior work such as Dominant Resource Fairness (DRF) that assumes demands to be static and fixed, we consider elastic demands. Our approach generalizes canonical max-min fairness to the multi-resource setting with correlated demands, and extends DRF to elastic demands. We consider two natural optimization objectives: isolation guarantee from a tenant’s viewpoint and system utilization (work conservation) from an operator’s perspective. We prove that in non-cooperative environments like public cloud networks, there is a strong tradeoff between optimal isolation guarantee and work conservation when demands are elastic. Worse, work conservation can even decrease network utilization instead of improving it when demands are inelastic. We identify the root cause behind the tradeoff and present a provably optimal allocation algorithm, High Utilization with Guarantees (HUG), to achieve maximum attainable network utilization without sacrificing the optimal isolation guarantee, strategy-proofness, and other useful properties of DRF. In cooperative environments like private datacenter networks, HUG achieves both the optimal isolation guarantee and work conservation. Analyses, simulations, and experiments show that HUG provides better isolation guarantees, higher system utilization, and better tenant-level performance than its counterparts.

This work is the result of a close collaboration with Zhenhua Liu with help from Ion Stoica and Ali Ghodsi.

This year the NSDI PC accepted 45 out of 228 submissions, a 19.74% acceptance rate. This work forms the basis of the last chapter of my dissertation, ensuring SIGCOMM or NSDI publications from all its primary chapters!

Organized AMP Camp 6

Last week, November 19-20, I led the AMP Camp 6 organizing team in successfully arranging (most likely) the penultimate AMP Camp before the lab shuts down at the end of 2016. We had 200+ registered attendees, an all-star lineup of 14 speakers, and more than 40 volunteers. We also revamped the AMP Camp exercises syllabus, which now includes some of the more recent components of the Berkeley Data Analytics Stack, i.e., BDAS (pronounced “badass”).

This is the first time I have organized such a large event, and I now have a newfound appreciation for the organizers of the many conferences and workshops I have attended in the past. I want to especially thank Kattt Atchley, Jey Kottalam, and Boban Zarkovich for helping me along the way, lab directors Mike Franklin and Ion Stoica for their full support, and all the speakers, volunteers, and attendees for making this event a success.

As I move on to the next stage of my career in a few weeks, I feel proud and fortunate to have been a part of such a stellar research lab.

I Have Multiple Openings for Graduate Students!

If you are already a student at Michigan, drop me an email.

My work focuses on increasing application-infrastructure symbiosis in datacenter-scale, rack-scale, and geo-distributed computing, both to improve today’s data-intensive applications and to enable new data-driven applications on next-generation hardware. I’m looking for graduate students who are interested in systems, networking, and data-intensive computing, and who are not afraid to break out of their comfort zones and dig into the applications that run on top. Our ongoing projects look at different timescales (microseconds to hours) and contexts (in-machine, in-datacenter, and cross-datacenter), with the overarching goal of making it easier, faster, and cheaper for developers to build new applications and for users to gain a deeper understanding of their datasets. You can make it happen!

Why CSE@Michigan?

Data-driven decisions dictate most aspects of our lives today. To extract useful information from an ocean of data, we need to use multiple lenses and enable symbiosis between users, use cases, applications, platforms, and infrastructure. At Michigan, you’ll have access to world-class faculty in all areas of CSE, ranging from operating systems, networking, and computer architecture to databases, security, and machine learning, as well as the recently established $100M Michigan Institute for Data Science (MIDAS), which has over 100 affiliated faculty across 19 schools and colleges focusing on data science methodology and its application to transportation, learning analytics, personalized health and precision medicine, and computational social science. The opportunity to work at the intersections of multiple areas within and beyond CSE puts Michigan in a uniquely advantageous position to have the kind of real-world impact that few places can replicate.

Since we are talking about data-driven decision making, check out the independent rankings at http://csrankings.org/ for a quantitative look at why Michigan is one of the best places in the world and how it is only getting better over time.

You should also check out this collection maintained by Prof. Kevin Fu on why Ann Arbor is cool! It helped me decide to join Michigan as an assistant professor.

How to Apply

Applications to the Michigan CSE Ph.D. program are handled by the admissions committee; individual faculty members do not make admission decisions. Please submit an application via the Rackham Program Online Application. Remember to select Software as your subplan (major area of study) and to express your interest in working with me.

Undergraduate and Master’s Students

If you are an undergraduate or a master’s student at Michigan, please send me an email with your resume and transcripts; I have openings for a couple of motivated students. At this time, I’m unable to take on interns who are not enrolled at Michigan.