Tag Archives: Wide-Area Computing

Near Optimal Coflow Scheduling Accepted at SPAA’2019

Since the inception of coflow in 2012, the abstraction and works surrounding it are growing at a fast pace. In addition to systems building, we have seen a rise of theoretical analyses of the coflow scheduling problem. One of the most recent ones to this end has even received a Best Student Paper Award in SIGCOMM’2018.

Today I want to share a recent development in theoretical analysis of coflow scheduling in the context of general graphs. While datacenter networks can be abstracted out as a non-blocking switch, under failures and load imbalance this simplistic model does not hold any more. More importantly, inter-datacenter wide-area networks (WANs) are never non-blocking anyway. Naturally, we need a new analytical framework that allows for a broader variety of network topologies. In our recent work with our colleagues at University of Maryland and Google, we provide a randomized 2-approximation algorithm for single path and free path models of coflow scheduling over the WAN, significantly improving over the state-of-the-art.

The Coflow scheduling problem has emerged as a popular abstraction in the last few years to study data communication problems within a data center [5]. In this basic framework, each coflow has a set of communication demands and the goal is to schedule many coflows in a manner that minimizes the total weighted completion time. A coflow is said to complete when all its communication needs are met. This problem has been extremely well studied for the case of complete bipartite graphs that model a data center with full bisection bandwidth and several approximation algorithms and effective heuristics have been proposed recently [1, 2, 29].

In this work, we study a slightly different model of coflow scheduling in general graphs (to capture traffic between datacenters [15, 29]) and develop practical and efficient approximation algorithms for it. Our main result is a randomized 2 approximation algorithm for the single path and free path model, significantly improving prior work. In addition, we demonstrate via extensive experiments that the algorithm is practical, easy to implement and performs well in practice.

This is my first paper in SPAA, but all thanks go to Sheng Yang at UMD whose dissertation will include this work and his advisor Samir and collaborator Manish at Google. My student Jie You (Jimmy) did the evaluation for this work to compare it against existing works on different WAN topologies and workloads. It was a pleasant experience with the theory folks and helping them in better understanding the constraints. I look forward to many more future collaborations!

CellScope Accepted to Appear at MobiCom’2018

Today’s cellular Radio Access Networks (RAN) are vast, and operators collect a variety of data to better understand and manage these networks. A fundamental challenge in any geo-distributed system — including cellular RANs — is the tradeoff between data collection latency and the accuracy of insights derived from the collected data. While the general problem is difficult to address, one can side-step some of the challenges by leveraging domain-specific knowledge. That’s exactly what we achieve in CellScope in the context of cellular RANs through (1) intelligent data grouping and (2) task formulations that leverage domain characteristics. Specifically, we enable multi-task learning for performance analysis of cellular RANs using the aforementioned techniques.

An increasing amount of mobile analytics is performed on data that is procured from mobile devices in a real-time fashion to make real-time decisions. Such tasks include simple reporting on streams to sophisticated model building. However, the practicality of such analyses are impeded in several domains because they are faced with a fundamental trade-off between data collection latency and analysis accuracy.

In this paper, we first study this trade-off in the context of a specific domain, Cellular Radio Access Networks (RAN). We find that the trade-off can be resolved using two broad, general techniques: intelligent data grouping and task formulations that leverage domain characteristics. Based on this, we present CellScope, a system that applies a domain specific formulation and application of Multi- task Learning (MTL) to RAN performance analysis. It uses three techniques: feature engineering to transform raw data into effective features, a PCA inspired similarity metric to group data from geographically nearby base stations sharing performance commonalities, and a hybrid online-offline model for efficient model updates. Our evaluation shows that its accuracy improvements over direct application of ML range from 2.5× to 4.4× while reducing the model update overhead by up to 4.8×. We have also used it to analyze an LTE network of over 2 million subscribers, where it reduced troubleshooting efforts by several magnitudes.

We then apply the underlying techniques in CellScope to another domain, mobile phone energy bug diagnosis, and show that the techniques are general.

I got involved in this project about 3 years ago right before I was about to start in Michigan. The project faced many challenges, but it survived and saw the light of day only because of Anand’s persistence. This is my first MobiCom paper, and I’m pleasantly surprised by the level of rigorous reviews the paper went through — 9 reviews + 1 blind shepherding process! This fits right into our efforts toward geo-distributed analytics with many more exciting results to come soon. Stay tuned!

Three Papers Accepted at HotCloud’2018 and GRADES-NDA’2018 Workshops

Over the course of last few days, we heard back about the acceptance of three workshops papers:

  1. To Relay or Not to Relay for Inter-Cloud Transfers? — HotCloud’18
  2. Monarch: Gaining Command on Geo-Distributed Graph Analytics — HotCloud’18
  3. Bridging the GAP: Towards Approximate Graph Analytics — GRADES-NDA’18

The HotCloud ones deal with networking for and graph processing over geo-distributed analytics, while the GRADES-NDA one deals with approximate graph processing. The first of the three also happens to be the first submission and paper by my first-year student Fan Lai in collaboration with Harsha. The other two are led completely by my friend Anand at Berkeley.

Received Two Alibaba Innovation Research Grants

More resources for following up on our recent memory disaggregation and erasure coding works! One of the awards is a collaboration with Harsha Madhyastha. Looking forward to working with Alibaba.

In 2017, many proposals are received by AIR (Alibaba Innovation Research), which are from 99 universities and institutes (domestic 54; overseas 45) in 13 countries and regions. After rigorous review by AIR Technical Committee, 43 distinguish research proposals are accepted and will be funded by AIR Program.