Sol and Pando Accepted to Appear at NSDI 2020

With the advent of edge analytics and federated learning, the need for distributed computation and storage is only going to increase in the coming years. Unfortunately, existing solutions for analytics and machine learning have focused primarily on datacenter environments. When these solutions are applied to wide-area scenarios, their compute efficiency decreases and their storage overhead increases, making them unsuitable for pervasive storage and computation across the globe. In this iteration of NSDI, we have two papers that address the compute and storage aspects of emerging wide-area computing workloads.

Sol

Sol is a federated execution engine that can execute low-latency computations across a wide variety of network conditions. The key insight here is that modern execution engines (e.g., those used by Apache Spark or TensorFlow) make implicit assumptions about low-latency and high-bandwidth networks. Consequently, in poor network conditions, the overhead of coordinating work outweighs the work being executed. The end result, interestingly enough, is CPU underutilization: workers spend more time waiting for new work to be assigned by the centralized coordinator/master than actually doing it. Our solution is API-compatible with Apache Spark, so any existing job (SQL, ML, or Streaming) can run on Sol with significant performance improvements in edge computing and federated learning scenarios.
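
To see why coordination latency translates into idle CPUs, here is a back-of-the-envelope model (my own illustration with assumed numbers, not a measurement from the paper). With reactive, pull-based scheduling, a worker pays one coordinator round trip before every task; if tasks are pipelined ahead of time, that latency is paid only once:

```python
# Toy model of CPU utilization under reactive vs. pipelined task
# assignment over a high-latency control link (all numbers assumed).
rtt = 0.200        # coordinator <-> worker round-trip time, seconds
task = 0.050       # per-task compute time, seconds
n = 1000           # number of tasks run by one worker

reactive = n * (rtt + task)    # wait one round trip before every task
pipelined = rtt + n * task     # tasks queued ahead; pay latency once

print(f"reactive:  {reactive:6.1f}s, CPU utilization {n*task/reactive:5.1%}")
print(f"pipelined: {pipelined:6.1f}s, CPU utilization {n*task/pipelined:5.1%}")
# reactive:   250.0s, CPU utilization 20.0%
# pipelined:   50.2s, CPU utilization 99.6%
```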

The popularity of big data and AI has led to many optimizations at different layers of distributed computation stacks. Despite, or perhaps because of, its role as the narrow waist of such software stacks, the design of the execution engine, which is in charge of executing every single task of a job, has mostly remained unchanged. As a result, the execution engines available today are primarily designed for low-latency, high-bandwidth datacenter networks. When either or both of these network assumptions do not hold, CPUs are significantly underutilized.

In this paper, we take a first-principles approach toward developing an execution engine that can adapt to diverse network conditions. Sol, our federated execution engine architecture, flips the status quo in two respects. First, to mitigate the impact of high latency, Sol proactively assigns tasks, but does so judiciously to be resilient to uncertainties. Second, to improve overall resource utilization, Sol decouples communication from computation internally instead of committing resources to both aspects of a task simultaneously. Our evaluations on EC2 show that, compared to Apache Spark in resource-constrained networks, Sol improves the performance of SQL and machine learning jobs by 16.4× and 4.2× on average.
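
The second idea, decoupling communication from computation, can be sketched roughly as follows (a toy Python illustration under assumed latencies, not Sol's actual architecture): cheap threads handle slow WAN transfers, while a scarce compute slot only ever runs tasks whose inputs have already arrived.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor

ready = queue.Queue()

def fetch(task_id):
    """Pull a task's input over a (simulated) slow WAN link."""
    time.sleep(0.2)              # simulated WAN transfer
    ready.put(task_id)           # task becomes runnable once its data lands

def compute_loop(n_tasks):
    for _ in range(n_tasks):
        t = ready.get()          # the CPU slot is never held during a fetch
        time.sleep(0.05)         # simulated computation on task t

with ThreadPoolExecutor(max_workers=8) as pool:  # fetches overlap each other
    for t in range(16):
        pool.submit(fetch, t)
    compute_loop(16)
```

Committing a CPU to each task for the duration of its slow input transfer is what a datacenter-oriented engine effectively does; over a WAN, the overlap above is where the lost utilization comes back.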

This is Fan’s first major paper, and I’m very proud to see him open his account. I would like to thank Jimmy and Allen for their support in getting it done, and Harsha for the enormous amount of time and effort he put in to make Sol successful. I believe Sol will have a significant impact on the emerging fields of edge analytics and federated learning.

Pando

Pando started with a simple idea when Harsha and I were discussing how to apply erasure coding to mutable data. It ended up being so much more. Not only have we designed an erasure-coded state machine, we have also identified the theoretical limits of the tradeoff between read latency, write latency, and storage overhead. In the process, we show that erasure coding is the way to get close to limits that replication-based systems cannot reach. Moreover, Pando can dynamically switch between the two depending on the latency and cost goals it has to meet.
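
For intuition on the storage side of that tradeoff, the standard erasure-coding arithmetic (textbook math with assumed parameters, not a result from the paper) already shows why coding is attractive:

```python
# Tolerating 2 site failures across 5 data centers (parameters assumed).
k, m = 3, 2                   # data fragments, parity fragments
ec_overhead = (k + m) / k     # erasure coding stores 5/3 ~= 1.67x the data
rep_overhead = m + 1          # replication needs m+1 = 3 full copies
print(f"erasure coding: {ec_overhead:.2f}x, replication: {rep_overhead}x")
```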

By replicating data across sites in multiple geographic regions, web services can maximize availability and minimize latency for their users. However, when sacrificing data consistency is not an option, we show that service providers today have to incur a significantly higher cost to meet desired latency goals than the lowest cost theoretically feasible. The key to addressing this sub-optimality is to 1) allow for erasure coding, not just replication, of data across data centers, and 2) mitigate the resultant increase in read and write latencies by rethinking how to enable consensus across the wide-area network. Our extensive evaluation, mimicking web service deployments on the Azure cloud service, shows that Pando enables near-optimal latency-versus-cost tradeoffs.
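
The read-latency penalty that has to be mitigated can be seen with a tiny model (my own illustration with hypothetical round-trip times, not Pando's protocol): a coded read cannot complete until enough fragments arrive, so it waits on the k-th fastest site rather than the nearest one.

```python
# Hypothetical round-trip times (ms) from one client to 5 storage sites.
rtts_ms = sorted([40, 70, 95, 120, 180])
k = 3                                # fragments needed to reconstruct a value
coded_read = rtts_ms[k - 1]          # k-th fastest fragment arrives at 95 ms
replicated_read = rtts_ms[0]         # nearest full replica answers in 40 ms
print(f"coded read: {coded_read} ms, replicated read: {replicated_read} ms")
```

Closing that gap without giving up strong consistency is exactly where rethinking wide-area consensus comes in.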

While Muhammed is advised by Harsha, I have had the opportunity to work with him since 2016 (first on a project that failed; on this one, since 2017). Many of the intricacies of the protocol are outside my expertise, but I learned a lot from Harsha and Muhammed. I’m also glad to see that our original idea of mutable erasure-coded data has come to fruition in a much stronger form than what Harsha and I devised as the summer of 2017 was coming to an end. Btw, Pando now holds the notorious distinction of being my current record for the most submissions before acceptance; fifth time’s the charm!

This year, the NSDI PC accepted 48 out of 275 submissions in the Fall deadline, increasing the acceptance rate from last year’s poor showing.