Tag Archives: Wide-Area Computing

FedScale Accepted to Appear at ICML’2022

Although theoretical federated learning (FL) research is growing exponentially, we are far from putting those theories into practice. Over the course of the last few years, SymbioticLab has made significant progress in building deployable FL systems, with Oort being the most prominent example. As I discussed in the past, while evaluating Oort, we observed the weaknesses of the existing FL workloads/benchmarks: they are too small and sometimes too homogeneous to highlight the uncertainties that FL deployments would face in the real world. FedScale was born out of the necessity to evaluate Oort. As we worked on it, we added more and more datasets to create a diverse benchmark that not only contains workloads to evaluate FL but also traces to emulate real-world end-device characteristics. Eventually, we also started building a runtime that one can use to implement any FL algorithm within FedScale. For example, Oort can be implemented with a few lines in FedScale; so can a more recent work, PyramidFL (MobiCom’22), which builds on Oort. This ICML paper gives an overview of the benchmarking aspects of FedScale for ML/FL researchers, while providing a quick intro to the systems runtime that we are continuously working on and plan to publish later this year.

We present FedScale, a diverse set of challenging and realistic benchmark datasets to facilitate scalable, comprehensive, and reproducible federated learning (FL) research. FedScale datasets are large-scale, encompassing a wide range of important FL tasks, such as image classification, object detection, word prediction, speech recognition, and sequence prediction in video streaming. For each dataset, we provide a unified evaluation protocol using realistic data splits and evaluation metrics. To meet the pressing need for reproducing realistic FL at scale, we build an efficient evaluation platform to simplify and standardize the process of FL experimental setup and model evaluation. Our evaluation platform provides flexible APIs to implement new FL algorithms and to include new execution backends (e.g., mobile backends) with minimal developer effort. Finally, we perform systematic benchmark experiments on these datasets. Our experiments suggest fruitful opportunities in heterogeneity-aware co-optimizations of the system and statistical efficiency under realistic FL characteristics. FedScale will be open-source and actively maintained, and we welcome feedback and contributions from the community.

Fan and Yinwei had been working on FedScale for more than two years, with some help from Xiangfeng toward the end of Oort. During this time, Jiachen and Sanjay joined, first as users of FedScale and later as its contributors. Of course, Harsha is with us, as with all our past FL projects. Including this summer, close to 20 undergrads and master’s students have worked on/with/around it. At this point, FedScale has become the largest project in SymbioticLab, with interest from academic and industry users within and outside Michigan; there is also an active Slack channel where users from many different institutions collaborate. We are also organizing the first FedScale Summer School this year. Overall, FedScale reminds me of another small project called Spark that I was part of many years ago!

This is my/our first paper in ICML, or any ML conference for that matter, even though it’s not necessarily a core ML paper. This year, ICML received 5630 submissions. Among these, 1117 were accepted for short and 118 for long presentations, for a 21.94% acceptance rate; FedScale is one of the former. These numbers are mind-boggling for someone like me from the systems community!

Join us in making FedScale even bigger, better, and more useful, as a member of SymbioticLab or as a FedScale user/contributor. Now that we have the research vehicle, the possibilities are limitless. We are exploring maybe fewer than 10 such ideas, but hundreds are waiting for you.

Visit http://fedscale.ai/ to learn more.

Oort Wins the Distinguished Artifact Award at OSDI’2021. Congrats Fan and Xiangfeng!

Oort, our federated learning system for scalable machine learning over millions of edge devices, has received the Distinguished Artifact Award at this year’s USENIX OSDI conference!

This is a testament to a lot of hard work put in by Fan and Xiangfeng over the course of the last couple of years. Oort is our first foray into federated learning, but it certainly is not the last.

Oort and its workloads (FedScale) are both open-source at https://github.com/symbioticlab.

FedScale Released on GitHub

Anyone working on federated learning (FL) has faced this problem at least once: you are reading two papers, and they either use very different datasets for performance evaluation, are unclear about their experimental assumptions about the runtime environment, or both. They often deal with very small datasets as well. There have been attempts at solutions too, resulting in many FL benchmarks. While working on Oort, we faced the same problem(s). Unfortunately, none of the existing benchmarks fit our requirements, so we had to create our own.

We present FedScale, a diverse set of challenging and realistic benchmark datasets to facilitate scalable, comprehensive, and reproducible federated learning (FL) research. FedScale datasets are large-scale, encompassing a diverse range of important FL tasks, such as image classification, object detection, language modeling, speech recognition, and reinforcement learning. For each dataset, we provide a unified evaluation protocol using realistic data splits and evaluation metrics. To meet the pressing need for reproducing realistic FL at scale, we have also built an efficient evaluation platform to simplify and standardize the process of FL experimental setup and model evaluation. Our evaluation platform provides flexible APIs to implement new FL algorithms and to include new execution backends with minimal developer effort. Finally, we perform in-depth benchmark experiments on these datasets. Our experiments suggest that FedScale presents significant challenges of heterogeneity-aware co-optimizations of the system and statistical efficiency under realistic FL characteristics, indicating fruitful opportunities for future research. FedScale is open-source with permissive licenses and actively maintained, and we welcome feedback and contributions from the community.

You can read the details in our paper and check out the code on GitHub. Do try it out and contribute, so that together we can build a large-scale benchmark that considers both data and system heterogeneity across a variety of application domains.

Fan, Yinwei, and Xiangfeng have put in a tremendous amount of work over almost two years to get to this point, and I’m super excited about its future.

Oort Accepted to Appear at OSDI’2021

Oort’s working title was Kuiper.

With the wide deployment of AI/ML in our daily lives, the need for data privacy has been receiving more attention in recent years. Federated Learning (FL) is an emerging sub-field of machine learning that focuses on in-situ processing of data wherever it is generated. This will only become more important as regulations around data movement (e.g., GDPR, CCPA) become even more restrictive. Although there have already been a large number of FL algorithms from the ML community and some FL deployments from large companies, systems support for FL is somewhat non-existent. Oort is our effort toward building the first open-source FL system that allows FL developers to select participants for their training in an informed manner instead of selecting them at random. In the process, we have also collected the largest public dataset for FL, which we plan to open source in the near future.

Federated Learning (FL) is an emerging direction in distributed machine learning (ML) that enables in-situ model training and testing on edge data. Despite having the same end goals as traditional ML, FL executions differ significantly in scale, spanning thousands to millions of participating devices. As a result, data characteristics and device capabilities vary widely across clients. Yet, existing efforts randomly select FL participants, which leads to poor model and system efficiency.

In this paper, we propose Oort to improve the performance of federated training and testing with guided participant selection. With an aim to improve time-to-accuracy performance in model training, Oort prioritizes the use of those clients who have both data that offers the greatest utility in improving model accuracy and the capability to run training quickly. To enable FL developers to interpret their results in model testing, Oort enforces their requirements on the distribution of participant data while improving the duration of federated testing by cherry-picking clients. Our evaluation shows that, compared to existing participant selection mechanisms, Oort improves time-to-accuracy performance by 1.2X-14.1X and final model accuracy by 1.3%-9.8%, while efficiently enforcing developer-specified model testing criteria at the scale of millions of clients.
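To make the selection idea above concrete, here is a minimal Python sketch of utility-guided participant selection in the spirit of Oort. The utility formula, the α exponent, and all client numbers are illustrative assumptions for this sketch, not Oort's actual implementation:

```python
def client_utility(stat_util, duration, pref_duration, alpha=2.0):
    """Illustrative Oort-style utility: reward clients whose data is more
    useful to training, and penalize those slower than the developer's
    preferred round duration. All parameters here are hypothetical."""
    penalty = (pref_duration / duration) ** alpha if duration > pref_duration else 1.0
    return stat_util * penalty

# client id -> (statistical utility, expected training duration in seconds)
clients = {"a": (5.0, 10), "b": (9.0, 40), "c": (8.0, 12), "d": (2.0, 5)}
pref_duration = 20  # developer-preferred round duration (seconds)

# Rank clients by joint statistical + system utility; pick the top-2.
ranked = sorted(clients,
                key=lambda c: client_utility(*clients[c], pref_duration),
                reverse=True)
selected = ranked[:2]
print(selected)  # → ['c', 'a']: client b has great data but is too slow
```

Note how client b, despite the highest statistical utility, loses out to faster clients once the speed penalty kicks in; the real system also balances exploitation of known-good clients with exploration of unseen ones.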

Fan and Allen had been working on Oort since the summer of 2019, and it’s been a great learning experience for me. As always, it’s been a pleasure to collaborate with Harsha, and I look forward to many more collaborations in the future. Over the past two years, many others have joined Fan in our efforts toward providing systems support for federated learning and analytics, with many exciting results at different stages of the pipeline, focusing on cloud/edge/WAN challenges. It’s only going to become more exciting!

This is the first OSDI in an odd year as OSDI moves to a yearly cadence. Although the number of submissions is lower than in the past, that’s likely only due to the late announcement; serving on my first OSDI PC, I think the quality of the submitted and accepted papers remains as high as ever. Overall, the OSDI PC accepted 31 out of 165 submissions.

Sol and Pando Accepted to Appear at NSDI'2020

With the advent of edge analytics and federated learning, the need for distributed computation and storage is only going to increase in the coming years. Unfortunately, existing solutions for analytics and machine learning have focused primarily on datacenter environments. When these solutions are applied to wide-area scenarios, their compute efficiency decreases and their storage overhead increases. Neither is suitable for pervasive storage and computation throughout the globe. In this iteration of NSDI, we have two papers that address the compute and storage aspects of emerging wide-area computing workloads.

Sol

Sol is a federated execution engine that can execute low-latency computation across a variety of network conditions. The key insight here is that modern execution engines (e.g., those used by Apache Spark or TensorFlow) make implicit assumptions about low-latency, high-bandwidth networks. Consequently, in poor network conditions, the overhead of coordinating work outweighs the work being executed. The end result, interestingly enough, is CPU underutilization, because workers spend more time waiting for new work to be assigned by the centralized coordinator/master than doing the work. Our solution is API-compatible with Apache Spark, so any existing job (SQL, ML, or Streaming) can run on Sol with significant performance improvements in edge computing and federated learning scenarios.

The popularity of big data and AI has led to many optimizations at different layers of distributed computation stacks. Despite – or perhaps, because of – its role as the narrow waist of such software stacks, the design of the execution engine, which is in charge of executing every single task of a job, has mostly remained unchanged. As a result, the execution engines available today are ones primarily designed for low latency and high bandwidth datacenter networks. When either or both of the network assumptions do not hold, CPUs are significantly underutilized.

In this paper, we take a first-principles approach toward developing an execution engine that can adapt to diverse network conditions. Sol, our federated execution engine architecture, flips the status quo in two respects. First, to mitigate the impact of high latency, Sol proactively assigns tasks, but does so judiciously to be resilient to uncertainties. Second, to improve the overall resource utilization, Sol decouples communication from computation internally instead of committing resources to both aspects of a task simultaneously. Our evaluations on EC2 show that, compared to Apache Spark in resource-constrained networks, Sol improves SQL and machine learning jobs by 16.4× and 4.2× on average.
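As a back-of-the-envelope illustration of why proactive assignment helps over high-latency links, here is a toy latency model; the formulas and numbers are my own simplifications for this sketch, not Sol's actual scheduler:

```python
def makespan_reactive(n_tasks, task_ms, rtt_ms):
    """Worker requests each task only after returning the previous result,
    so every task pays a full coordination round trip."""
    return n_tasks * (rtt_ms + task_ms)

def makespan_proactive(n_tasks, task_ms, rtt_ms, lookahead):
    """Coordinator keeps `lookahead` tasks queued at the worker; once the
    queued work covers one RTT, coordination latency is fully hidden."""
    if lookahead * task_ms >= rtt_ms:
        return rtt_ms + n_tasks * task_ms  # only the first delivery stalls
    # Otherwise the worker idles while each refill is in flight
    # (a crude approximation of pipeline stalls).
    stall_per_refill = rtt_ms - lookahead * task_ms
    refills = max(0, (n_tasks - lookahead) // lookahead)
    return rtt_ms + n_tasks * task_ms + refills * stall_per_refill

# 100 tasks of 10 ms each over a 100 ms RTT wide-area link:
print(makespan_reactive(100, 10, 100))       # → 11000 ms: mostly waiting
print(makespan_proactive(100, 10, 100, 20))  # → 1100 ms: RTT hidden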

This is Fan’s first major paper, and I’m very proud to see him open his book. I would like to thank Jimmy and Allen for their support in getting it done and Harsha for his enormous amount of time and efforts to make Sol successful. I believe Sol will have significant impact on the emerging fields of edge analytics and federated learning.

Pando

Pando started from a simple idea when Harsha and I were discussing how to apply erasure coding to mutable data, and it ended up being so much more. Not only have we designed an erasure-coded state machine, we have also identified the theoretical tradeoff limits between read latency, write latency, and storage overhead. In the process, we show that erasure coding is the way to get closer to limits that replication-based systems cannot reach. Moreover, Pando can dynamically switch between the two depending on the goals it has to achieve.

By replicating data across sites in multiple geographic regions, web services can maximize availability and minimize latency for their users. However, when sacrificing data consistency is not an option, we show that service providers today have to incur significantly higher cost to meet desired latency goals than the lowest cost theoretically feasible. We show that the key to addressing this sub-optimality is to 1) allow for erasure coding, not just replication, of data across data centers, and 2) mitigate the resultant increase in read and write latencies by rethinking how to enable consensus across the wide-area network. Our extensive evaluation mimicking web service deployments on the Azure cloud service shows that we enable near-optimal latency versus cost tradeoffs.
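To see the storage side of this tradeoff in plain arithmetic, here is a tiny sketch comparing full replication with Reed-Solomon-style erasure coding; the specific parameters are hypothetical, and Pando's actual protocol additionally reasons about read/write latencies and wide-area quorums:

```python
def replication_overhead(replicas):
    """Bytes stored per byte of user data with full replication."""
    return float(replicas)

def erasure_overhead(k, m):
    """(k data + m parity) erasure coding: any k of the k + m
    fragments suffice to reconstruct the object."""
    return (k + m) / k

# Tolerating two site failures either way:
print(replication_overhead(3))  # → 3.0 (3x storage)
print(erasure_overhead(4, 2))   # → 1.5 (half the storage cost)
```

The catch, as the abstract notes, is that coding spreads each object across more sites, which naively inflates read and write latencies; that is the part rethinking wide-area consensus has to recover.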

While Muhammed is advised by Harsha, I have had the opportunity to work with him since 2016 (first on a project that failed; on this one, since 2017). Many of the intricacies of the protocol are outside my expertise, but I learned a lot from Harsha and Muhammed. I’m also glad to see that our original idea of mutable erasure-coded data has come to fruition in a much stronger form than what Harsha and I devised as the summer of 2017 was coming to an end. Btw, Pando now holds the notorious distinction of being my current record for accepted-after-N-submissions; fifth time’s the charm!

The NSDI PC this year accepted 48 out of 275 submissions from the fall deadline, increasing the acceptance rate after last year’s poor showing.

Joint Award With CMU on Distributed Storage. Thanks NSF!

This project aims to build on our past and ongoing work with Rashmi Vinayak (CMU) and Harsha Madhyastha (Michigan) to address optimal performance-cost tradeoffs in distributed storage. It’s always fun to have the opportunity to work with great friends and colleagues.

I’m very grateful to NSF and the broader research community for their great support and show of confidence in our research.