Tag Archives: Programmable Networks

AIFO Accepted to Appear at SIGCOMM’2021

Packet scheduling is a classic problem in networking. In recent years, however, the focus of packet scheduling research has shifted from designing new scheduling algorithms to designing generalized frameworks that can be programmed to approximate a variety of scheduling disciplines. Push-In First-Out (PIFO) from SIGCOMM 2016 is one such framework that has been shown to be quite expressive. Since then, a variety of solutions have attempted to make implementing PIFO more practical, with the key problem being minimizing the number of priority levels needed. SP-PIFO is a recent take that shows that a handful of queues is enough. This leaves us with an obvious question: what is the minimum number of queues one needs to approximate PIFO? We show that the answer is just one.

Programmable packet scheduling enables scheduling algorithms to be programmed into the data plane without changing the hardware. Existing proposals either have no hardware implementations or require multiple strict-priority queues.

We present Admission-In First-Out (AIFO) queues, a new solution for programmable packet scheduling that uses only a single first-in first-out queue. AIFO is motivated by the confluence of two recent trends: shallow buffers in switches and fast-converging congestion control in end hosts, which together lead to a simple observation: the decisive factor in a flow’s completion time (FCT) in modern datacenter networks is often which packets are enqueued or dropped, not the order in which they leave the switch. The core idea of AIFO is to maintain a sliding window to track the ranks of recent packets and to compute the relative rank of an arriving packet in the window for admission control. Theoretically, we prove that AIFO provides bounded performance relative to Push-In First-Out (PIFO). Empirically, we fully implement AIFO and evaluate it with a range of real workloads, demonstrating that AIFO closely approximates PIFO. Importantly, unlike PIFO, AIFO can run at line rate on existing hardware and uses minimal switch resources: as few as a single queue.
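To make the sliding-window idea concrete, here is a minimal Python sketch of AIFO-style admission control in front of a single FIFO queue. The window size, the burst-tolerance parameter k, and the exact admission inequality are illustrative assumptions of mine, not the constants from the paper; the real AIFO runs this logic in the switch data plane at line rate.

```python
from collections import deque

class AIFOQueue:
    """Toy model of AIFO-style admission control in front of one FIFO queue.
    Parameters and the admission inequality are illustrative, not the
    paper's exact data-plane design."""

    def __init__(self, capacity, window_size=64, burst_param=0.1):
        self.capacity = capacity                  # FIFO capacity C (packets)
        self.k = burst_param                      # headroom for bursts (assumed)
        self.window = deque(maxlen=window_size)   # ranks of recent arrivals
        self.queue = deque()                      # the single FIFO queue

    def quantile(self, rank):
        """Relative rank: fraction of windowed ranks smaller than `rank`."""
        if not self.window:
            return 0.0
        return sum(r < rank for r in self.window) / len(self.window)

    def enqueue(self, pkt, rank):
        """Admit iff the packet's relative rank does not exceed the fraction
        of queue space still available, inflated by 1/(1 - k)."""
        q = self.quantile(rank)
        self.window.append(rank)                  # window tracks all arrivals
        available = (self.capacity - len(self.queue)) / self.capacity
        if len(self.queue) < self.capacity and q <= available / (1.0 - self.k):
            self.queue.append((rank, pkt))
            return True
        return False                              # drop: outranked by recent traffic

    def dequeue(self):
        return self.queue.popleft() if self.queue else None
```

The shape of the admission rule is what matters: when the queue is mostly empty, nearly everything is admitted; as the queue fills, only packets whose ranks beat most of the recent window get in, approximating the set of packets PIFO would have kept.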

Although programmable packet scheduling has been quite popular for more than five years, I started paying careful attention only after the SP-PIFO presentation at NSDI 2020. I felt that we should be able to approximate something like that with even fewer priority classes, especially by using something similar to Foreground-Background scheduling, which needs only two priorities. Xin had been thinking about the problem even longer given his vast experience in programmable switches, and he approached me after submitting Kayak to NSDI 2021. Xin pointed out that two priorities need only one queue with an admission control mechanism in front! I’m glad he roped me in, as it’s always a pleasure working with him and Zhuolong. It seems unbelievable even to me that this is my first packet scheduling paper!

This year, SIGCOMM has broken the acceptance record once again, accepting 55 out of 241 submissions into the program!

Presented Keynote Talk at CloudNet’2020

Earlier this week, I presented a keynote talk on the state of network-informed data systems design at the CloudNet’2020 conference, with a specific focus on our recent works on memory disaggregation (Infiniswap, Leap, and NetLock), and discussed the many open challenges toward making memory disaggregation practical.

In this talk, I discussed the motivation behind disaggregating memory, or any other expensive resource for that matter. High-performance data systems strive to keep data in main memory. They often over-provision to avoid running out of memory, leading to 50% average memory underutilization in Google, Facebook, and Alibaba datacenters. The root cause is simple: applications today cannot access otherwise unused memory beyond their machine boundaries, even when their performance grinds to a halt. But could they? Over the course of the last five years, our research in the SymbioticLab has addressed, and continues to address, this fundamental question of memory disaggregation, whereby an application can use both local and remote memory over emerging high-speed networks.

I also highlighted at least eight major challenges any memory disaggregation solution must address to even have a shot at becoming practical and widely used. These include applicability to a large variety of applications without requiring any application-level changes; scalability to large datacenters; efficiency in using up all available memory; high performance when using disaggregated memory; performance isolation from other datacenter traffic; resilience in the presence of failures and unavailability; security from others; and generality to a variety of memory technologies beyond DRAM. While this may come across as a laundry list of problems, we do believe that a complete solution must address each one of them.

In this context, I discussed three projects: Infiniswap, which achieves applicability using remote memory paging, and scalability and efficiency using decentralized algorithms; Leap, which improves performance by prefetching; and NetLock, which shows how to disaggregate programmable switch memory. I also pointed out a variety of ongoing projects toward our ultimate goal of unified, practical memory disaggregation.
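As a flavor of what Leap-style prefetching looks like, here is a simplified majority-trend detector in Python. The window size, prefetch depth, and function names are illustrative assumptions of mine; Leap’s in-kernel implementation adapts these parameters and does considerably more bookkeeping.

```python
from collections import Counter

def detect_trend(accesses, window=8):
    """Find a majority stride among recent page accesses, if any
    (simplified; window size is an assumption)."""
    recent = accesses[-window:]
    if len(recent) < 2:
        return None
    deltas = [b - a for a, b in zip(recent, recent[1:])]
    delta, count = Counter(deltas).most_common(1)[0]
    return delta if count > len(deltas) // 2 else None  # strict majority only

def prefetch_candidates(accesses, depth=4):
    """Pages to prefetch after the latest access; empty if no clear trend."""
    delta = detect_trend(accesses)
    if delta is None:
        return []
    return [accesses[-1] + i * delta for i in range(1, depth + 1)]
```

For example, for accesses [100, 101, 102, 103] the majority delta is 1, so pages 104 through 107 become prefetch candidates; for a random access pattern no majority emerges and nothing is prefetched.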

My slides from this talk are publicly available and have more details elaborating these points.

NetLock Accepted to Appear at SIGCOMM’2020

High-throughput, low-latency lock managers are useful for building a variety of distributed applications. Traditionally, a key tradeoff in this context has been expressed in terms of the amount of knowledge available to the lock manager. On the one hand, a decentralized lock manager can increase throughput through parallelization, but it can starve certain categories of applications. On the other hand, a centralized lock manager can avoid starvation and impose resource sharing policies, but its throughput is limited. At SIGMOD’18, we presented DSLR, which mitigated this tradeoff in clusters with fast RDMA networks by adapting Lamport’s bakery algorithm to RDMA’s fetch-and-add (FA) operations to build a decentralized solution. The downside is that we couldn’t implement complex policies that need centralized information.
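For intuition, here is a toy, shared-memory analogue of DSLR’s bakery-style locking in Python. In DSLR, the two counters live in a remote lock table and are manipulated with one-sided RDMA fetch-and-add and read verbs; a local mutex stands in for that NIC-side atomicity here, and shared locks, leases, and failure handling are omitted.

```python
import threading

class BakeryLock:
    """Toy, shared-memory analogue of DSLR's bakery-style locking.
    A local mutex stands in for RDMA's NIC-side atomicity; shared locks
    and failure handling are omitted."""

    def __init__(self):
        self._atomic = threading.Lock()  # stands in for RDMA atomics
        self.next_ticket = 0             # fetch-and-added by each acquirer
        self.now_serving = 0             # fetch-and-added by each releaser

    def _fetch_and_add(self, field, delta=1):
        with self._atomic:               # RDMA FA is atomic at the NIC
            old = getattr(self, field)
            setattr(self, field, old + delta)
            return old

    def acquire(self):
        my_ticket = self._fetch_and_add("next_ticket")  # take a ticket
        while self.now_serving != my_ticket:            # poll (an RDMA read)
            pass                                        # real code would back off
        return my_ticket

    def release(self):
        self._fetch_and_add("now_serving")              # admit the next waiter
```

The appeal of the bakery scheme is that acquisition is a single one-sided atomic operation with no central server CPU on the critical path, which is exactly why it scales and exactly why it cannot enforce globally informed policies.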

What if we could have a high-speed centralized point that all remote traffic must go through anyway? NetLock is our attempt at doing just that by implementing a centralized lock manager in a programmable switch working in tandem with the servers. The co-design is important to work around the resource limitations of the switch. By carefully caching hot locks in the switch and moving warm and cold ones to the servers, we can meet both the performance and policy goals of a lock manager without significant compromise in either.

Lock managers are widely used by distributed systems. Traditional centralized lock managers can easily support policies among multiple users using global knowledge, but they suffer from low performance. In contrast, emerging decentralized approaches are faster but cannot provide flexible policy support. Furthermore, performance in both cases is limited by server capability.

We present NetLock, a new centralized lock manager that co-designs servers and network switches to achieve high performance without sacrificing flexibility in policy support. The key idea of NetLock is to exploit the capability of emerging programmable switches to directly process lock requests in the switch data plane. Due to the limited switch memory, we design a memory management mechanism to seamlessly integrate the switch and server memory. To realize the locking functionality in the switch, we design a custom data plane module that efficiently pools multiple register arrays together to maximize memory utilization. We have implemented a NetLock prototype with a Barefoot Tofino switch and a cluster of commodity servers. Evaluation results show that NetLock improves the throughput by 14.0–18.4×, and reduces the average and 99% latency by 4.7–20.3× and 10.4–18.7× over DSLR, a state-of-the-art RDMA-based solution, while providing flexible policy support.
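The following Python sketch captures the hot/cold dispatch idea described above at a very high level. All names, the placement policy, and the single-holder lock semantics are simplifications of mine; the actual system implements lock queues in switch register arrays and supports shared and exclusive modes, which this toy does not.

```python
from collections import deque

class LockEntry:
    """One lock's state: a holder flag plus a FIFO of waiters.
    (Shared vs. exclusive modes are omitted for brevity.)"""

    def __init__(self):
        self.held = False
        self.waiters = deque()

    def acquire(self, client):
        if not self.held:
            self.held = True
            return "GRANTED"
        self.waiters.append(client)      # queue behind the current holder
        return "QUEUED"

    def release(self):
        if self.waiters:                 # hand the lock to the next waiter
            return ("GRANTED", self.waiters.popleft())
        self.held = False
        return ("FREE", None)

class NetLockSketch:
    """Hypothetical switch/server dispatch: hot locks are served from the
    switch's small memory, warm and cold locks from lock servers."""

    def __init__(self, switch_capacity):
        self.switch_locks = {}           # models register-array entries
        self.server_locks = {}           # plentiful but slower server memory
        self.switch_capacity = switch_capacity

    def place(self, lock_id, hot):
        """Placement decision (this policy is an assumption of mine)."""
        table = (self.switch_locks
                 if hot and len(self.switch_locks) < self.switch_capacity
                 else self.server_locks)
        return table.setdefault(lock_id, LockEntry())

    def acquire(self, lock_id, client, hot=False):
        entry = (self.switch_locks.get(lock_id)
                 or self.server_locks.get(lock_id)
                 or self.place(lock_id, hot))
        return entry.acquire(client)
```

The point of the split is that the common case, a hot lock, never leaves the switch data plane, while the switch’s limited memory is never a hard cap on how many locks the system can manage.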

Xin and I came up with the idea of this project over a couple of meals in San Diego at OSDI’18, and later Zhuolong and Yiwen expanded and successfully executed our ideas, which led to NetLock. Similar to DSLR, NetLock explores a different design point in our larger memory disaggregation vision.

This year’s SIGCOMM probably has its highest acceptance rate in 25 years, if not longer. After a long, successful run at SIGCOMM and a short break doing many other exciting things, it’s great to be back to some networking research! Going forward, I’m hoping for much more along these lines, both inside the network and at the edge.