Tag Archives: RDMA

Infiniswap in USENIX ;login: and Elsewhere

Since our first open-source release of Infiniswap over the summer, we have seen growing interest, with many follow-ups both within our group and outside it.

Here is a quick summary of selected writeups on Infiniswap:

What’s Next?

In addition to a lot of performance optimizations and bug fixes, we are working on several key features of Infiniswap that will be released in the coming months.

  1. High Availability: Infiniswap will be able to maintain high performance across failures, corruptions, and interruptions throughout the cluster.
  2. Performance Isolation: Infiniswap will be able to provide end-to-end performance guarantees over the RDMA network to multiple applications using it concurrently.

Infiniswap Released on GitHub

Today we are glad to announce the first open-source release of Infiniswap, the first practical, large-scale memory disaggregation system for cloud and HPC clusters.

Infiniswap is an efficient memory disaggregation system designed specifically for clusters with fast RDMA networks. It opportunistically harvests and transparently exposes unused cluster memory to unmodified applications by dividing the swap space of each machine into many slabs and distributing them across many machines’ remote memory. Because one-sided RDMA operations bypass remote CPUs, Infiniswap leverages the power of many choices to perform decentralized slab placements and evictions.
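
If you are curious what "the power of many choices" looks like in this context, here is a minimal sketch in Python. It is not the actual Infiniswap implementation; the host names, free-memory numbers, and the two-choice sampling parameter below are all illustrative assumptions, meant only to show how a machine can place a slab without consulting a central coordinator.

    import random

    # Hypothetical view of how many free slabs each remote machine currently has.
    # In the real system this is discovered on demand; here it is just a toy table.
    free_slabs = {"host-1": 120, "host-2": 35, "host-3": 80, "host-4": 200}

    def place_slab(free_slabs, num_choices=2):
        """Place one slab using the power of (a few) choices: sample a handful of
        machines uniformly at random and pick the one with the most free memory,
        without consulting any central coordinator."""
        candidates = random.sample(list(free_slabs), k=min(num_choices, len(free_slabs)))
        chosen = max(candidates, key=lambda host: free_slabs[host])
        free_slabs[chosen] -= 1   # the chosen machine gives up one slab
        return chosen

    # Place a few slabs and watch the load spread toward the machines with more room.
    print([place_slab(free_slabs) for _ in range(10)])
    print(free_slabs)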

Extensive benchmarks on workloads from memory-intensive applications, ranging from in-memory databases such as VoltDB and Memcached to popular big data software such as Apache Spark, PowerGraph, and GraphX, show that Infiniswap provides order-of-magnitude performance improvements when working sets do not completely fit in memory. Simultaneously, it boosts cluster memory utilization by almost 50%.

Primary features included in this initial release are:

  • No new hardware and no application or operating system modifications;
  • Fault tolerance via asynchronous disk backups (see the rough sketch after this list);
  • Scalability via decentralized algorithms.
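
To give a flavor of the fault-tolerance idea above, here is a rough Python sketch, assuming toy dictionaries stand in for remote memory and the local disk. In the real system this happens along the RDMA paging path; every name and value below is made up for illustration.

    from concurrent.futures import ThreadPoolExecutor

    backup_pool = ThreadPoolExecutor(max_workers=1)   # background disk writer

    def page_out(page_id, data, remote_memory, disk_backup):
        """Hypothetical page-out path: the page goes to remote memory on the fast
        path (an RDMA write in the real system), while a local disk backup is
        written asynchronously; a remote failure then never loses data, it only
        costs a slower disk read later."""
        remote_memory[page_id] = data                               # fast path
        backup_pool.submit(disk_backup.__setitem__, page_id, data)  # off the critical path

    remote_memory, disk_backup = {}, {}
    page_out(42, b"page contents", remote_memory, disk_backup)
    backup_pool.shutdown(wait=True)
    print(42 in remote_memory, 42 in disk_backup)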

Here are some links if you want to check it out, contribute, or point it out to someone else who can help us make it better.

Git repository: https://github.com/Infiniswap/infiniswap
Detailed Overview: Efficient Memory Disaggregation with Infiniswap

The project is still in its early stages and can use all the help it can get. We appreciate your feedback.

FaiRDMA Accepted to Appear at KBNets’2017

As cloud providers deploy RDMA in their datacenters and developers rewrite or update their applications to use RDMA primitives, a key question remains open: what will happen when multiple RDMA-enabled applications must share the network? Surprisingly, this simple question does not yet have a conclusive answer, because existing work focuses primarily on improving individual applications' performance using RDMA on a case-by-case basis. So we took a simple step. Yiwen built a benchmarking tool and ran some controlled experiments, which turned up some very interesting results: there is often no performance isolation, and many factors beyond the network itself determine RDMA performance. We are currently investigating the root causes and working on mitigating them.

To meet the increasing throughput and latency demands of modern applications, many operators are rapidly deploying RDMA in their datacenters. At the same time, developers are re-designing their software to take advantage of RDMA’s benefits for individual applications. However, when it comes to RDMA’s performance, many simple questions remain open.

In this paper, we consider the performance isolation characteristics of RDMA. Specifically, we conduct three sets of experiments — three combinations of one throughput-sensitive flow and one latency-sensitive flow — in a controlled environment, observe large discrepancies in RDMA performance with and without a competing flow, and describe our progress in identifying plausible root causes.
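
For a flavor of what such a controlled experiment can look like, here is a rough sketch that drives the standard perftest utilities (ib_write_lat and ib_write_bw) from Python. This is not the actual benchmarking tool used in the paper; the server address, device name, sizes, and durations below are assumptions about a typical setup, and matching perftest servers are assumed to be (re)started on the server machine for each run.

    import subprocess

    # Hypothetical setup: measure one latency-sensitive flow with and without
    # a competing throughput-sensitive flow targeting the same server NIC.
    SERVER = "192.168.0.10"   # assumed address of the server machine
    DEVICE = "mlx5_0"         # assumed RDMA device name on the client

    def run_latency_probe():
        """Latency-sensitive flow: many small RDMA WRITEs via ib_write_lat.
        Assumes a matching ib_write_lat server is listening on SERVER."""
        cmd = ["ib_write_lat", "-d", DEVICE, "-s", "64", "-n", "10000", SERVER]
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    def start_bandwidth_hog():
        """Throughput-sensitive flow in the background: large RDMA WRITEs for 30
        seconds via ib_write_bw. Assumes a matching ib_write_bw server."""
        cmd = ["ib_write_bw", "-d", DEVICE, "-s", "1048576", "-D", "30", SERVER]
        return subprocess.Popen(cmd, stdout=subprocess.DEVNULL)

    # 1. Baseline: the latency flow runs alone.
    baseline = run_latency_probe()

    # 2. Contention: the same latency flow runs while the bandwidth hog is active.
    hog = start_bandwidth_hog()
    contended = run_latency_probe()
    hog.wait()

    print("=== latency, no competition ===\n", baseline)
    print("=== latency, with a competing bandwidth flow ===\n", contended)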

This work is an offshoot, among several others, of our Infiniswap project on rack-scale memory disaggregation. As time goes by, I appreciate Ion's words of wisdom more and more: build real systems to find real problems.

FTR, the KBNets PC accepted 9 papers out of 22 submissions this year. For a first-time workshop, I consider it a decent rate; there are some great papers in the program from both industry and academia.

Infiniswap Accepted to Appear at NSDI’2017

Update: Camera-ready version is available here. Infiniswap code is now on GitHub!

As networks become faster, the difference between remote and local resources is blurring every day. How can we take advantage of these blurred lines? This is the key observation behind resource disaggregation and, to some extent, rack-scale computing. In this paper, we take our first stab at making memory disaggregation practical by exposing remote memory to unmodified applications. While there have been several proposals and feasibility studies in recent years, to the best of our knowledge, this is the first concrete step in making it real.

Memory-intensive applications suffer large performance loss when their working sets do not fully fit in memory. Yet, they cannot leverage otherwise unused remote memory when paging out to disks even in the presence of large imbalance in memory utilizations across a cluster. Existing proposals for memory disaggregation call for new architectures, new hardware designs, and/or new programming models, making them infeasible.

This paper describes the design and implementation of Infiniswap, a remote memory paging system designed specifically for an RDMA network. Infiniswap opportunistically harvests and transparently exposes unused memory to unmodified applications by dividing the swap space of each machine into many slabs and distributing them across many machines’ remote memory. Because RDMA operations bypass remote CPUs, Infiniswap leverages the power of many choices to perform decentralized slab placements and evictions.
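
To complement the placement sketch in the release post above, here is an equally rough Python sketch of decentralized eviction under the power of many choices. The per-slab activity counters, sample size, and eviction count are made-up assumptions; the real eviction logic in Infiniswap differs in its details.

    import random

    # Hypothetical per-slab activity counters on one machine hosting remote memory:
    # slab id -> number of recent paging requests it served (made-up numbers).
    hosted_slabs = {"slab-%d" % i: random.randint(0, 1000) for i in range(32)}

    def evict_slabs(hosted_slabs, need=4, sample_size=10):
        """Free `need` slabs by sampling `sample_size` hosted slabs at random and
        evicting the least-active ones among the sample (power of many choices),
        instead of scanning every slab or asking a central coordinator."""
        sample = random.sample(list(hosted_slabs), k=min(sample_size, len(hosted_slabs)))
        victims = sorted(sample, key=lambda s: hosted_slabs[s])[:need]
        for slab in victims:
            # In the real system the slab's owner is notified and its pages
            # fall back to the local disk copy; here we simply drop it.
            del hosted_slabs[slab]
        return victims

    print("Evicted:", evict_slabs(hosted_slabs))
    print("Slabs remaining:", len(hosted_slabs))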

We have implemented and deployed Infiniswap on an RDMA cluster without any OS modifications and evaluated its effectiveness using multiple workloads running on unmodified VoltDB, Memcached, PowerGraph, GraphX, and Apache Spark. Using Infiniswap, throughputs of these applications improve between 7.1X (0.98X) and 16.3X (9.3X) over disk (Mellanox nbdX), and their median and tail latencies improve between 5.5X (2X) and 58X (2.2X). Infiniswap does so with negligible remote CPU usage, whereas nbdX becomes CPU-bound. Infiniswap increases the overall memory utilization of a cluster and works well at scale.

This work started as a class project for EECS 582 in the Winter, when I gave the idea to Juncheng Gu and Youngmoon Lee, who turned the pieces into a whole. Over the summer, Yiwen Zhang, an enterprising and excellent undergraduate, joined the project and helped us get it done in time.

This year the NSDI PC accepted 46 out of 255 papers. This happens to be my first paper with an all-blue cast! I want to thank Kang for giving me complete access to Juncheng and Youngmoon; it's been great collaborating with them. I'm also glad that Yiwen has decided to start a Master's and stay with us longer; more importantly, our team will remain intact for many more exciting follow-ups in this emerging research area.

If this excites you, come join our group!