Memory Management in the Cloud

Stanford, “The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM,” SIGOPS Operating Systems Review, Vol. 43, No. 4, December 2009, pp. 92-105. [PDF]

AMP Lab, “PACMan: Coordinated Memory Caching for Parallel Jobs,” Secret Draft.

Update: PACMan has been accepted at NSDI’2012. Secret draft won’t remain secret anymore :)

Summary

Cloud applications require storage systems that provide low latency and high throughput for large amounts of data.  While traditional disks cannot meet such requirements, given the trend in DRAM price and capacity, it is possible to envision a future where most of the storage needs can be fulfilled by DRAM; RAMCloud is such a system. PACMan, on the other hand, suggests that even today, most of the workloads can be kept into DRAM using better caching mechanisms.

RAMCloud

The core idea in RAMCloud is to keep everything in DRAM with disks used only as backups. The biggest challenge is to make sure that the storage system can be recovered quickly upon failure. RAMCloud uses buffered logging. The authors claim that replication is not necessary to achieve high performance, rather replicas are used only for parallel recovery. In steady state, there is a single copy of the data present in DRAM. Recovery is performed using a massively parallel read of data from disks.

PACMan

PACMan is a caching mechanism and corresponding system for HDFS and similar distributed file systems. The key idea is that current clusters have a large amount of unused memory that can be used to cache frequently-used data blocks, and traditional caching strategies like LRU or LFU do not work well on cluster jobs. The authors propose the concept of all-or-nothing property, i.e., when caching all data blocks for a given job across the cluster should be cached or nothing at all.

Comments

RAMCloud is a more general system than PACMan, but clearly, it is more expensive as well. RAMCloud trades off price for speed, but it is likely to be used in many future systems if prices of DRAM and high-speed network equipments keep going down. PACMan, from the high level, may seem to be a more short-term fix for the existing clusters. However, the insight of all-or-nothing is important and will be useful even in the future. Also, PACMan can have a quicker impact because it does not ask for any investment to reap the possible gains.

Leave a Reply

Your email address will not be published. Required fields are marked *