Multi-framework resource managers for datacenters

AMPLab, “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center,” NSDI, 2011. [PDF]

Apache Software Foundation, “Hadoop NextGen,” 2011. [LINK]

Summary

Traditional cluster resource schedulers fall into two broad categories. Some manage resources at fine granularity for a single framework (e.g., Hadoop), which forces multiple frameworks to run on multiple isolated clusters. Others perform coarse-grained resource management across multiple frameworks at the cost of underutilization (e.g., MPI schedulers). However, fine-grained sharing of cluster resources across multiple, possibly diverse, data- and compute-intensive frameworks is important for several reasons: better utilization and multiplexing of resources, easier cluster management, and faster innovation without worrying about the underlying physical resources. Mesos and Hadoop NextGen aim to achieve just that.

Setting aside either system’s terminology, a typical resource manager has a central coordinator that keeps track of all the resources in the cluster by periodically communicating with its daemons on individual machines. Instead of interfacing with the physical resources directly, frameworks use a library provided by the resource manager to interact with the coordinator. Once a framework has expressed its requirements and accepted some resources, it is on its own to schedule those resources among its workers.
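To make this generic architecture concrete, here is a minimal sketch in Python. It is not the API of Mesos or Hadoop NextGen: the names `Coordinator`, `heartbeat`, `grant`, and `schedule_workers` are hypothetical, resources are reduced to a count of CPUs per machine, and the round-robin placement on the framework side is just one possible choice.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Hypothetical central coordinator (illustration only, not a real API)."""

    # Free CPUs per machine, refreshed by each machine's daemon.
    free_cpus: dict = field(default_factory=dict)
    # CPUs currently handed out, keyed by (framework, machine).
    granted: dict = field(default_factory=dict)

    def heartbeat(self, machine: str, free: int) -> None:
        """Called periodically by a machine's daemon to report free CPUs."""
        self.free_cpus[machine] = free

    def grant(self, framework: str, machine: str, cpus: int) -> bool:
        """Give `cpus` on `machine` to `framework` if that many are free."""
        if self.free_cpus.get(machine, 0) < cpus:
            return False
        self.free_cpus[machine] -= cpus
        key = (framework, machine)
        self.granted[key] = self.granted.get(key, 0) + cpus
        return True


def schedule_workers(cpus: int, tasks: list) -> dict:
    """Framework-side step: spread tasks round-robin over the granted CPUs."""
    if cpus <= 0:
        return {}
    plan = {slot: [] for slot in range(cpus)}
    for i, task in enumerate(tasks):
        plan[i % cpus].append(task)
    return plan


coord = Coordinator()
coord.heartbeat("m1", 4)                 # daemon on m1 reports 4 free CPUs
got = coord.grant("framework-a", "m1", 2)  # coordinator hands out 2 of them
plan = schedule_workers(2, ["t0", "t1", "t2"]) if got else {}
```

The point of the split is that the coordinator only knows about machines and grants, while the placement of individual tasks stays entirely inside the framework.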

Mesos vs Hadoop NextGen

The primary, and possibly the only major, difference between Mesos (which came earlier) and Hadoop NextGen (which spun out of the basic Hadoop framework) is the way the coordinator and the frameworks interact when resources are requested and accepted (or rejected). Mesos makes resource offers to individual frameworks, which can then accept or reject them. Consequently, resource allocation becomes a distributed problem, and Mesos itself remains minimal. Hadoop NextGen, in contrast, requires each framework to explicitly express its requirements and then runs a centralized algorithm to allocate resources.
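The two interaction styles can be contrasted with a toy sketch in Python. Everything here is an assumption made for illustration: resources are a single integer count, frameworks are served in dictionary order, and the function names `offer_based` and `request_based` are hypothetical. Neither function reproduces the actual allocation policy of Mesos or Hadoop NextGen; the sketch only shows where the accept-or-reject decision lives.

```python
def offer_based(free: int, frameworks: dict) -> dict:
    """Offer style: the coordinator offers what is free and each
    framework's own callback decides how much of the offer to accept."""
    allocation = {}
    for name, accept in frameworks.items():
        taken = max(0, min(accept(free), free))  # never exceed the offer
        allocation[name] = taken
        free -= taken
    return allocation


def request_based(free: int, requests: dict) -> dict:
    """Request style: frameworks declare their demands up front and the
    coordinator alone decides (here, naively, first come first served)."""
    allocation = {}
    for name, wanted in requests.items():
        given = min(wanted, free)
        allocation[name] = given
        free -= given
    return allocation


# Hypothetical frameworks: "mpi" is all-or-nothing and needs 4 units,
# "hadoop" takes whatever it is offered.
frameworks = {
    "mpi": lambda offered: 4 if offered >= 4 else 0,
    "hadoop": lambda offered: offered,
}
offers = offer_based(3, frameworks)
requests = request_based(3, {"mpi": 4, "hadoop": 8})
```

With only 3 units free, the all-or-nothing framework rejects the offer in the first style, so the units go to the other framework. In the second style this naive coordinator hands out a partial grant because it knows nothing about that constraint; a real centralized allocator would need the constraint expressed in the request.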

Comments

Both resource managers are pretty much the same. Maybe I am biased as an AMPLab member, but it seems that the Hadoop NextGen design was highly influenced by Mesos. In either case, the central coordinator can become the bottleneck. As cluster sizes increase, however, Mesos is likely to scale better than Hadoop NextGen because its allocation decisions are distributed across the frameworks. Given Hadoop’s popularity, though, Hadoop NextGen is likely to become more widespread than Mesos.

 
