S. Banerjee, B. Bhattacharjee, and C. Kommareddy, "Scalable Application Layer Multicast," in Proceedings of ACM SIGCOMM, August 2002. [PDF]
Deployment of network layer multicast requires support from the underlying infrastructure, and like many other proposals with similar requirements, it met the same fate: no one deployed it. This paper presents NICE, a scalable application layer multicast protocol designed for low-bandwidth data streaming applications with large receiver sets. Data packets in NICE are replicated at end hosts, which makes it technically an overlay, so it needs no support from network providers.
The NICE protocol arranges the set of end hosts into a hierarchy and maintains that hierarchy as members join, leave, and fail. The members at the very top of the hierarchy maintain (soft) state about O(log N) other members. Logically, each member keeps detailed state about the members that are near it in the hierarchy (nearness is measured by end-to-end latency) and has only limited knowledge about the rest of the group. Hosts in each layer are partitioned into clusters of size between k and 3k-1, where k is a constant. Each cluster has a leader, chosen as the graph-theoretic center of the cluster. The leaders of the clusters in layer Li are the members of layer Li+1, and so on recursively. The following properties hold for the distribution of hosts across layers:
- A host belongs to a single cluster at any layer.
- If a host is present in layer Li, it must be the cluster leader in each of the lower layers (Li-1…L0).
- If a host is not present in layer Li, it cannot be present in any higher layers (Li+1…).
- Each cluster is size bounded between k and 3k-1.
- There are at most log_k N layers, and the highest layer has only a single member.
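A quick way to see the logarithmic-depth claim: if every cluster takes the minimum size k (a simplification; real clusters range from k to 3k-1), each layer has 1/k as many members as the layer below it, so the member count collapses to one within O(log_k N) layers. A minimal sketch (the function name is mine, not the paper's):

```python
import math

def nice_layer_count(n, k):
    """Count the layers of a NICE hierarchy over n hosts, assuming every
    cluster has the minimum size k. One leader per cluster is promoted,
    so each layer has roughly 1/k as many members as the one below."""
    layers, members = 1, n
    while members > 1:
        members = math.ceil(members / k)  # one leader per cluster moves up
        layers += 1
    return layers

print(nice_layer_count(81, 3))  # 81 -> 27 -> 9 -> 3 -> 1, i.e. 5 layers
```

With k = 3 and 81 hosts, the hierarchy bottoms out after 5 layers, consistent with the O(log_k N) depth the paper argues for.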
There are separate control and data planes in NICE. Neighbors on the control topology exchange periodic soft-state refreshes; within each layer, the members of a cluster form a clique. The data topology in each layer is a source-specific tree.
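Since each cluster is a clique in the control plane, a member's control-plane neighbor set is simply the union of its cluster peers at every layer it belongs to. A minimal sketch under an assumed representation (clusters as sets of host names, grouped per layer; the names are illustrative, not the paper's):

```python
def control_neighbors(host, clusters_by_layer):
    """Return the peers a member exchanges soft-state refreshes with:
    all other members of every cluster (one per layer) it belongs to."""
    peers = set()
    for layer in clusters_by_layer:
        for cluster in layer:
            if host in cluster:
                peers |= cluster  # clique: every cluster peer is a neighbor
    peers.discard(host)           # a host is not its own neighbor
    return peers

# Toy hierarchy: two L0 clusters led by A and D; A and D form the L1 cluster.
layers = [[{"A", "B", "C"}, {"D", "E", "F"}], [{"A", "D"}]]
print(control_neighbors("A", layers))  # A refreshes with B, C (L0) and D (L1)
```

Note that a leader such as A accumulates neighbors from every layer it appears in, which is why the topmost member ends up with the O(log N) state mentioned above, while a pure L0 member only tracks its single cluster.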
NICE assumes a special host, the rendezvous point (RP), which is known a priori to all members. When a new host wants to join a multicast group, it contacts the RP, which responds with the members of the highest layer. The joining host then contacts the closest of those members, which responds with the members of its cluster in the next lower layer (it knows them because it is that cluster's leader). This process continues iteratively until the joining host finds its appropriate place in a cluster of the lowest layer. The message overhead of the joining process is O(k log N) query-response pairs. When a node leaves, it removes itself from all the layers it belongs to, and new leaders are chosen for the clusters it led. Periodic heartbeat messages keep the state information fresh and trigger refinement (splits and merges) of the hierarchy when necessary.
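The iterative descent can be sketched as follows; `latency` and `lower_cluster` are stand-ins I am assuming for the protocol's member-probing and cluster-query messages, not its actual interface:

```python
def nice_join(top_layer, new_host, latency, lower_cluster):
    """Walk down the hierarchy starting from the highest-layer members
    returned by the RP: at each layer, pick the closest member and descend
    into the cluster it leads one layer below. A member that leads no lower
    cluster is in L0, so the cluster reached there is the one to join."""
    cluster = top_layer
    while True:
        closest = min(cluster, key=lambda m: latency[(new_host, m)])
        below = lower_cluster.get(closest)
        if below is None:   # closest leads no lower cluster: L0 reached
            return cluster
        cluster = below

# Toy hierarchy: L1 members A and B each lead an L0 cluster.
latency = {("X", "A"): 10, ("X", "B"): 5, ("X", "E"): 2, ("X", "F"): 9}
lower = {"A": ["A", "C", "D"], "B": ["B", "E", "F"]}
print(nice_join(["A", "B"], "X", latency, lower))  # X joins B's L0 cluster
```

Each layer costs the joining host one round of probes against a cluster of at most 3k-1 members, and there are O(log N) layers, which is where the O(k log N) join overhead comes from.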
The authors compare the performance of NICE against the NARADA protocol (arguably the state of the art at the time) along the following dimensions: quality of the data path (measured by stretch and stress), recovery from host failure, and control traffic overhead. NICE is comparable to NARADA in data-path stretch and failure recovery, but it imposes lower stress on routers and links. The most important result is that NICE has orders of magnitude lower control overhead for groups larger than 32 members, which makes it much more scalable. In addition to simulations, the paper also presents smaller-scale wide-area experiments whose results agree with the simulations.
The idea presented in this paper is neat and straightforward, but at the same time obvious. Introducing a hierarchy solves a lot of problems, but such solutions come at the expense of fault tolerance. The authors simply assume that the RP won't fail, or that if it does it will recover quickly, and they avoid the details of what might happen when things go wrong. In short, there is no clear notion of reliability.
NICE is just a glorified overlay, and the rationale for why application layer multicast should work has also turned out to be not-so-correct. Network operators do not like protocols that try to bypass them and interfere with their traffic engineering; eventually, they end up blocking or throttling those protocols. The same might have happened to NICE, had anyone been using it. The paper presents no discussion of this aspect of NICE-underlay interaction.