I. Stoica, R. Morris, D. Karger, F. Kaashoek, H. Balakrishnan, “Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications,” ACM SIGCOMM Conference, 2001. [PDF]
Chord is a DHT-based distributed lookup protocol for P2P systems – one of many, actually – that looks up a value given a key. What makes it more interesting than its counterparts is its simplicity, its logarithmic space and message-passing requirements, and its resiliency under high churn and failure scenarios. Chord strives for load balancing through consistent hashing, the decentralization and scalability inherently required for P2P systems, availability under high churn and failure, and flexible naming.
In Chord, nodes and keys are distributed on an identifier circle, where identifiers are derived using consistent hashing. Consistent hashing assigns a key to the node (its successor node) whose identifier is equal to or follows the identifier of that key in the identifier space. For correctness, each node needs to maintain, at the very least, the address of (and a connection to) its successor on the identifier circle. A lookup for a key is performed by sequentially following successor nodes until it reaches the node that holds the value for that key. Note that the goal at every step is to reduce the distance to the final node on the identifier circle.
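The successor-only lookup above can be sketched in a few lines. This is a toy model, not the paper's pseudocode: class and helper names are mine, and the identifier space is shrunk to 2^6 for readability.

```python
# Minimal sketch of Chord's successor-only lookup (illustrative names,
# not from the paper). Identifiers live on a ring modulo 2**m.

M = 6                       # identifier space: 2**6 = 64 ids
RING = 2 ** M

def in_half_open(x, a, b):
    """True if x lies in the interval (a, b] on the ring."""
    if a < b:
        return a < x <= b
    return x > a or x <= b   # interval wraps around 0

class Node:
    def __init__(self, nid):
        self.id = nid
        self.successor = self   # fixed up when the ring is built

    def find_successor(self, key):
        """Walk the ring one successor at a time until key falls in
        (node.id, node.successor.id]; that successor stores the key."""
        node = self
        while not in_half_open(key, node.id, node.successor.id):
            node = node.successor
        return node.successor

# Build a small ring with node ids 1, 8, 14, 32, 42, 51.
ids = [1, 8, 14, 32, 42, 51]
nodes = [Node(i) for i in ids]
for a, b in zip(nodes, nodes[1:] + nodes[:1]):
    a.successor = b

print(nodes[0].find_successor(10).id)   # key 10 -> node 14
print(nodes[0].find_successor(54).id)   # key 54 wraps -> node 1
```

Each step of the loop moves one node clockwise, so with only successor pointers a lookup is O(N) hops; the finger table below is what brings this down to O(log N).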
For faster distance reduction, i.e., for faster search, each node in Chord also maintains a finger table, which acts like a skip list. Each finger table holds a number of entries, where consecutive entries hold addresses of nodes farther and farther away on the identifier circle. The authors propose storing O(log N) entries, with the ith entry holding the address of the node at distance 2^i, or the immediately following node if no node sits at exactly distance 2^i.
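A finger table can be sketched as below. This is a simplified, 0-indexed variant (the paper indexes fingers from 1 and uses n + 2^(i-1)); the node ids and helper names are my own illustration.

```python
# Sketch of a Chord finger table (assumed names). With m-bit ids, node n's
# i-th finger is the first live node at distance >= 2**i from n,
# i.e. successor(n + 2**i) on the identifier circle.

M = 6
RING = 2 ** M
ids = sorted([1, 8, 14, 32, 42, 51])   # live node ids on the ring

def successor(x):
    """First live node id equal to or following x on the ring."""
    for n in ids:
        if n >= x % RING:
            return n
    return ids[0]   # wrapped past the largest id

def finger_table(n):
    """m entries; entry i points roughly 2**i ahead of n, so a lookup
    can always halve the remaining distance to the target."""
    return [successor(n + 2 ** i) for i in range(M)]

print(finger_table(8))   # [14, 14, 14, 32, 32, 42]
```

At lookup time a node forwards the query to its farthest finger that still precedes the key, which is why the remaining distance at least halves per hop and lookups take O(log N) messages.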
To join Chord, a joining node obtains its successor by asking a known bootstrapping node. Once it has a successor, it runs a stabilization mechanism (which every node actually runs periodically) that notifies its successor and predecessor nodes so they update their successor and predecessor pointers. This ensures Chord's correctness. Lazily, in the background, every node updates its finger table, taking advantage of possible optimizations.
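The stabilize/notify pair can be sketched as follows. The logic mirrors the paper's pseudocode, but the class, method, and helper names here are my own, and the three-node scenario is an illustrative toy.

```python
# Hedged sketch of periodic stabilization: stabilize() checks whether a
# new node has slipped in between us and our successor; notify() lets a
# node suggest itself as someone's predecessor.

def between(x, a, b):
    """True if x lies strictly in (a, b) on the ring."""
    if a < b:
        return a < x < b
    return x > a or x < b   # interval wraps around 0

class Node:
    def __init__(self, nid):
        self.id = nid
        self.successor = self
        self.predecessor = None

    def stabilize(self):
        """Ask our successor for its predecessor; if that node sits
        between us and the successor, adopt it, then notify."""
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, other):
        """Accept `other` as predecessor if we have none, or if it is
        closer to us than the current predecessor."""
        if self.predecessor is None or between(
                other.id, self.predecessor.id, self.id):
            self.predecessor = other

# New node 26 joins a two-node ring {8, 42} via its successor 42.
a, b, c = Node(8), Node(42), Node(26)
a.successor, b.successor = b, a
a.predecessor, b.predecessor = b, a
c.successor = b                     # learned from a bootstrapping node

for _ in range(3):                  # a few stabilization rounds
    for n in (a, b, c):
        n.stabilize()

print(a.successor.id, c.successor.id)   # 26 42 -- ring now 8 -> 26 -> 42
```

Note that the new node only sets its successor at join time; it is the periodic rounds, not the join itself, that splice it into everyone else's pointers.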
To ensure correctness and resilience under failures, Chord proposes keeping a list of multiple successors instead of just one. As long as a single successor in that list is alive, Chord can still find all the keys. Note, however, that values stored on failed nodes cannot be recovered this way. To avoid that, the authors propose a similar mechanism in which values are replicated on multiple successive nodes. In general, they resort to the age-old systems tradition of “make multiple copies when you can’t take failures.”
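The replication idea reduces to this: store each key on its successor plus the next few nodes on the circle. A toy sketch, with the replication factor and function names chosen by me rather than taken from the paper:

```python
# Toy sketch of successor-list replication. Each key lives on its
# successor and the R - 1 nodes after it, so a lookup survives as long
# as any one of those R replicas is still alive.

ids = sorted([1, 8, 14, 32, 42, 51])   # live node ids on the ring
R = 3                                   # replication factor (assumed)

def successor_index(key):
    """Index into `ids` of the first node at or after `key`."""
    for i, n in enumerate(ids):
        if n >= key:
            return i
    return 0    # wrap around the ring

def replica_set(key):
    """The R consecutive nodes responsible for `key`."""
    i = successor_index(key)
    return [ids[(i + k) % len(ids)] for k in range(R)]

def lookup(key, alive):
    """First live replica holding `key`, or None if all R have failed."""
    return next((n for n in replica_set(key) if n in alive), None)

print(replica_set(10))                 # [14, 32, 42]
print(lookup(10, alive={8, 42, 51}))   # 14 and 32 down -> served by 42
```

A key is only lost when all R consecutive replicas fail before stabilization can re-replicate, which for independent failures becomes unlikely even for small R.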
This is probably the most cited systems paper in recent history, maybe even in all of CS, and rightly so. The beauty of this paper is its simplicity. There is hardly any paper in P2P that does not cite it, and any work that needs a DHT lookup ends up using Chord because it's so damn easy to code and use (at least it seems so).
It was not clear what the authors wanted to do with the circle-partitioning scenario, or how the stabilization protocol would handle it. The notion of a higher-layer application entity is not well explained; when it does something and when it does not is unclear in multiple scenarios. Was there any dire need to introduce that entity in this paper? In the evaluation section, the high-churn scenario starts with only 500 nodes in the simulation! This is significantly lower than the other settings (with at least 10^4 nodes) and calls the reliability of those findings into question. One thing is sure, though: after this paper, several tens of papers came out dissecting its tiniest details, so the churn problem has most probably been taken care of by now.