H. Balakrishnan, F. Kaashoek, D. Karger, R. Morris, I. Stoica, “Looking Up Data in P2P Systems,” Communications of the ACM, V. 46, N. 2, (February 2003). [ACM]
P2P systems have attracted large following due to their manageability, flexibility, resource-aggregation capabilities, and fault-tolerance properties. One of the biggest challenges in this area of research is finding a given data item in a large P2P system in a scalable manner. This paper provides a brief overview of the lookup problem, mostly from a DHT-based structured P2P systems’ points of view.
Centralized and hierarchical lookup of data in conventional systems suffer from scalability and resiliency issues. To overcome that, P2P systems developed a notion of symmetric lookup algorithms, which can be primarily categorized into unstructured, semi-structured, and structured mechanisms. DHT-based solutions to the lookup problem are both structured and symmetric and work well in dynamic environments where nodes join and leave the network as well as can randomly fail. In its core, a DHT implements just one function, lookup(key), that finds a data item given its uniquely identifiable key: nodes forward requests they cannot retrieve locally to nodes whose ID is closer to the destination based on some distance function. It also stores keys in a load-balanced way among the participating nodes, before it can actually look them up.
The authors divide DHT-based lookup systems into single- and multi-dimensional implementations and briefly explain three examples. Chord and Pastry use single-dimensional circle and tree structures, respectively; whereas, CAN uses a d-dimensional Cartesian coordinate space. All of them try to route a query to the data node based on a distance metric, as mentioned earlier, and provide different complexities based on space-time trade-offs. The characteristics of the distance function highly affects the workflow of a DHT implementation. For example, being not symmetric, Chord’s distance metric requires it to use a stabilization protocol in the background unlike Pastry, which uses a symmetric distance metric. Several open challenges, circa 2002-03, are listed at the end of the paper. Most, if not all, of them are either completely solved by now or reached their saturation points.