Dynamo: Amazon’s Highly Available Key-value Store

Amazon, “Dynamo: Amazon’s Highly Available Key-value Store,” SOSP, 2007. [PDF]

Summary

Dynamo is a highly available (99.9th percentile) key-value storage mechanism that sacrifices traditional consistency models for eventual consistency to achieve availability. Dynamo works with a simple query model, where read/write (get() and put()) operations are performed on data items uniquely identified by their keys. There is no relational schema because there are no operations that span multiple data items. Stored objects in Dynamo are relatively small (less than 1MB).

Dynamo defers conflict resolutions due to incompatible versions of data during read operations and allows write operations on any data all the time. It moves conflict resolutions to applications, since applications have more semantic information about the data they are using and can more effectively resolve conflicts.

Dynamo uses distributed hashing to partition data on a ring structure and replicates data among N-1 clockwise successor nodes on that ring. However, unlike traditional DHTs, it uses zero-hop routing, i.e., it does not route through other nodes on the ring, in order to reduce latency. For any updates, data are versioned using vector clocks and conflicts as well as temporary failures are resolved using sloppy quorums and hinted handoffs. To handle permanent failures, Dynamo uses Merkle-tree based anty-entropy protocol to keep replicas in sync. Failures are detected and taken care off in local context using gossip protocols. Since it is used in a controlled environment, there are no strong security requirements.

Comments

All in all, Dynamo creates a great mix of well-known protocols and techniques to achieve high availability. One very interesting tradeoff here is for not missing any writes, even writing twice, at the expense of being in possibly inconsistent states. It is also a great example how far business goals can define a system design; to paraphrase the authors: “additions to shopping carts are never lost, however, deleted items can resurface!”

Versioning capability of Dynamo is limited by the size of the vector clocks it can use. The authors do mention a truncating algorithm, but they avoid the problem of what will happen if Dynamo requires larger clock. Unfortunately, if it fails to support the required length of vector lengths, the whole eventual consistency setup will break down.

Dynamo is the poster child of eventual consistency and noSQL admirers. Its influence on the open-source community is undeniable: Cassandra and Voldemort are two prominent examples of Dynamo-inspired systems.

Leave a Reply

Your email address will not be published. Required fields are marked *