Yahoo! Research, “PNUTS: Yahoo!’s Hosted Data Serving Platform,” PVLDB, 2008. [PDF]
PNUTS is a scalable, highly available, and geographically distributed (but low latency) data store used by most Yahoo! online properties. To achieve both availability and partition tolerance, it uses a novel notion of consistency called per-record timeline consistency; under this model, all replicas of a given record apply all updates to the record in the same order. However, it is applicable to only a single record, and hence, it is not suitable for transactions involving multiple records without considerable application logic. PNUTS also allows its users to switch to eventual consistency, which is easier to maintain and often acceptable for many online services.
The PNUTS system is divided into regions, where each region contains a full complement of system components and a complete copy of each table. Data tables can be (horizontally) ordered or hash partitioned into groups of records called tablets. Each tablet is stored in a single server within a region. To read or write an update, the router first determines which tablet contains the record, and which server hosts that tablet. Routers contain only a cached copy of the mapping, which is owned by the tablet controller. Routers periodically poll the tablet controller to get any changes to the mapping.
Underneath it all, Yahoo! Message Broker or YMB is a pub/sub system that simultaneously acts as a write-ahead log to allow committing of updates and asynchronously replicates committed data to subscribers in other regions. A message is not purged from the YMB log until PNUTS has verified that the update is applied to all replicas of the database. YMB provides partial ordering of published messages, meaning that messages published to a particular YMB cluster will be delivered to all subscribers in the order they were published; however, messages published to different YMB clusters may be delivered in any order. In order to provide record-level timeline consistency, PNUTS implements record-level mastering: each record has a master copy in some cluster from where message propagation must start. Different records of the same table can have masters in different regions.
PNUTS tradeoffs power to achieve simplicity in design by offloading more complex tasks to application designers. Providing support for ordered tables is one of the key contributions of PNUTS, which allows it to implement efficient range operations. The notion of per-record timeline consistency is interesting, but it cannot support real transactions (the authors argue, however, that transactions involving multiple records are not that common). PNUTS also does not support complex queries.