High-level platforms on top of Hadoop

Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins, "Pig Latin: A Not-So-Foreign Language for Data Processing," SIGMOD, 2008. [PDF]

Facebook Data Team, "Hive: Data Warehousing and Analytics on Hadoop," . [LINK]

Summary

Pig and Hive are higher level programming interfaces to Hadoop with corresponding data management tools and related optimizations developed by … Continue Reading ››

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly, "Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks," EuroSys, 2007. [PDF]

Summary

Dryad is Microsoft's answer to the MapReduce paradigm, albeit at a (slightly) lower level with greater flexibility. Like MapReduce, Dryad allows developers to think about what to do with the data, and Dryad … Continue Reading ››