Tag Archives: MapReduce

Dremel: Interactive Analysis of Web-Scale Datasets

Google, “Dremel: Interactive Analysis of Web-Scale Datasets,” VLDB, 2010. [PDF]

Summary

Dremel is Google’s interactive, ad hoc query system for analyzing read-only nested data. Unlike MapReduce, Dremel is aimed at data exploration, monitoring, and debugging, where near-real-time performance is of utmost importance. To achieve scalability and performance, Dremel builds on three key ideas:

  1. It uses a column-striped storage representation on top of GFS, which stores nested data in a compressed but easily searchable form and lets queries read far less data from secondary storage (see the first sketch after this list). Dremel uses finite state machines (FSMs) to quickly reassemble records from this compact representation. The paper shows that the columnar representation reduces completion times by an order of magnitude even for regular MapReduce jobs.
  2. It uses a serving-tree architecture to rewrite queries during work distribution and to aggregate results at multiple levels (second sketch below). This minimizes data movement and speeds up queries, accounting for roughly another order of magnitude of speedup over MapReduce.
  3. It provides a high-level (if limited) SQL-like query language that is translated to native execution rather than to a sequence of MapReduce jobs.
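To make idea (1) concrete, here is a minimal Python sketch of column-striping (hypothetical helper names, not Dremel's actual format). It shreds nested records into one value stream per field path, so a query touching only a few fields reads only those streams. It deliberately omits the repetition and definition levels that Dremel stores alongside each value to make lossless reassembly of nested records possible.

```python
# Toy column-striper: nested records -> one value stream per field path.
# (Hypothetical sketch; omits Dremel's repetition/definition levels.)

def stripe(record, prefix="", columns=None):
    """Flatten one nested record into per-field column streams."""
    if columns is None:
        columns = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            stripe(value, path, columns)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    stripe(item, path, columns)
                else:
                    columns.setdefault(path, []).append(item)
        else:
            columns.setdefault(path, []).append(value)
    return columns

# Two documents with nested, repeated fields:
docs = [
    {"doc_id": 10, "name": {"lang": ["en", "en-gb"]}},
    {"doc_id": 20, "name": {"lang": ["fr"]}},
]
columns = {}
for d in docs:
    stripe(d, columns=columns)

# A query touching only 'name.lang' now reads a single stream:
print(columns["name.lang"])  # ['en', 'en-gb', 'fr']
```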
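And a toy version of idea (2): the serving tree pushes partial aggregation down to the leaf servers and merges partial results on the way up (again a hypothetical sketch; Dremel actually rewrites the SQL query at every level of the tree).

```python
# Toy serving tree: COUNT aggregation computed at leaves over their
# tablets, partials merged at each internal level up to the root.
from collections import Counter

def leaf_count(tablet):
    """Leaf server: partially aggregate over its local tablet."""
    return Counter(row["lang"] for row in tablet)

def merge(partials):
    """Intermediate/root server: combine partial aggregates."""
    total = Counter()
    for p in partials:
        total += p
    return total

tablets = [
    [{"lang": "en"}, {"lang": "fr"}],
    [{"lang": "en"}],
    [{"lang": "de"}, {"lang": "en"}],
]
# Leaves aggregate in parallel; the root only merges small partials.
print(merge(leaf_count(t) for t in tablets))
# Counter({'en': 3, 'fr': 1, 'de': 1})
```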

Comments

Dremel is fast, but I wonder how much faster it could go if it allowed caching of intermediate results for use in subsequent queries; this should have even more impact on data exploration workloads. The paper is very terse (perhaps due to the VLDB page limit), and I found it hard to read even though none of the concepts is that complicated.

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly, “Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks,” EuroSys, 2007. [PDF]

Summary

Dryad is Microsoft’s answer to the MapReduce paradigm, albeit at a (slightly) lower level and with greater flexibility. Like MapReduce, Dryad lets developers think about what to do with the data, while Dryad itself takes care of distribution, fault tolerance, stragglers, etc. Unlike MapReduce, Dryad enables expressive dataflow models in the form of directed acyclic graphs (DAGs). Dryad also adds flexibility in how computation vertices in a DAG communicate: via disk files, TCP pipes, and shared-memory queues, as opposed to the disk-only communication promoted by MapReduce. This could allow fully in-memory dataflows for faster data mining and iterative jobs.
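To give a rough feel for the model, here is a hypothetical Python sketch (not Dryad's actual C++ graph-composition API, which builds graphs with operators like >= and >>): a job is a DAG of vertices running arbitrary sequential code, wired together by channels whose transport is chosen independently of the graph's shape.

```python
# Toy DAG in the spirit of Dryad's model (illustrative only). Each
# vertex runs sequential user code; each edge records a transport
# (file, TCP pipe, shared-memory FIFO) chosen per channel.

class Vertex:
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
        self.inputs = []  # upstream (vertex, transport) pairs

def connect(src, dst, transport="file"):
    dst.inputs.append((src, transport))

read1 = Vertex("R1", lambda _: ["a b", "b c"])
read2 = Vertex("R2", lambda _: ["c d"])
parse = Vertex("P",  lambda lines: [w for l in lines for w in l.split()])
count = Vertex("C",  lambda words: {w: words.count(w) for w in set(words)})

connect(read1, parse, transport="file")            # spill to disk
connect(read2, parse, transport="file")
connect(parse, count, transport="shared-memory")   # stay in memory

def run(vertex):
    """Naive topological execution of the DAG on one machine."""
    data = [x for src, _ in vertex.inputs for x in run(src)]
    out = vertex.fn(data or None)
    return out if isinstance(out, list) else [out]

print(run(count)[0])  # counts: a:1, b:2, c:2, d:1 (dict order may vary)
```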

Comments

By giving more power to developers, Dryad sacrificed simplicity. This is the exact opposite of the tradeoff made in the MapReduce design. Dryad is great as a substrate for higher-level, more usable systems. This became apparent when Microsoft later built DryadLINQ and SCOPE on top of Dryad, which are easy to use and powerful but take away unnecessary flexibility from developers. Dryad's influence is significant, at least within Microsoft: as far as we know, they heavily use DryadLINQ and SCOPE for most of their data-intensive workloads.

The paper itself has some obvious flaws, one of them being its evaluation. The comparison against Microsoft SQL Server is nothing to get too excited about. It would have been nice if they had compared against some implementation of MapReduce (Hadoop was not publicly available when Dryad was being built, but they could easily have used Dryad to imitate MapReduce). Dryad is expected to beat MapReduce on multi-stage workloads, however. I found the paper neither easy to read nor easy to follow, and often boring. Nevertheless, it is an interesting piece of work and the obvious next step beyond MapReduce.

MapReduce: Simplified Data Processing on Large Clusters

Jeffrey Dean, Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” OSDI, 2004. [PDF]

Summary

MapReduce is a programming model and associated implementation for processing and generating large data sets in a parallel, fault-tolerant, distributed, and load-balanced manner. The programming model has two main user-provided functions. The map function takes an input pair and produces a set of intermediate key/value pairs. The reduce function accepts an intermediate key and the set of values for that key and processes them into some user-defined result. The biggest gain in MapReduce is that the user is concerned only with these two functions; everything else, including distribution of tasks, scheduling, fault tolerance, and scalability, is taken care of by the framework.
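The canonical example from the paper is word counting. Here is a minimal runnable sketch in Python (the paper gives similar pseudocode; the tiny in-process driver below stands in for the framework's shuffle between the two phases):

```python
# Word count, the canonical MapReduce example.
from collections import defaultdict

def map_fn(doc_name, contents):
    """Emit (word, 1) for every word in the document."""
    for word in contents.split():
        yield word, 1

def reduce_fn(word, counts):
    """Sum the partial counts for one word."""
    yield word, sum(counts)

def run_mapreduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)
    for key, value in inputs:
        for k, v in map_fn(key, value):   # map phase
            groups[k].append(v)           # shuffle: group by key
    return [out for k in sorted(groups)   # reduce phase
                for out in reduce_fn(k, groups[k])]

docs = [("d1", "to be or not to be"), ("d2", "to do")]
print(run_mapreduce(docs, map_fn, reduce_fn))
# [('be', 2), ('do', 1), ('not', 1), ('or', 1), ('to', 3)]
```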

MapReduce is implemented on large clusters of commodity hardware. The input data is automatically partitioned into M splits, which map invocations (mappers) process in parallel on many different machines. Reduce invocations (reducers) are distributed by partitioning the intermediate key space into R pieces (set by the user) using a partitioning function. A single master coordinates everything. Optional combiner functions can run after the mappers to reduce the amount of intermediate key/value data shipped to the reducers and thus decrease network usage.
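The paper's default partitioning function is simply hash(key) mod R. A sketch (zlib.crc32 stands in for the paper's unspecified hash, since Python's built-in hash() is randomized across runs):

```python
# Default partitioning: hash(key) mod R. All values for a given key
# land in the same reduce partition, regardless of which mapper
# emitted them.
import zlib

def partition(key: str, R: int) -> int:
    return zlib.crc32(key.encode()) % R

R = 4
for key in ["apple", "banana", "apple"]:
    print(key, "-> reducer", partition(key, R))
# 'apple' maps to the same reducer both times.
```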

Mappers read their input from, and reducers write their output back to, a distributed file system (e.g., GFS). Intermediate data produced by the mappers is buffered in memory and saved to local disks, then pulled by the reducers via RPC. Failures in MapReduce are handled by rescheduling and restarting tasks. MapReduce tries to maximize data locality to minimize data transfer over the network.

Comments

MapReduce is one of the most influential papers, as well as one of the most influential computing paradigms, of this century. It has fundamentally changed how people think about processing large data and has given rise to many systems (including Hadoop, its open-source implementation). While MapReduce is not the best at everything, it is great at what it does: embarrassingly parallel, data-intensive computing.

The fundamental tradeoff here is between the simplicity, generality, and efficiency of a high-level programming abstraction and fine-grained control. It is surprising how many things MapReduce can do even without the fine-grained control of something like MPI.

Spark short paper has been accepted at HotCloud’10

An initial overview of our ongoing work on Spark, an iterative and interactive framework for cluster computing, has been accepted at HotCloud’10. I joined the project last February, while Matei has been working on it since last fall. I will upload the paper to the publications page once we have taken care of the reviewers’ comments and suggestions; meanwhile, you can read the technical report version.

This year HotCloud accepted 18 papers (24% of submissions), and the PC is thinking about extending the workshop to a second day starting next year.