Reviews – Mosharaf Chowdhury

Memory Management in the Cloud

Mosharaf — Mon, 05 Dec 2011 18:13:50 +0000

Stanford, "The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM," SIGOPS Operating Systems Review, Vol. 43, No. 4, December 2009, pp. 92-105. [PDF]

AMP Lab, "PACMan: Coordinated Memory Caching for Parallel Jobs," Secret Draft.

Update: PACMan has been accepted at NSDI'2012. Secret draft won't remain secret anymore :)

Summary

Cloud applications require storage systems … Continue Reading ››

Confidentiality and Security in the Cloud

Mosharaf — Mon, 28 Nov 2011 01:09:33 +0000

Raluca Ada Popa, Catherine M. S. Redfield, Nickolai Zeldovich, Hari Balakrishnan, "CryptDB: Protecting Confidentiality with Encrypted Query Processing," SOSP, 2011. [PDF]

Thomas Ristenpart, Eran Tromer, Hovav Shacham, Stefan Savage, "Hey, You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds," CCS, 2009. [PDF]

Summary

With the increase in popularity of cloud … Continue Reading ››

Graph-parallel frameworks

Mosharaf — Sat, 19 Nov 2011 03:59:29 +0000

Google, "Pregel: A System for Large-Scale Graph Processing," SIGMOD, 2010. [PDF]

Carnegie Mellon, "GraphLab: A New Framework for Parallel Machine Learning," arXiv:1006.4990, 2010. [PDF]

Summary

Data-parallel frameworks such as MapReduce and Dryad are good at performing embarrassingly parallel jobs. These frameworks are not ideal for iterative jobs and for jobs where data-dependencies across stages … Continue Reading ››

Datacenter transport layer protocols

Mosharaf — Tue, 15 Nov 2011 04:03:15 +0000

Stanford and Microsoft, "DCTCP: Efficient Packet Transport for the Commoditized Data Center," SIGCOMM, 2010. [PDF]

Raiciu et al., "Improving Datacenter Performance and Robustness with Multipath TCP," SIGCOMM, 2011. [PDF]

MSR Asia, "ICTCP: Incast Congestion Control for TCP in Data Center Networks," CoNEXT, 2010. [PDF]

Summary

Datacenters pose a different set of challenges than … Continue Reading ››

Cloudy operating systems

Mosharaf — Mon, 07 Nov 2011 08:43:15 +0000

MIT, "An Operating System for Multicore and Clouds: Mechanisms and Implementation," SOCC, 2010. [PDF]

Barret Rhoden, Kevin Klues, David (Yu) Zhu, Eric Brewer, "Improving Per-Node Efficiency in the Datacenter with New OS Abstractions," SOCC, 2011. [PDF]

Summary

Factored Operating System

The Factored Operating System (FOS) proposes an OS architecture where each core runs individual microkernels … Continue Reading ››

Multi-framework resource managers for datacenters

Mosharaf — Wed, 02 Nov 2011 02:49:41 +0000

AMPLab, "Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center," NSDI, 2011. [PDF]

Apache Software Foundation, "Hadoop NextGen", 2011. [LINK]

Summary

Traditional cluster resource schedulers fall into two broad categories: some do fine-grained management of resources for individual frameworks (e.g., in Hadoop), but this requires multiple frameworks to run on multiple isolated … Continue Reading ››

Distributed in-memory datasets

Mosharaf — Mon, 31 Oct 2011 03:36:54 +0000

AMPLab, "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing," UCB/EECS-2011-82, 2011. [PDF]

Russell Power, Jinyang Li, "Piccolo: Building Fast, Distributed Programs with Partitioned Tables," OSDI, 2010. [PDF]

Summary

MapReduce and similar frameworks, while widely applicable, are limited to directed acyclic data flow models, do not expose global states, and are generally slow due … Continue Reading ››

Cloud databases

Mosharaf — Wed, 26 Oct 2011 04:44:41 +0000

MIT, "Relational Cloud: A Database-as-a-Service for the Cloud," CIDR, 2011. [PDF]

Divyakant Agrawal, Amr El Abbadi, Sudipto Das, Aaron J. Elmore, "Database Scalability, Elasticity, and Autonomy in the Cloud," DASFAA, 2011. [PDF]

Relational Cloud

The key idea of the Relational Cloud project is to define the concept of transactional Database-as-a-Service (DBaaS), identify the key challenges toward … Continue Reading ››

Declarative and finite state machine approaches to Cloud programming

Mosharaf — Sat, 22 Oct 2011 07:39:15 +0000

Peter Alvaro, Tyson Condie, Neil Conway, Khaled Elmeleegy, Joseph M. Hellerstein, Russell Sears, "BOOM Analytics: Exploring Data-Centric, Declarative Programming for the Cloud," EuroSys, 2010. [PDF]

Joe Armstrong, "Erlang: A Survey of the Language and Its Industrial Applications," Ninth Exhibition and Symposium on Industrial Applications of Prolog, 1996. [PDF]

BOOM

BOOM or Berkeley Orders-Of-Magnitude adopts a … Continue Reading ››

Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS

Mosharaf — Tue, 18 Oct 2011 04:16:57 +0000

Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen, "Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS," SOSP, 2011. [PDF]

Summary

This paper introduces a new consistency model, causal+, that extends the causal consistency model and lies between sequential and causal consistency models. The authors claim that causal+ is the … Continue Reading ››

PNUTS: Yahoo!’s Hosted Data Serving Platform

Mosharaf — Tue, 18 Oct 2011 04:09:17 +0000

Yahoo! Research, "PNUTS: Yahoo!’s Hosted Data Serving Platform," PVLDB, 2008. [PDF]

Summary

PNUTS is a scalable, highly available, and geographically distributed (but low latency) data store used by most Yahoo! online properties. To achieve both availability and partition tolerance, it uses a novel notion of consistency called per-record timeline consistency; under this model, all replicas of … Continue Reading ››

Data-parallel pipelines using high-level languages

Mosharaf — Fri, 14 Oct 2011 05:49:03 +0000

Microsoft, "DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language," OSDI, 2008. [PDF]

Google, "FlumeJava: Easy, Efficient Data-Parallel Pipelines," PLDI, 2010. [LINK]

Background

Data-parallel computing systems expose high-level abstractions to the users to reason about distributed computations, while handling low-level tasks of scheduling and automated fault-tolerance without any user input. At … Continue Reading ››

Dremel: Interactive Analysis of Web-Scale Datasets

Mosharaf — Tue, 11 Oct 2011 00:05:34 +0000

Google, "Dremel: Interactive Analysis of Web-Scale Datasets," VLDB, 2010. [PDF]

Summary

Dremel is Google's interactive ad hoc query system for analysis of read-only nested data. Unlike MapReduce, Dremel is aimed toward data exploration, monitoring, and debugging, where near real-time performance is of utmost importance. To achieve scalability and performance, Dremel builds upon three key ideas:

It … Continue Reading ››

Dynamo: Amazon’s Highly Available Key-value Store

Mosharaf — Fri, 07 Oct 2011 20:01:08 +0000

Amazon, "Dynamo: Amazon's Highly Available Key-value Store," SOSP, 2007. [PDF]

Summary

Dynamo is a highly available (99.9th percentile) key-value storage mechanism that sacrifices traditional consistency models for eventual consistency to achieve availability. Dynamo works with a simple query model, where read/write (get() and put()) operations are performed on data items uniquely identified by their keys. … Continue Reading ››

Bigtable: A Distributed Storage System for Structured Data

Mosharaf — Fri, 07 Oct 2011 06:59:09 +0000

Google, "Bigtable: A Distributed Storage System for Structured Data," OSDI, 2006. [PDF]

Summary

Bigtable is a large-scale (petabytes of data across thousands of machines) distributed storage system for managing structured data. It is built on top of several existing Google technologies (e.g., GFS, Chubby, and Sawzall) and used by many of Google's online … Continue Reading ››

SCADS: Scale-Independent Storage for Social Computing Applications

Mosharaf — Wed, 05 Oct 2011 02:40:35 +0000

Michael Armbrust, Armando Fox, David A. Patterson, Nick Lanham, Beth Trushkowsky, Jesse Trutna, Haruki Oh, "SCADS: Scale-Independent Storage for Social Computing Applications," CIDR, 2009. [PDF]

Summary

SCADS (Scalable Consistency Adjustable Data Storage) is a proposal for a collection of components leveraging database, control theory, and machine learning techniques to achieve data scale independence for rapidly … Continue Reading ››

High-level platforms on top of Hadoop

Mosharaf — Mon, 03 Oct 2011 18:27:23 +0000

Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins, "Pig Latin: A Not-So-Foreign Language for Data Processing," SIGMOD, 2008. [PDF]

Facebook Data Team, "Hive: Data Warehousing and Analytics on Hadoop." [LINK]

Summary

Pig and Hive are higher-level programming interfaces to Hadoop with corresponding data management tools and related optimizations developed by … Continue Reading ››

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

Mosharaf — Sun, 02 Oct 2011 03:45:18 +0000

Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly, "Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks," EuroSys, 2007. [PDF]

Summary

Dryad is Microsoft's answer to the MapReduce paradigm, albeit at a (slightly) lower level with greater flexibility. Like MapReduce, Dryad allows developers to think about what to do with the data, and Dryad … Continue Reading ››

MapReduce: Simplified Data Processing on Large Clusters

Mosharaf — Wed, 28 Sep 2011 18:31:10 +0000

Jeffrey Dean, Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," OSDI, 2004. [PDF]

Summary

MapReduce is a programming model and associated implementation for processing and generating large data sets in a parallel, fault-tolerant, distributed, and load-balanced manner. There are two main functions (both user provided) in this programming model. The map function takes an input … Continue Reading ››
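
As a rough illustration of the two user-provided functions, here is a minimal single-process sketch. It assumes the word-count task commonly used to explain the model; the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the driver only mimics the grouping step on one machine (no parallelism, fault tolerance, or load balancing).

```python
from collections import defaultdict


def map_fn(doc_name, text):
    """User-provided map: emit an intermediate (word, 1) pair per word."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """User-provided reduce: merge all values that share one key."""
    return word, sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


# Toy input: (document name, contents) pairs.
documents = [("a.txt", "the quick fox"), ("b.txt", "the lazy dog the end")]
word_counts = run_mapreduce(documents, map_fn, reduce_fn)
```

In this sketch only `map_fn` and `reduce_fn` would change from job to job; the grouping logic in `run_mapreduce` plays the role the framework takes on in a real deployment.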

Megastore: Providing Scalable, Highly Available Storage for Interactive Services

Mosharaf — Mon, 26 Sep 2011 22:39:06 +0000

Google, "Megastore: Providing Scalable, Highly Available Storage for Interactive Services," CIDR, 2011. [PDF]

Summary

Megastore is a highly available, scalable storage system built on top of Google's Bigtable system for scalable storage and Chubby for locks and configuration data. It supports full ACID semantics and is specially suited for interactive services, even though Bigtable itself does … Continue Reading ››