Spark
Spark has been accepted at NSDI’2012
Our paper “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing” has been accepted at NSDI’2012. This is Matei’s brainchild and joint work with many people, including but not limited to TD, Ankur, Justin, Murphy, and professors Ion Stoica, Scott Shenker, and Michael Franklin. Unlike many other systems papers, Spark is ...
Distributed in-memory datasets
AMPLab, “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing,” UCB/EECS-2011-82, 2011.
Russell Power, Jinyang Li, “Piccolo: Building Fast, Distributed Programs with Partitioned Tables,” OSDI, 2010.
Summary
MapReduce and similar frameworks, while widely applicable, are limited to directed acyclic data flow models, do not expose global state, and are generally slow due to ...
Technical report on Spark is available online
A technical report describing the key concepts behind Spark is available online. The abstract follows: We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that allows programmers to perform in-memory computations on large clusters while retaining the fault tolerance of data flow models like MapReduce. RDDs are motivated by two types of applications ...
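The abstract's central claim is that RDDs keep data in memory yet stay fault tolerant. One way to picture this is a dataset that records its parent and the coarse-grained transformation that produced it (its lineage), so a lost in-memory partition can be rebuilt by re-running that transformation instead of being restored from a replica. The Python below is a hedged toy sketch of that idea only, not Spark's actual Scala API or implementation: the class `ToyRDD` and its methods `map`, `filter`, `partition`, `lose_partition`, and `collect` are hypothetical names chosen for illustration. Unlike real RDDs, which are lazy and cached only on request, this toy caches every partition it computes.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Hypothetical toy dataset that remembers its lineage (not Spark's API)."""

    def __init__(self, source: Optional[List[list]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[list], list]] = None):
        self.source = source  # stable input partitions (think: files on disk)
        self.parent = parent  # lineage pointer to the dataset we derive from
        self.fn = fn          # coarse-grained transformation applied per partition
        n = len(source) if source is not None else parent.num_partitions()
        self.cache: List[Optional[list]] = [None] * n  # in-memory copies

    def num_partitions(self) -> int:
        return len(self.cache)

    def map(self, f: Callable) -> "ToyRDD":
        # Record the transformation; nothing is recomputed eagerly here.
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def partition(self, i: int) -> list:
        # Serve partition i from memory if present; otherwise rebuild it
        # from stable input or by replaying the lineage on the parent.
        if self.cache[i] is None:
            if self.source is not None:
                self.cache[i] = list(self.source[i])
            else:
                self.cache[i] = self.fn(self.parent.partition(i))
        return self.cache[i]

    def lose_partition(self, i: int) -> None:
        # Simulate a node failure that drops one in-memory partition.
        self.cache[i] = None

    def collect(self) -> list:
        return [x for i in range(self.num_partitions()) for x in self.partition(i)]


if __name__ == "__main__":
    base = ToyRDD(source=[[1, 2, 3], [4, 5, 6]])
    evens = base.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
    print(evens.collect())  # [4, 16, 36]
    evens.lose_partition(1)
    print(evens.collect())  # rebuilt from lineage: [4, 16, 36]
```

The point of the design is that recovery only needs the small lineage record, not a full copy of the data, which is why coarse-grained transformations (applied to whole partitions) make this cheap compared with replicating fine-grained updates.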
Spark’s in the wild
We have been working on the Spark cluster computing framework for the last couple of years. It has always been open source under the BSD license on GitHub. Yesterday, however, Matei announced the official launch of the Spark website (spark-project.org) and mailing lists, along with the 0.2 release, to everyone during the AMPLab summer retreat at Chaminade, ...
Orchestra has been accepted at SIGCOMM’2011
Update: The camera-ready version of the paper will be available on the publications page very soon! Our paper “Managing Data Transfers in Computer Clusters with Orchestra” has been accepted at SIGCOMM’2011. This is joint work with Matei, Justin, and professors Mike Jordan and Ion Stoica. The project started as part of Spark and ...
Spark short paper has been accepted at HotCloud’10
An initial overview of our ongoing work on Spark, an iterative and interactive framework for cluster computing, has been accepted at HotCloud’10. I joined the project last February, while Matei has been working on it since last Fall. I will upload the paper to the publications page once we have taken care of ...