Tag Archives: Spark

Orchestra is the Default Broadcast Mechanism in Apache Spark

With its recent release, Apache Spark has promoted Cornet—the BitTorrent-like broadcast mechanism proposed in Orchestra (SIGCOMM’11)—to be its default broadcast mechanism. It’s great to see our research make its way into the real world! Many thanks to Reynold and others for making it happen.

MLlib, the machine learning library of Spark, will enjoy the biggest boost from this change because of the broadcast-heavy nature of many machine learning algorithms.
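To see why broadcast-heavy workloads benefit, consider a back-of-the-envelope cost model (our own toy simplification, not Cornet’s actual algorithm): with a centralized broadcast, the driver’s link must carry one full copy of the data per worker, whereas a BitTorrent-like scheme lets workers re-share blocks among themselves, so completion time grows roughly logarithmically with cluster size.

```python
import math

# Toy cost model (illustrative only): time to broadcast `data_bytes`
# of model parameters to `n_workers`, each link at `link_bw` bytes/sec.

def centralized_broadcast_time(data_bytes, n_workers, link_bw):
    # The single source uploads one copy per worker over its own link,
    # so completion time grows linearly with the number of workers.
    return n_workers * data_bytes / link_bw

def p2p_broadcast_time(data_bytes, n_workers, link_bw):
    # BitTorrent-like: every worker that has the data re-shares it, so
    # the number of sources roughly doubles each round and completion
    # time grows only logarithmically with the number of workers.
    return math.ceil(math.log2(n_workers + 1)) * data_bytes / link_bw

model = 100e6   # a 100 MB model broadcast every iteration
bw = 1e9 / 8    # a 1 Gbps link, in bytes/sec
print(centralized_broadcast_time(model, 100, bw))  # 80.0 seconds
print(p2p_broadcast_time(model, 100, bw))          # 5.6 seconds
```

With an iterative algorithm re-broadcasting its model every iteration, this per-iteration gap compounds over the whole job.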

Spark has been accepted at NSDI’2012

Our paper “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing” has been accepted at NSDI’2012. This is Matei’s brainchild and joint work with many people including, but not limited to, TD, Ankur, Justin, Murphy, and professors Ion Stoica, Scott Shenker, and Michael Franklin. Unlike many other systems papers, Spark is actively developed and used by many people. You can also download it and use it in no time to solve all your problems; well, at least the ones that require analyzing big data in little time. In this paper, we focus on the concept of resilient distributed datasets, or RDDs, and show how we can perform fast, in-memory iterative and interactive jobs with low-overhead fault tolerance.

We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a restricted form of shared memory, based on coarse-grained transformations rather than fine-grained updates to shared state. However, we show that RDDs are expressive enough to capture a wide class of computations, including current specialized programming models for iterative jobs like Pregel. We have implemented RDDs in a system called Spark, which we evaluate through a variety of benchmarks and user applications.

The NSDI’2012 PC accepted 30 out of 169 papers. In other news, Berkeley will have a big presence at NSDI this time with several other papers. Go Bears!!!

Distributed in-memory datasets

AMPLab, “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing,” UCB/EECS-2011-82, 2011. [PDF]

Russell Power, Jinyang Li, “Piccolo: Building Fast, Distributed Programs with Partitioned Tables,” OSDI, 2010. [PDF]

Summary

MapReduce and similar frameworks, while widely applicable, are limited to directed acyclic data flow models, do not expose global state, and are generally slow due to the lack of support for in-memory computation. MPI, while extremely powerful, is hard to use for non-experts. An ideal solution would be a compromise between the two approaches. Spark and Piccolo try to approximate that ideal within the MapReduce-to-MPI spectrum using in-memory data abstractions.

Piccolo

Piccolo provides a distributed key-value store-like abstraction, where applications/tasks can read from and write to shared storage. Users write partition functions to divide the data across multiple machines, control functions to decide the workflow, kernel functions to perform distributed operations on mutable state, and conflict-resolution functions to resolve write-write conflicts. Piccolo uses the Chandy-Lamport snapshot algorithm for periodic checkpointing and, when required, rolls back all tasks of a failed job to the last checkpoint.
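A toy sketch of the table abstraction follows (the class and function names are ours for illustration, not Piccolo’s actual API): the user supplies a partition function to route keys and a conflict-resolution function that merges concurrent writes to the same key.

```python
# Toy sketch of a Piccolo-style partitioned table (illustrative only).
class PartitionedTable:
    def __init__(self, num_partitions, partition_fn, resolve_fn):
        self.parts = [dict() for _ in range(num_partitions)]
        self.partition_fn = partition_fn  # key -> partition index
        self.resolve_fn = resolve_fn      # (old, new) -> merged value

    def update(self, key, value):
        # Writes to an existing key are merged via the user-supplied
        # conflict-resolution function instead of clobbering each other.
        part = self.parts[self.partition_fn(key)]
        part[key] = self.resolve_fn(part[key], value) if key in part else value

    def get(self, key):
        return self.parts[self.partition_fn(key)][key]

# Example: PageRank-style kernels summing rank contributions into
# shared mutable state, resolved by addition.
table = PartitionedTable(4,
                         partition_fn=lambda k: hash(k) % 4,
                         resolve_fn=lambda old, new: old + new)
for key, contrib in [("pageA", 0.25), ("pageA", 0.5), ("pageB", 1.0)]:
    table.update(key, contrib)
print(table.get("pageA"))  # 0.75
```

Because state is mutable and updates are fine-grained, recovery needs whole-table snapshots rather than replaying a log of transformations, which is exactly why Piccolo checkpoints.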

Spark

Spark is a distributed programming model based on a distributed in-memory data abstraction called Resilient Distributed Datasets (RDDs). RDDs are immutable, support coarse-grained transformations, and keep track of the transformations that have been applied to them using lineage, which can be used to reconstruct lost RDDs. As a result, checkpointing requirements/overheads are low in Spark.
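The lineage idea can be sketched in a few lines (a toy illustration, not Spark’s implementation): each derived dataset records only its parent and the coarse-grained transformation that produced it, so a lost in-memory partition can be recomputed on demand instead of being restored from a checkpoint.

```python
# Toy sketch of lineage-based recovery (illustrative only).
class ToyRDD:
    """Records lineage (a parent plus a transformation) instead of data."""
    def __init__(self, parent=None, transform=None, source=None):
        self.parent, self.transform, self.source = parent, transform, source
        self._cache = None  # materialized contents, if any

    def map(self, f):
        return ToyRDD(parent=self, transform=lambda rows: [f(r) for r in rows])

    def filter(self, pred):
        return ToyRDD(parent=self,
                      transform=lambda rows: [r for r in rows if pred(r)])

    def collect(self):
        if self._cache is None:          # lost or never materialized:
            if self.source is not None:  # a base RDD re-reads its input
                self._cache = list(self.source)
            else:                        # a derived RDD replays its lineage
                self._cache = self.transform(self.parent.collect())
        return self._cache

base = ToyRDD(source=range(10))
evens_squared = base.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(evens_squared.collect())  # [0, 4, 16, 36, 64]
evens_squared._cache = None     # simulate losing the in-memory data
print(evens_squared.collect())  # recomputed from lineage, same result
```

The lineage record here is a handful of function pointers, which is why checkpointing overhead stays low: nothing proportional to the data size needs to be written out.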

Spark vs Piccolo

There are two key differences between Spark and Piccolo.

  1. RDDs only support coarse-grained writes (transformations), as opposed to the finer-grained writes supported by Piccolo’s distributed tables. This allows efficient storage of lineage information, which reduces checkpointing overhead and enables fast fault recovery. However, it makes Spark unsuitable for applications that depend on fine-grained updates.
  2. RDDs are immutable, which enables straggler mitigation by speculative execution in Spark.

Comments

Piccolo is closer to MPI, while Spark is closer to MapReduce on the MapReduce-to-MPI spectrum. The key tradeoff in both cases, however, is between a framework’s usability and its applicability/power (framework complexity follows power). Both frameworks are much faster than Hadoop (though remember that Hadoop is not the best implementation of MapReduce), and a large fraction of that speedup comes from the use of memory. Maybe I am biased as a member of the Spark project, but Spark should be good enough for most applications unless they absolutely require fine-grained updates.

Technical report on Spark is available online

A technical report describing the key concepts behind Spark is available online. The abstract follows:

We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that allows programmers to perform in-memory computations on large clusters while retaining the fault tolerance of data flow models like MapReduce. RDDs are motivated by two types of applications that current data flow systems handle inefficiently: iterative algorithms, which are common in graph applications and machine learning, and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a highly restricted form of shared memory: they are read-only datasets that can only be constructed through bulk operations on other RDDs. However, we show that RDDs are expressive enough to capture a wide class of computations, including MapReduce and specialized programming models for iterative jobs such as Pregel. Our implementation of RDDs can outperform Hadoop by 20x for iterative jobs and can be used interactively to search a 1 TB dataset with latencies of 5-7 seconds.

You can also download the 0.3 release of Spark and read the corresponding release notes from the Spark download page.

Spark’s in the wild

We have been working on the Spark cluster computing framework for the last couple of years. It has always been open source under the BSD license on GitHub. But yesterday, during the AMPLab summer retreat at Chaminade, Santa Cruz, Matei announced the official launch of the Spark website (spark-project.org) and mailing lists, along with its 0.2 release.

Head over to the website to learn more about it, to download it, and to solve real-world problems (e.g., spam filtering, natural language processing, and traffic estimation) lightning fast! It will get even better as more people use it and contribute back.

Orchestra has been accepted at SIGCOMM’2011

Update: The camera-ready version of the paper will be available on the publications page very soon!

Our paper “Managing Data Transfers in Computer Clusters with Orchestra” has been accepted at SIGCOMM’2011. This is joint work with Matei, Justin, and professors Mike Jordan and Ion Stoica. The project started as part of Spark and is now quickly expanding to stand on its own and support other data-intensive frameworks (e.g., Hadoop, Dryad). We also believe that interfacing Orchestra with Mesos will enable better network sharing between concurrently running frameworks in data centers.

Cluster computing applications like MapReduce and Dryad transfer massive amounts of data between their computation stages. These transfers can have a significant impact on job performance, accounting for more than 50% of job completion times. Despite this impact, there has been relatively little work on optimizing the performance of these data transfers. In this paper, we propose a global management architecture and a set of algorithms that improve the transfer times of common communication patterns, such as broadcast and shuffle, and allow one to prioritize a transfer over other transfers belonging to the same application or different applications. Using a prototype implementation, we show that our solution improves broadcast completion times by up to 4.5x compared to the status quo implemented by Hadoop. Furthermore, we show that transfer-level scheduling can reduce the completion time of high-priority transfers by 1.7x.
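The benefit of transfer-level prioritization can be seen with a toy two-transfer model on a single bottleneck link (our own simplification, not Orchestra’s scheduler): under fair sharing the transfers split the bandwidth, while strict priority lets the high-priority transfer run alone and finish sooner.

```python
# Toy model (illustrative only): two transfers of the given sizes share
# one link of bandwidth `bw`. Returns (high_done, low_done) times.

def priority_completion(high, low, bw):
    # Strict priority: the high-priority transfer gets the whole link,
    # then the low-priority transfer runs.
    t_high = high / bw
    t_low = t_high + low / bw
    return t_high, t_low

def fair_completion(high, low, bw):
    # Fair sharing: each transfer sees bw/2 until the smaller finishes,
    # after which the survivor gets the full link.
    small, large = min(high, low), max(high, low)
    t_small = 2 * small / bw
    t_large = t_small + (large - small) / bw
    return (t_small, t_large) if high <= low else (t_large, t_small)

bw = 1.0  # normalized link bandwidth; sizes in the same units
print(priority_completion(1.0, 1.0, bw))  # high finishes at t=1.0
print(fair_completion(1.0, 1.0, bw))      # high finishes at t=2.0
```

Even in this crude model the high-priority transfer finishes 2x sooner under prioritization, which is the same flavor of gain the paper measures (1.7x) for real high-priority transfers.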

The paper has been well received so far, and we got great feedback from the anonymous reviewers that will further strengthen it. Hopefully, you will like it too :)

For those interested in stats: this year SIGCOMM accepted 32 out of 223 submissions.

Anyway, it’s Friday and we so excited!

Spark short paper has been accepted at HotCloud’10

An initial overview of our ongoing work on Spark, an iterative and interactive framework for cluster computing, has been accepted at HotCloud’10. I joined the project last February, while Matei has been working on it since last fall. I will upload the paper to the publications page once we have taken care of the reviewer comments/suggestions; meanwhile, you can read the technical report version.

This year HotCloud accepted 18 papers (24% of the submitted papers), and the PC is thinking about extending the workshop to a second day starting next year.