
Sinbad Merged with HDFS Trunk at Facebook!

Today marks the end of my stay at Facebook. Over the last eight months, on and off, I have re-implemented Sinbad (SIGCOMM’13), tested and evaluated it, reproduced the paper’s results, and merged it into Facebook’s internal HDFS trunk. Several pieces of future work promised in the paper were carried out in the process, with varying results. It has been quite a fruitful stint for me; hopefully, it’ll be useful for Facebook in the coming days as it rolls out to more production clusters. After all, who doesn’t like faster writes!

I’m really glad to have had this opportunity, and there are many people to thank. It all started with a talk I gave last July at the end of my Facebook Fellowship, which resulted in Hairong asking me to implement Sinbad at Facebook. Every single member of the HDFS team has been extremely helpful since I started last October. I’d especially like to mention Hairong, Pritam, Dikang, Weiyan, Justin, Peter, Tom, and Eddie for spending many hours getting me up to speed, reviewing code, fixing bugs, helping with deployment, troubleshooting, and generally bearing with me. I’d also like to thank Lisa from the Networking team for helping me add APIs to their infrastructure so that Sinbad can collect rack-level information.

The whole experience was different from my previous internships at research labs for several reasons. First, because it wasn’t an internship, I was on my own schedule and could get my research done while doing this on the side. Second, it was nice to sit with the actual team running some of the largest HDFS clusters on earth. I did learn a lot and found some new problems to explore in the future. Finally, of course, lots of free delicious food!

I’m glad I listened to Ion when Hairong asked. Now, let’s hope Apache Hadoop/YARN picks it up as well.

Sinbad Accepted to Appear at SIGCOMM 2013

Update: Latest version is available here!

Our paper on leveraging flexibility in endpoint placement, aka Sinbad/Usher/Orchestrated File System (OFS), has been accepted for publication at this year’s SIGCOMM. We introduce constrained anycast in data-intensive clusters in the form of network-aware replica placement for cluster/distributed file systems. This is the second piece of my dissertation research, where the overarching goal is to push slightly more application semantics into the network to get disproportionately larger improvements. Sometimes things are unfair in a good way!

Many applications do not constrain the destinations of their network transfers. New opportunities emerge when such transfers contribute a large share of the bytes crossing the network. By choosing endpoints that avoid congested links, the completion times of these transfers, as well as those of others without similar flexibility, can be improved. In this paper, we focus on leveraging the flexibility in replica placement during writes to cluster file systems (CFSes), which account for almost half of the cross-rack traffic in data-intensive clusters. CFS writes require only that replicas be placed in multiple fault domains and that storage use remain balanced; otherwise, the replicas can be on any subset of machines in the cluster.

We study CFS interactions with the cluster network, analyze optimizations for replica placement, and propose Sinbad, a system that identifies imbalance and adapts replica destinations to navigate around congested links. Sinbad does so with little impact on long-term storage load balance. Experiments on EC2 (trace-driven simulations) show that block writes complete 1.3x (1.58x) faster as the network becomes more balanced. As a collateral benefit, the end-to-end completion times of data-intensive jobs improve by up to 1.26x. Trace-driven simulations also show that Sinbad’s improvement (1.58x) is close to a loose upper bound (1.89x) on the optimal.
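
To make the core idea concrete, here is a minimal sketch (in Java, since HDFS is written in Java) of greedy network-aware replica placement. Everything in it is illustrative, not Sinbad’s actual implementation: the rack names, the utilization numbers, and the single downlink-utilization metric are all assumptions, and the sketch ignores the long-term storage-balance constraint the paper handles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// A minimal sketch of network-aware replica placement: greedily pick the
// least-congested racks for a block's replicas. Picking distinct racks also
// satisfies the usual CFS fault-domain rule of spreading replicas across
// racks. Illustrative only; this is not Sinbad's actual code.
public class NetworkAwarePlacementSketch {

    // Hypothetical per-rack network stats; in a real deployment these
    // would come from periodic rack-level utilization measurements.
    static class Rack {
        final String name;
        final double downlinkUtilization; // fraction of capacity in use

        Rack(String name, double downlinkUtilization) {
            this.name = name;
            this.downlinkUtilization = downlinkUtilization;
        }
    }

    // Choose up to numReplicas distinct racks, least congested first.
    static List<Rack> chooseRacks(List<Rack> racks, int numReplicas) {
        List<Rack> sorted = new ArrayList<>(racks);
        sorted.sort(Comparator.comparingDouble((Rack r) -> r.downlinkUtilization));
        return sorted.subList(0, Math.min(numReplicas, sorted.size()));
    }

    public static void main(String[] args) {
        List<Rack> racks = Arrays.asList(
                new Rack("rack-a", 0.85), // congested; should be avoided
                new Rack("rack-b", 0.20),
                new Rack("rack-c", 0.55),
                new Rack("rack-d", 0.10));

        for (Rack rack : chooseRacks(racks, 3)) {
            System.out.println("place replica in " + rack.name);
        }
        // Prints rack-d, rack-b, rack-c; the congested rack-a is skipped.
    }
}
```

The real system also has to keep long-term storage load balance intact, as the paragraph above notes, which this greedy sketch deliberately leaves out.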

This is joint work with Srikanth Kandula and Ion Stoica. One cool fact is that we collaborated with Srikanth almost entirely over Skype for almost a year! I did take up a lot of his time away from his summer interns, but most of them got their papers in too. There are a LOT of people to thank, including Yuan Zhong, Matei Zaharia, Gautam Kumar, Dave Maltz, Ganesh Ananthanarayanan, Ali Ghodsi, Raj Jain, and everyone else I’m forgetting right now. It took quite some time to put our ideas down crisply in a few pages, but in the end the reception from the PC was great. We hope it will be useful in practice too.

This year the SIGCOMM PC accepted 38 papers out of 240 submissions with a 15.83% acceptance rate. This is the highest acceptance rate since 1996 (before Ion Stoica had any SIGCOMM paper) and the highest number of papers accepted since 1987 (before Scott Shenker had any paper in SIGCOMM)!