Datacenter transport layer protocols

Stanford and Microsoft, “DCTCP: Efficient Packet Transport for the Commoditized Data Center,” SIGCOMM, 2010. [PDF]

Raiciu et al., “Improving Datacenter Performance and Robustness with Multipath TCP,” SIGCOMM, 2011. [PDF]

MSR Asia, “ICTCP: Incast Congestion Control for TCP in Data Center Networks,” CoNEXT, 2010. [PDF]

Summary

Datacenters pose a different set of challenges than the Internet, such as microsecond RTTs, synchronized workloads that cause incast, and a lower degree of multiplexing. TCP as we know it, with millisecond-scale feedback loops and reliance on packet drops as its congestion signal, works reasonably well there but leaves one wondering whether we could design a better transport protocol. DCTCP, MPTCP, and ICTCP are three recent proposals that try to address this question. The proliferation of such proposals stems from opportunities that only a datacenter network can provide, e.g., complete knowledge of the topology and workloads, a single administrative domain that allows enforcing changes to the network elements, and largely uniform behavior across the network. Each of the three protocols summarized below exploits one or more of these datacenter-specific characteristics.

DCTCP

DCTCP aims to keep switch buffer occupancy low by throttling senders at the end hosts, so that short flows see low latency while long flows still get high throughput. Switches set ECN bits to signal the senders to cut back their window sizes; the senders estimate the level of congestion from the fraction of marked packets and reduce their window sizes in proportion to it (as opposed to the fixed halving of the window in standard TCP).
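To make the proportional reduction concrete, here is a minimal sketch of the sender-side arithmetic as I understand it from the DCTCP paper: a running estimate alpha of the fraction of ECN-marked packets, and a window cut of alpha/2. The function names, the packet-based window units, and the floor of one packet are my own assumptions for illustration; the gain g = 1/16 is the value suggested in the paper, not something a deployment must use.

```python
def update_alpha(alpha, marked, total, g=1 / 16):
    """Exponentially weighted estimate of the fraction of marked packets.

    marked/total are counts over roughly one window of ACKs (assumed units).
    """
    frac = marked / total if total else 0.0
    return (1 - g) * alpha + g * frac


def reduce_cwnd(cwnd, alpha, min_cwnd=1.0):
    """Cut the window in proportion to the congestion estimate.

    alpha == 1 gives the usual TCP halving; alpha near 0 barely reduces.
    min_cwnd is an assumed floor, in packets.
    """
    return max(min_cwnd, cwnd * (1 - alpha / 2))


if __name__ == "__main__":
    alpha = 0.0
    # Hypothetical trace: (marked, total) packets per window.
    for marked, total in [(0, 10), (2, 10), (10, 10)]:
        alpha = update_alpha(alpha, marked, total)
        print(f"alpha={alpha:.4f} cwnd after cut={reduce_cwnd(100.0, alpha):.2f}")
```

The point of the design is visible at the extremes: a sender that sees every packet marked for a long time converges to alpha = 1 and behaves like TCP, while a sender that sees only occasional marks gives up very little of its window.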

ICTCP

ICTCP is a specialized TCP variant that targets the incast problem at the last hop. The key idea is for the receiver to estimate the available bandwidth and adjust the receive window of each connection accordingly.
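A rough sketch of what receiver-side window adjustment could look like is below. It is a simplification under my own assumptions, not the paper's exact algorithm: the function name, the thresholds `low` and `high`, the one-MSS step, and the two-MSS floor are all illustrative placeholders. The idea it tries to capture is that the receiver compares the throughput a connection actually achieves with what its current window would allow, and grows the window only when the connection is using its window and the last-hop link has spare bandwidth.

```python
MSS = 1460  # bytes; assumed segment size


def adjust_rwnd(rwnd, measured_bps, rtt_s, spare_bps,
                low=0.1, high=0.5, min_rwnd=2 * MSS):
    """Return a new receive window (bytes) for one connection.

    rwnd         current receive window in bytes
    measured_bps throughput the receiver observed for this connection
    rtt_s        round-trip time estimate in seconds
    spare_bps    estimated unused bandwidth on the receiver's link
    """
    # Throughput the current window would permit if fully used.
    expected_bps = max(measured_bps, rwnd * 8 / rtt_s)
    gap = (expected_bps - measured_bps) / expected_bps

    step_bps = MSS * 8 / rtt_s  # bandwidth cost of one more segment per RTT
    if gap <= low and spare_bps >= step_bps:
        return rwnd + MSS                 # window is the bottleneck: grow
    if gap >= high:
        return max(min_rwnd, rwnd - MSS)  # window is far too large: shrink
    return rwnd                           # otherwise leave it alone


if __name__ == "__main__":
    rwnd = 10 * MSS
    print(adjust_rwnd(rwnd, 116_800_000, 0.001, 1e9))  # fully used, link has room
    print(adjust_rwnd(rwnd, 10_000_000, 0.001, 1e9))   # mostly unused window
```

Because the adjustment is gated on spare bandwidth at the receiver, many synchronized senders cannot all ramp up at once, which is what makes this a flow control answer to incast.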

MPTCP

MPTCP is largely orthogonal to DCTCP and ICTCP in that it tries to address the underutilization of bisection bandwidth, and the resulting unfairness, when each flow follows only a single path. The solution is, unsurprisingly, to use multiple paths. Transparently to the applications, MPTCP divides each source-destination flow into several sub-flows and employs a congestion control mechanism that pushes toward using up as much available bandwidth as possible.
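For intuition about how the sub-flows share bandwidth, here is a small sketch of a coupled window update in the style of the "linked increases" rule standardized for MPTCP in RFC 6356. I am assuming this matches the mechanism the paper evaluates; the function names, the packet-based windows, and the one-packet floor are illustrative choices of mine.

```python
def lia_alpha(windows, rtts):
    """Aggressiveness factor that couples the sub-flows' increases."""
    total = sum(windows)
    best = max(w / (rtt * rtt) for w, rtt in zip(windows, rtts))
    denom = sum(w / rtt for w, rtt in zip(windows, rtts)) ** 2
    return total * best / denom


def on_ack(windows, rtts, i):
    """Per-ACK increase on sub-flow i, capped at what plain TCP would do."""
    total = sum(windows)
    inc = min(lia_alpha(windows, rtts) / total, 1.0 / windows[i])
    out = list(windows)
    out[i] += inc
    return out


def on_loss(windows, i, min_cwnd=1.0):
    """A loss halves only the affected sub-flow's window."""
    out = list(windows)
    out[i] = max(min_cwnd, out[i] / 2)
    return out


if __name__ == "__main__":
    w, rtts = [10.0, 10.0], [1.0, 1.0]
    print(on_ack(w, rtts, 0))   # the increase is split across sub-flows
    print(on_loss(w, 1))        # only the congested path backs off
```

With a single sub-flow the increase reduces to the familiar 1/w per ACK, so the scheme degenerates to ordinary TCP; with several, losses push traffic away from congested paths while the coupled increase keeps the aggregate from being more aggressive than one TCP flow on its best path.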

Comments

Out of the three, I found DCTCP and MPTCP more interesting because of the breadth of problems they try to solve. ICTCP is geared toward solving the incast problem only; however, one thing I found interesting about it is that it takes a flow control approach to the problem instead of the more common congestion control approach. In general, all three suffer from the this-may-not-be-real syndrome: DCTCP and ICTCP are possibly too biased by Microsoft workloads, while MPTCP has no evaluation on real workloads. It would be nice to see a more general evaluation of all three. Also, my personal opinion on the order of long-run impact of these papers is MPTCP > DCTCP >> ICTCP.
