A Policy-aware Switching Layer for Data Centers

D. Joseph, A. Tavakoli, I. Stoica, “A Policy-Aware Switching Layer for Data Centers,” ACM SIGCOMM Conference, (August 2008). [PDF]

Summary

Middlebox (e.g., firewall, load balancer) management in existing data center networks is inflexible and requires manual configuration by system administrators, which results in frequent misconfigurations. This paper presents a policy-aware switching layer (PLayer) at layer 2 that addresses the difficulties of data center middlebox deployment using two design principles, “separate policy from reachability” and “take middleboxes off the physical paths,” to provide correctness of traffic flow, flexibility of middlebox placement and configuration, and efficiency by minimizing unnecessary middlebox traversals.

Basic Workflow

The PLayer consists of enhanced layer 2 switches (pswitches). Unmodified middleboxes are plugged into a pswitch in the same way servers are plugged into regular switches. Pswitches forward frames according to policies specified by the network administrator. A policy defines the sequence of middleboxes that a class of traffic must traverse and takes the form: [Start Location, Traffic Selector]→Sequence, where the left side defines the applicable traffic (frames arriving from the Start Location whose 5-tuples match the Traffic Selector) and the right side refers to the sequence of middlebox types to be traversed. Note that PLayer does not support multiple middleboxes of the same type in a sequence; hence the sequence names middlebox types.
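The policy form above can be sketched as a small data structure. This is only an illustration: the class names, the field names, the use of None as a wildcard, and the first-match lookup are assumptions made for this sketch, not the paper's specification language.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# A 5-tuple is (src_ip, dst_ip, src_port, dst_port, proto).
FiveTuple = Tuple[str, str, int, int, str]


@dataclass(frozen=True)
class TrafficSelector:
    """Hypothetical selector: a field set to None matches any value."""
    src_ip: Optional[str] = None
    dst_ip: Optional[str] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    proto: Optional[str] = None

    def matches(self, five_tuple: FiveTuple) -> bool:
        wanted = (self.src_ip, self.dst_ip, self.src_port,
                  self.dst_port, self.proto)
        return all(w is None or w == got
                   for w, got in zip(wanted, five_tuple))


@dataclass(frozen=True)
class Policy:
    """[Start Location, Traffic Selector] -> Sequence of middlebox types."""
    start_location: str
    selector: TrafficSelector
    sequence: Tuple[str, ...]


def find_sequence(policies: Sequence[Policy], start_location: str,
                  five_tuple: FiveTuple) -> Optional[Tuple[str, ...]]:
    """Return the sequence of the first applicable policy, or None."""
    for policy in policies:
        if (policy.start_location == start_location
                and policy.selector.matches(five_tuple)):
            return policy.sequence
    return None


# Example: web traffic entering from the core router must pass a
# firewall and then a load balancer (names are illustrative).
web_policy = Policy("core_router",
                    TrafficSelector(dst_port=80, proto="tcp"),
                    ("firewall", "load_balancer"))
```

Leaving most selector fields as wildcards keeps a policy short; a real deployment would probably also need prefix and port-range matching, which this sketch omits.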

Policies are centrally stored in a policy controller and automatically converted to rules by PLayer after static analysis to discard detectably buggy policies. Rules are stored in pswitches’ rule tables and take the form: [Previous Hop, Traffic Selector] : Next Hop. The centralized middlebox controller also monitors middlebox liveness and informs pswitches about middlebox churn. Whenever a pswitch receives a frame, it finds the matching rule in its rule table and forwards the frame to the Next Hop specified by that rule. Pswitches use a flow-direction-agnostic consistent hash on a frame’s 5-tuple to select the same middlebox instance for all the frames in both the forward and reverse directions of a flow.
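A minimal sketch of the rule lookup and the direction-agnostic instance selection follows. It makes two assumptions that are not from the paper: the two endpoints of the 5-tuple are sorted before hashing so that both directions produce the same key, and rendezvous (highest-random-weight) hashing stands in for whatever consistent hashing scheme PLayer actually uses. All function names are hypothetical.

```python
import hashlib


def canonical_flow_key(five_tuple):
    """Build a key that is identical for both directions of a flow."""
    src_ip, dst_ip, src_port, dst_port, proto = five_tuple
    # Sorting the two endpoints removes the notion of direction.
    low, high = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{low[0]}:{low[1]}|{high[0]}:{high[1]}|{proto}"


def select_instance(instances, five_tuple):
    """Pick one instance by rendezvous hashing (an assumed stand-in).

    Removing an instance that was not selected never changes the
    selection for this flow, which is the property that matters
    under middlebox churn.
    """
    key = canonical_flow_key(five_tuple)

    def score(instance):
        return hashlib.sha256(f"{instance}#{key}".encode()).digest()

    return max(instances, key=score)


def next_hop_type(rule_table, previous_hop, five_tuple):
    """Look up [Previous Hop, Traffic Selector] : Next Hop.

    rule_table is a list of (previous_hop, selector, next_hop) entries,
    where selector is a predicate over the 5-tuple.
    """
    for rule_previous_hop, selector, next_hop in rule_table:
        if rule_previous_hop == previous_hop and selector(five_tuple):
            return next_hop
    return None


def is_web(five_tuple):
    return five_tuple[3] == 80 and five_tuple[4] == "tcp"


# Illustrative rule table for web traffic entering at the core router.
rules = [
    ("core_router", is_web, "firewall"),
    ("firewall", is_web, "load_balancer"),
]
forward = ("10.0.0.5", "10.0.1.9", 40000, 80, "tcp")
reverse = ("10.0.1.9", "10.0.0.5", 80, 40000, "tcp")
firewalls = ["fw-1", "fw-2", "fw-3"]
same_instance = select_instance(firewalls, forward) == select_instance(firewalls, reverse)
```

Here same_instance is True by construction, because both directions hash the same canonical key.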

Implementation Details

Forwarding infrastructure: A pswitch consists of two independent, decoupled parts: the Switch Core, which does the forwarding, and the Policy Core, which redirects frames to the middleboxes dictated by policies. The Policy Core itself consists of multiple modules: RuleTable, InP, OutP, and FailDetect. The Switch Core appears as a regular Ethernet switch to the Policy Core, while the Policy Core appears as a multi-interface device to the Switch Core. Frames redirected by the Policy Core are encapsulated inside an Ethernet-II frame to preserve the original MAC addresses, which is required for correctness.
Supporting unmodified middleboxes: To support middleboxes without any modification, pswitches
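The encapsulation step can be illustrated with a short sketch that wraps the original frame, untouched, in an outer Ethernet-II header. The function names are hypothetical, and the EtherType used to mark encapsulated frames (0x88B5, one of the IEEE local experimental values) is an assumption for this sketch; the summary above does not say which value PLayer uses.

```python
import struct

# Ethernet-II header: destination MAC, source MAC, EtherType.
ETH_HEADER = struct.Struct("!6s6sH")

# Assumed marker for redirected frames (IEEE local experimental EtherType).
ENCAP_ETHERTYPE = 0x88B5


def encapsulate(outer_dst: bytes, outer_src: bytes, inner_frame: bytes) -> bytes:
    """Wrap the original frame so its own MAC header survives redirection."""
    return ETH_HEADER.pack(outer_dst, outer_src, ENCAP_ETHERTYPE) + inner_frame


def decapsulate(outer_frame: bytes) -> bytes:
    """Strip the outer header and return the original frame unchanged."""
    _dst, _src, ethertype = ETH_HEADER.unpack_from(outer_frame)
    if ethertype != ENCAP_ETHERTYPE:
        raise ValueError("not an encapsulated frame")
    return outer_frame[ETH_HEADER.size:]
```

Because the inner frame is carried as an opaque payload, the pswitch that decapsulates it can still read the original source MAC address.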
1. ensure that only relevant frames in standard Ethernet format reach middleboxes and servers,
2. use non-intrusive techniques (e.g., stateless in-line device in the worst case) to identify the previous hop, and
3. support a wide variety of middlebox addressing requirements.
Supporting non-transparent middleboxes/Stateful pswitches: In order to support middleboxes that change frame headers or content (e.g., load balancers), the concept of per-segment policies is introduced. Middlebox instance selection is supported by consistent hashing and policy hints, and requires stateful pswitches in more complex scenarios. A stateful pswitch keeps track of forward and reverse flows to select the same middleboxes in both directions even if the frame header is modified.
Churn behavior: PLayer can face different kinds of churn: network, policy, and middlebox. PLayer employs a 2-stage, versioned policy and middlebox information dissemination mechanism, intermediate middlebox types, and stateful pswitches to handle the different scenarios. The authors show that policies are never violated under any of these kinds of churn.
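One way to picture the state a stateful pswitch keeps is a flow table keyed by exact 5-tuples. The sketch below is an assumption about how such a table might work, not the paper's design: it assumes the pswitch can observe a frame both entering and leaving a middlebox, and it records the reverse of the rewritten 5-tuple so that return traffic maps back to the same instance. The class and method names are hypothetical.

```python
class FlowTable:
    """Hypothetical per-pswitch state mapping 5-tuples to middlebox instances."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def reverse(five_tuple):
        src_ip, dst_ip, src_port, dst_port, proto = five_tuple
        return (dst_ip, src_ip, dst_port, src_port, proto)

    def record(self, tuple_in, tuple_out, instance):
        """Remember the instance for a flow.

        tuple_in is the 5-tuple as the frame entered the middlebox and
        tuple_out is the 5-tuple as it left, possibly rewritten.
        """
        self._entries[tuple_in] = instance
        # Return traffic carries the rewritten header with its ends swapped.
        self._entries[self.reverse(tuple_out)] = instance

    def lookup(self, five_tuple):
        """Return the recorded instance, or None for an unknown flow."""
        return self._entries.get(five_tuple)


# Example: a load balancer rewrites a virtual IP to a real server address.
table = FlowTable()
client_to_vip = ("10.0.0.5", "10.0.9.1", 40000, 80, "tcp")
client_to_server = ("10.0.0.5", "10.0.1.7", 40000, 80, "tcp")
table.record(client_to_vip, client_to_server, "lb-2")
server_to_client = ("10.0.1.7", "10.0.0.5", 80, 40000, "tcp")
```

A hash over the header alone could not recover "lb-2" for server_to_client, since the virtual IP no longer appears in the return frame; the table lookup does.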

Performance Evaluation and Analysis

PLayer was evaluated on the DETER testbed using a handful of regular machines, and the evaluation mostly focuses on showing that it works. However, such a small-scale experiment fails to capture its effectiveness in large data center scenarios. Microbenchmarks show an 18% decrease in TCP throughput and a 16% increase in latency for basic pswitches, and a 60% drop in throughput and a 100% increase in latency for middlebox offloading. Most of the time is spent on rule lookup and on frame encapsulation/decapsulation.

The authors have also provided a semi-formal analysis of PLayer to show guaranteed correctness and consistency at the expense of availability.

Critique

While interesting, this paper suffers from severe performance setbacks; in particular, the large increase in latency will most likely make deployment impractical. Even though the authors argue that the latency introduced by offloading would be negligible inside data centers, without real data it is hard to rely on that assurance.

As the authors point out, there are several limitations in terms of indirect paths, policy specification mechanisms, effects of packet misclassification, incorrect physical wiring of middleboxes, unsupported policies, and more complex switches. However, the prospect of manageability gains is promising enough to consider those challenges in the future.

Several special cases rely on stateful pswitches, which are not very practical, at least in the existing networking landscape.

One last thing: from a high-level view, there are a lot of similarities between PLayer and OpenFlow. I wonder whether OpenFlow (which appeared after PLayer) was influenced by PLayer, and if so, how much.

Mosharaf Chowdhury