Tag Archives: routing

XORs in the Air: Practical Wireless Network Coding

S. Katti, H. Rahul, W. Hu, D. Katabi, M. Medard, J. Crowcroft, “XORs in the Air: Practical Wireless Network Coding,” ACM SIGCOMM Conference, (September 2006). [PDF]

Summary

This paper presents COPE, a new forwarding architecture for wireless mesh networks that significantly improves overall throughput by exploiting the shared nature of the wireless medium through network coding. All nodes in COPE communicate using broadcast. Each node stores packets overheard from its neighbors in a buffer pool for a short time and tells its neighbors which packets it holds using a bitmap. When a node transmits, it uses its knowledge of its neighbors' pools (or an intelligent guess) to XOR multiple packets together and send them as a single packet. The receiving node at the next hop decodes the packet by XORing it with packets already in its own pool. Packets are encoded together only if each can be extracted at its next hop.
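
To make the encode/decode step concrete, here is a minimal Python sketch of the XOR operation COPE relies on; the function names and byte-string packet representation are my own illustration, not the authors' implementation:

    def xor_encode(packets):
        """XOR several equal-length native packets into one coded packet."""
        coded = bytearray(len(packets[0]))
        for pkt in packets:
            assert len(pkt) == len(coded), "COPE prefers equal-length packets"
            for i, byte in enumerate(pkt):
                coded[i] ^= byte
        return bytes(coded)

    def xor_decode(coded, pool):
        """Recover the one missing native packet by re-XORing the coded
        packet with the n-1 packets already in the node's pool."""
        return xor_encode([coded] + list(pool))

    p1, p2 = b"hello, relay", b"hi back, bob"   # two 12-byte packets
    coded = xor_encode([p1, p2])                # relay broadcasts one packet
    assert xor_decode(coded, [p1]) == p2        # next hop already holds p1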

Through examples and theorems, the authors characterize the coding gain and the Coding+MAC gain (the additional gain that arises because coding counteracts the MAC's equal bandwidth allocation, which otherwise starves hub-type relay nodes) of COPE over traditional routing. They prove three results, illustrated by the worked example after the list:

  1. In the absence of opportunistic listening, COPE’s maximum coding gain is 2, and it is achievable;
  2. In the absence of opportunistic listening, COPE’s maximum Coding+MAC gain is 2, and it is achievable; and
  3. In the presence of opportunistic listening, COPE’s maximum coding gain is unbounded.
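
To make these gains concrete, consider the paper's canonical example as I read it: Alice and Bob exchange one packet each via a relay. Without coding this takes four transmissions (two uplinks, two forwards); with coding the relay XORs the two packets and broadcasts once, for three transmissions total:

    coding gain = transmissions without coding / transmissions with coding
                = 4 / 3

The Coding+MAC gain in the same topology is 2: a fair MAC gives each of the three nodes about 1/3 of the medium, so without coding the relay receives packets twice as fast as it can forward them and half are eventually dropped, while with coding the relay drains its queue at exactly the arrival rate, doubling end-to-end throughput.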

There are several implementation-related gotchas:

  • Packets are never delayed in the hope of a future coding opportunity, to avoid building unnecessary queues.
  • COPE prefers XORing packets of the same length and keeps multiple virtual queues per neighbor, one for each packet size!
  • It also tries to avoid TCP packet reordering and uses hop-by-hop ACKs and retransmissions.
  • While deciding whether to XOR one more packet into the mix, COPE estimates each next hop's probability of being able to decode, using delivery probabilities from ETX/ETT-style measurements; if that probability drops below a threshold (0.8 in the paper), it does not XOR that packet (see the sketch after this list).
  • Moreover, COPE implements a pseudo-broadcast consisting of multiple unicasts to work around default 802.11 broadcast's poor reliability and lack of backoff.
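
A minimal sketch of that coding rule, assuming P[n][p] holds this node's estimated probability (from reception reports and measured delivery probabilities) that neighbor n already has packet p; all names here are illustrative, not from the COPE code:

    G = 0.8  # decoding-probability threshold used in the paper

    def can_add(mix, candidate, nexthop_of, P):
        """Allow XORing `candidate` into `mix` only if every packet's
        next hop can still decode with probability >= G."""
        new_mix = mix + [candidate]
        for pkt in new_mix:
            nh = nexthop_of[pkt]
            # To extract pkt, nh must already hold all the other packets.
            prob = 1.0
            for other in new_mix:
                if other is not pkt:
                    prob *= P[nh].get(other, 0.0)
            if prob < G:
                return False
        return True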

COPE was evaluated on a 20-node 802.11a wireless testbed running at a bit rate of 6 Mbps. The main results are:

  • For toy topologies, TCP flows get a smaller boost than UDP flows (they see no MAC gain).
  • For a congested wireless medium, UDP flows get a 3-4x boost in overall throughput, approaching the Coding+MAC gain. With further increases in load, however, the UDP gain decreases and appears to stabilize around the 2x range.
  • TCP, on the other hand, suffers from hidden-terminal-induced collisions and the resulting backoff, and cannot gain much from COPE. Without any hidden terminals, maximum TCP goodput improves by 38%.
  • Almost half of the coded packets XOR only two packets together, and the distribution tails off at a maximum of five packets.
  • Finally, if a COPE-based access network is connected to the Internet through a gateway, the throughput gain varies from 5% to 70% depending on the ratio of uplink to downlink traffic.

Comments

This is an overall excellent paper that combines theory with practice and backs up its claims with a wide range of experimental evaluation. The paper is also very well written and easy to follow. However, there are several issues that could use more explanation. COPE requires as many virtual queues per neighbor within hearing distance as there are packet sizes (which they say is essentially two for Internet traffic), and it has to look through all of them to find the optimal XOR strategy. The paper does not evaluate how big these queues grow or how long it takes to pick an XORing strategy optimally.

The authors say that the pseudo-broadcast mechanism ensures that every node gets more than one chance to overhear. However, they neither mention nor evaluate that it might also result in a higher number of collisions and link-layer retransmissions.

ExOR: Opportunistic Multi-Hop Routing for Wireless Networks

S. Biswas, R. Morris, “ExOR: Opportunistic Multi-Hop Routing for Wireless Networks,” ACM SIGCOMM Conference, (August 2005). [PDF]

Summary

ExOR is an integrated routing and MAC technique that opportunistically exploits the broadcast nature of the wireless environment. Unlike traditional routing protocols, where the best sequence of nodes from a source to a destination is decided before packets are routed along that path, ExOR defers the next-hop decision until it knows which nodes actually received a packet, and lets the recipient closest to the destination become the next hop and retransmit the packet toward the destination. As a result, ExOR can try multiple long but lossy links concurrently, resulting in high expected progress per transmission.

Packets are sent in batches. The source node includes in each packet a list of candidate forwarders prioritized by closeness to the destination (measured by ETX using only the forward delivery probability) and a batch map containing the sender's best guess of the highest-priority node to have received each packet. Receiving nodes buffer packets until the end of the batch transmission. Then, in priority order, each receiver broadcasts its received packets with updated batch maps. If a packet is overheard by a node with higher priority than the sender guessed, that node puts its own id into the batch map as the highest-priority receiver. Through this gossip-like protocol, each node ends up with more or less the same batch map. Every forwarder transmits only those packets that have not been acknowledged by higher-priority nodes. This cycle continues until 90% of the packets have reached the destination; the rest are delivered using traditional routing.
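
The batch-map bookkeeping is easy to miss in the prose, so here is an illustrative Python sketch of it; this is my own reconstruction, assuming priority[n] is node n's index in the forwarder list (lower means closer to the destination):

    def merge_batch_maps(local, heard, priority):
        """For each packet, remember the highest-priority node known to
        have received it (gossip-style convergence of batch maps)."""
        for pkt, holder in heard.items():
            if pkt not in local or priority[holder] < priority[local[pkt]]:
                local[pkt] = holder
        return local

    def packets_to_send(me, received, local, priority):
        """A forwarder retransmits only the packets it received that no
        higher-priority node is known to have."""
        return [pkt for pkt in received
                if priority[local.get(pkt, me)] >= priority[me]]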

ExOR was evaluated on a 32-node testbed, and the authors claim that it outperformed traditional routing for almost all node pairs, with a 2x average gain in end-to-end throughput. ExOR's gain is more prominent on longer, lower-throughput routes (which offer more opportunities for overhearing) than on shorter, higher-throughput ones, but it leads traditional routing across the whole spectrum. The authors believe the best batch size lies somewhere between 10 and 100 packets, but provide neither a specific value nor a formula.

Two of the assumptions behind ExOR are that reception at different nodes is independent and that delivery probability falls off gradually with distance. Through simulation, the authors show that as routes grow longer, throughput under dependent losses lags further and further behind throughput under independent losses; however, the maximum lag found in the simulation is 20%, which leads the authors to postulate that independent loss is not really a requirement but can be exploited when present.

Critique

The concept is simple and straightforward, but the paper somehow manages to make it more complex than it really is. Figure 6, as well as its description in Section 3.8, only confuses things further. And the massive plots with minute information make matters worse (LaTeX placed them out of order, which made reading the paper all the more annoying). The evaluation also does not reveal much beyond showing that ExOR does increase throughput in their testbed. In addition, ExOR interacts badly with TCP and resorts to split connections, breaking end-to-end TCP semantics and possibly introducing overhead.

Interdomain Internet Routing

H. Balakrishnan, “Interdomain Internet Routing,” MIT Lecture Notes, (January 2009).

Summary

Connectivity between different autonomous systems (ASes) with heterogeneous characteristics was the primary objective in designing the Internet. Interdomain routing enables end-to-end communication and global connectivity in a distributed manner through the dissemination of reachability information among the ASes/ISPs of the Internet. This lecture provides an overview of the existing interdomain routing landscape: its components and the relationships between them, the currently used interdomain routing protocol (BGP4), and the underlying mechanisms as well as the economic activities that dictate its operation.

The Internet is a connected graph with ASes (uniquely identifiable and managed by different ISPs) as its nodes and interdomain links (between border gateways of adjacent ASes) as its edges. Each AS has a range of IP addresses that it advertises (after aggregation) to its neighbors, along with the reachability information it has received from its other neighbors. In steady state, each AS ends up with a global view in which it knows which AS to forward a packet to for any particular destination prefix. In practice, however, ISPs differ in size and reachability, which results in an ISP hierarchy with different relationships (e.g., peer-to-peer, provider-customer) along the hierarchy. Moreover, there are competition and economic tension between connected ISPs. The lecture discusses the different ISP relationships (e.g., peering, transit) and the related concepts and mechanisms (e.g., ranking, filtering) from an economic standpoint.
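
The export side of those economics is compact enough to sketch. The following is my own illustrative encoding of the usual no-free-transit export rules the lecture describes, with relationships named from the exporting node's point of view:

    def should_export(learned_from, advertise_to):
        """Classic transit/peering export rule: routes learned from a
        customer are advertised to everyone (customers pay for universal
        reachability); routes learned from a peer or a provider are
        advertised only to customers (no free transit for anyone else)."""
        if learned_from == "customer":
            return True
        return advertise_to == "customer"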

The main objective of an interdomain routing protocol is to provide reachability in a scalable manner without violating the autonomy (in terms of policies) of the participating ISPs. In the existing Internet, BGP (a path-vector protocol) provides scalable, policy-based routing under these competitive circumstances. As the lecture points out, BGP lacks security and trust and is prone to misconfiguration, implementation errors, and human errors; but in terms of connectivity it is quite resilient and fault-tolerant. One thing worth noting is that the first BGP draft came out in 1989, one year after the DARPA design paper, when security was not the biggest concern.

BGP comes in two flavors: eBGP connects border routers of different ISPs, while iBGP internally connects the border routers of a single ISP in a full mesh to share reachability information. In a large ISP, iBGP faces scalability issues; the lecture covers techniques to mitigate them (e.g., route reflection, hierarchical organization). In addition, BGP attributes and their roles in route selection, filtering, and path ranking are discussed. Finally, a brief overview of the biggest BGP challenges (e.g., prefix hijacking, convergence, and multi-homing issues) concludes the discussion.
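
As a rough illustration of the ranking step, here is a hedged sketch of the BGP decision process in the order such lectures usually present it; real routers have more steps and vendor-specific quirks, and the Route class and its field names are my own:

    from dataclasses import dataclass

    @dataclass
    class Route:
        local_pref: int     # set by import policy; economics come first
        as_path_len: int    # shorter AS paths preferred
        med: int            # lower preferred (same neighbor AS only)
        is_ebgp: bool       # eBGP-learned beats iBGP-learned
        igp_cost: int       # "hot potato": closest egress point wins
        router_id: int      # final deterministic tiebreak

    def best_route(routes):
        """Pick the winning route by lexicographic comparison of the
        ranking criteria above."""
        return min(routes, key=lambda r: (
            -r.local_pref, r.as_path_len, r.med,
            not r.is_ebgp, r.igp_cost, r.router_id))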

Critique

Most of the policies in an interdomain routing protocol are driven by economic objectives, be it route selection, exporting reachability information, or traffic engineering. The best thing about this lecture is that it drives the discussion from an economic perspective instead of diving deep into the technical details of BGP's internal mechanisms.

One of the fascinating things about BGP is that even though each ISP is a black box to the rest of the world and there is little trust between ISPs, BGP keeps working, and doing pretty well at that. Consequently, a significant literature in game theory and mechanism design has developed to try to understand and model the behavior and incentives of ISPs in the Internet. I am not really informed of its success though!

Security concerns and ample opportunities for human error are the two weakest points of BGP's design and implementation. Even though S-BGP exists to add some security, it is not deployed, mostly for bureaucratic reasons. That is unfortunate but true for any change to the core of the Internet. It is unclear, however, why no vendor has yet created an expressive, declarative, type-safe language for BGP configuration. Keeping the underlying mechanism the same but providing a safer human interface might cut out a large number of misconfiguration errors.

To address various perceived deficiencies of BGP, tens (maybe even hundreds) of interdomain routing protocols have appeared over the years without any success. Does this prove BGP's success, or is everyone involved just too reluctant to spend the time, money, and effort to replace it?

Understanding BGP Misconfiguration

R. Mahajan, D. Wetherall, T. Anderson, “Understanding BGP Misconfiguration,” ACM SIGCOMM Conference, (August 2002).

Summary

Misconfigurations in BGP tables result in excessive routing load, connectivity disruption, and policy violation. Yet, although misconfigurations are widely prevalent in the Internet, there had been no systematic study of their characteristics. This paper presents an empirical study of BGP misconfigurations that aims to find out how frequent they are and what the root causes behind them are. The authors analyzed routing table advertisements over a three-week period from 23 vantage points using the RouteViews listener to detect BGP misconfigurations, and separately polled the ISPs concerned to verify their inferences. Their final verdict is that the perceived end-to-end connectivity of the Internet is surprisingly robust! (Does it sound familiar?) Apart from that, there are several numerical results, most of them conservative lower bounds on different configuration errors. The authors also propose several fixes to abate the sources of misconfiguration.

The authors consider two kinds of globally visible BGP misconfigurations (misconfigurations being inadvertent slips and design mistakes):

  1. Origin misconfiguration: accidental injection of a prefix into the global tables due to (i) errors in aggregation rules, (ii) hijacks, and (iii) leakage of private routes. Categories of origin misconfiguration include (i) self-deaggregation, (ii) related origin, and (iii) foreign origin.
  2. Export misconfiguration: AS paths that violate the policies of one of the ASes in the path.

The basic assumption used for declaring a prefix to be an artifact of error is that misconfigurations appear and disappear very quickly, while policy changes happen on human timescales. To be exact, the authors consider only short-lived (less than a day) changes as candidates for misconfiguration, and double-check their observations through email surveys of the ISPs concerned.
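
A minimal sketch of that heuristic, assuming a time-sorted stream of announce/withdraw events; the event format is my assumption, not the authors' pipeline:

    DAY = 24 * 60 * 60  # seconds

    def short_lived_candidates(events):
        """Yield (prefix, origin) pairs whose announcement was withdrawn
        in under a day -- candidate misconfigurations, since deliberate
        policy changes persist on human timescales."""
        announced = {}
        for t, kind, prefix, origin in events:
            key = (prefix, origin)
            if kind == "announce":
                announced.setdefault(key, t)
            elif kind == "withdraw" and key in announced:
                if t - announced.pop(key) < DAY:
                    yield key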

From their empirical study the authors discovered several intriguing results:

  • Origin misconfigurations cause 13% of incidents but result in connectivity issues in only 4% (i.e., disrupting connectivity is not easy). Moreover, 95% of the incidents are corrected in less than 10 hours, and 95% of the connectivity issues are resolved in less than an hour!
  • Export misconfigurations do not cause connectivity issues directly, but the extra traffic they attract can cause congestion, leading to a negligible number of indirect connectivity errors.
  • Overall, up to 75% of new prefix announcements are due to misconfiguration of one kind or another.
  • Apart from human error, which the authors set out to examine in the first place, software bugs play a significant role in BGP problems.

Based on the email survey results, the paper identifies several sources of both types of misconfiguration (e.g., bugs, filtering errors, stale configuration, hijacking, bad ACLs). To check these sources, the authors propose solutions such as UI improvements, high-level language design, database consistency, and deployment of secure protocols (e.g., S-BGP).

Critique

One noticeable thing here is that the authors do not propose any automated monitoring and identification system that could make life easier for network operators. Maybe the results presented here will provide insight into solving some of the issues, but manual methods will probably not help in identifying unknown phenomena. (Just saw that Neil added some links to ML-based studies, possibly for automated identification.)

Somewhere it is said that temporary route injection could be used to fend off DDoS attacks; a better explanation during discussion might help in that respect. The authors also remark that only a few BGP features are responsible for most of the errors (does it follow the Pareto principle?), but they do not point out which features those actually are. Is it even possible to figure that out? One more thing mentioned in the paper is the problem of silent failures. How do we make sure that routers do not fail silently? Do we use heartbeat protocols, or just log every action for later inspection?

Not much to say about the inconsistent databases throughout the Internet and the inability to incorporate extensions (e.g., S-BGP) into the network. Some things never change.

Feedback

While this paper aims to untangle the mysteries behind BGP misconfigurations, at its heart it is an excellent network measurement paper. The way the authors arrange the bits and pieces to build up the whole paper is incredible. (An aside: Ratul was also involved in RocketFuel, another excellent network measurement paper, which got the best student paper award at SIGCOMM 2002.)