All posts by Mosharaf

Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks

D-M Chiu, R. Jain, “Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks,” Computer Networks and ISDN Systems, 17 (1989), pp 1-14. [PDF]

Summary

“Low delay” and “high throughput”, the two main operating objectives of a computer network, are frequently hampered by packet congestion. Once congestion sets in, it takes a considerable amount of time and network resources to get back to equilibrium (or, more precisely, to the non-congested state). Congestion avoidance is a proactive measure that strives to avoid congestion just before it occurs, thus improving network utilization and performance. This paper presents mathematical analyses of several existing linear increase/decrease congestion avoidance algorithms and compares them using multiple performance metrics in a simplified network model.

Model

The authors consider variations of decentralized increase/decrease congestion avoidance algorithms, each using a “binary feedback scheme”. In a binary feedback control mechanism, network resources send back a one to the controllers when overloaded and a zero otherwise. The basic assumptions of their model are:

  1. Users sharing the same bottleneck will receive the same feedback,
  2. Feedback and control loops for all users are synchronous (i.e., they take actions in lockstep), and
  3. There is only one bottleneck resource in the network.

Once users/end hosts receive feedback from the congested network resource, they take measures based on the following criteria:

  1. Efficiency: Measured in terms of the utilization of the resource; both overload and underload are undesirable.
  2. Fairness: Each user should get an equal (proportional!) share of the bottleneck. Properties of a suitable fairness function can be found in the paper.
  3. Distributedness: The control requires a minimal amount of feedback from the resource under consideration; no central processing is required.
  4. Convergence: The control scheme should converge within a reasonable level of tolerance.

Results

Through mathematical reasoning and geometric representation, the authors compare the AIAD, AIMD, MIAD, and MIMD (A: additive, I: increase, M: multiplicative, D: decrease) control schemes and prove that only AIMD can optimally converge to both efficiency and fairness in a distributed manner with minimal feedback. They also identify the operating region and control parameters for the control scheme.
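To make the four controls concrete, here is a rough Python sketch (my own illustration, not the paper's notation); x is a user's load, y is the binary feedback, and the increase/decrease constants are arbitrary example values:

```python
# Minimal sketch of the linear increase/decrease controls analyzed in the paper.
# Feedback y: 1 means the bottleneck is overloaded, 0 means it is underloaded.
# The parameter values below are illustrative, not the paper's recommendations.

def aiad(x, y, a=1.0, b=1.0):
    """Additive increase, additive decrease."""
    return x + a if y == 0 else max(x - b, 0.0)

def aimd(x, y, a=1.0, b=0.5):
    """Additive increase, multiplicative decrease (0 < b < 1)."""
    return x + a if y == 0 else x * b

def miad(x, y, a=1.1, b=1.0):
    """Multiplicative increase (a > 1), additive decrease."""
    return x * a if y == 0 else max(x - b, 0.0)

def mimd(x, y, a=1.1, b=0.5):
    """Multiplicative increase, multiplicative decrease."""
    return x * a if y == 0 else x * b

# Two synchronized users sharing one bottleneck of capacity 10:
# under AIMD their loads converge toward an equal share of the capacity.
x1, x2, capacity = 8.0, 1.0, 10.0
for _ in range(50):
    y = 1 if x1 + x2 > capacity else 0   # binary feedback from the resource
    x1, x2 = aimd(x1, y), aimd(x2, y)
print(round(x1, 2), round(x2, 2))
```

Even in this toy setting, the multiplicative decrease shrinks the absolute gap between the two users on every congested step, while the additive increase keeps the gap constant as both grow; together these drive the ratio of the loads toward 1, which is the geometric intuition behind AIMD's convergence to fairness.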

This paper does not consider non-linear controls, citing practicality and modeling issues. While the authors agree that non-linear control offers more flexibility, they argue that the greater sensitivity of non-linear schemes to changing conditions can result in instability.

Critique

At the end of the paper, the authors themselves raise several questions about different scenarios that can affect AIMD in multiple ways; e.g., how does delayed feedback affect the control, what is the marginal utility of increased bits of feedback, is it worthwhile to guess the current number of users, and what is the impact of asynchronous operation? However, they leave all of them as future work! I wonder what the correct answers (if any) to these questions are.

The paper itself is a wonderful read, with fairly easy, understandable math and explanatory geometric representations. However, it is not clear how the authors came up with the massive logarithmic expression while deriving the conditions for optimal convergence to efficiency.

As far as I understand, CUBIC TCP for high-speed networks uses a non-linear (cubic) window growth function for congestion avoidance. Discussion/info on other non-linear variations is very much welcome.
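For the curious, here is a small sketch of CUBIC's commonly described window growth curve, W(t) = C(t − K)^3 + W_max, using the convention that the window drops to β·W_max on loss; the constants are illustrative, and this is only the growth function, not the full protocol:

```python
# Simplified sketch of CUBIC's cubic window growth curve (illustrative constants).
# W_max is the window size just before the last loss; t is seconds since that loss.

C = 0.4          # scaling constant (example value)
BETA = 0.7       # multiplicative decrease: the window drops to BETA * W_max on loss

def cubic_window(t, w_max):
    # K is the time at which the curve climbs back up to W_max after the post-loss dip.
    k = ((w_max * (1 - BETA)) / C) ** (1.0 / 3.0)
    return C * (t - k) ** 3 + w_max

# The curve is concave while approaching W_max and convex (probing) beyond it.
for t in range(0, 11, 2):
    print(t, round(cubic_window(t, w_max=100.0), 1))
```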

Interdomain Internet Routing

H. Balakrishnan, “Interdomain Internet Routing,” MIT Lecture Notes, (January 2009).

Summary

Connectivity between different autonomous systems (ASes) with heterogeneous characteristics was the primary objective while designing the Internet. Interdomain routing enables end-to-end communication and global connectivity in a distributed manner through the dissemination of reachability information among the ASes/ISPs in the Internet. This lecture provides an overview of the existing interdomain routing landscape: its components and the relationships between them, the currently used interdomain routing protocol (BGP4), and the underlying mechanisms as well as the economic activities that dictate its operation.

The Internet is a connected graph with ASes (uniquely identifiable and managed by different ISPs) as its nodes and interdomain links (between border gateways of adjacent ASes) as its edges. Each AS has a range of IP addresses that it advertises (after aggregation) to its neighbors, along with the reachability information it has received from its other neighbors. In steady state, each AS ends up with a global view, in the sense that it knows which AS to forward a packet to for any particular destination prefix. In practice, however, ISPs have different sizes and reachability, which results in an ISP hierarchy with different relationships (e.g., peer-to-peer, provider-customer) along the hierarchy. Moreover, there are competing interests and economic tensions between connected ISPs. This lecture discusses different ISP relationships (e.g., peering, transit) and related concepts and mechanisms (e.g., ranking, filtering) from an economic standpoint.
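As a toy illustration of how that reachability information spreads, here is my own simplified sketch of path-vector propagation; it ignores policies, aggregation, and timers, and each AS simply stores a path that begins with itself and refuses any path it already appears in (loop prevention):

```python
# Toy path-vector propagation (illustrative only). Each AS re-advertises the routes
# it knows; the stored path begins with the advertising AS, and a neighbor rejects
# any path it already appears in. Policy filtering and aggregation are omitted.

def propagate(links, origins):
    """links: list of (as1, as2) adjacencies; origins: {asn: prefix it originates}."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    # rib[asn][prefix] = AS path (list of ASNs, starting with asn) used to reach prefix
    rib = {asn: {} for asn in neighbors}
    for asn, prefix in origins.items():
        rib[asn][prefix] = [asn]
    changed = True
    while changed:
        changed = False
        for asn in neighbors:
            for nbr in neighbors[asn]:
                for prefix, path in list(rib[asn].items()):
                    if nbr in path:              # loop prevention
                        continue
                    new_path = [nbr] + path
                    best = rib[nbr].get(prefix)
                    if best is None or len(new_path) < len(best):
                        rib[nbr][prefix] = new_path
                        changed = True
    return rib

links = [(1, 2), (2, 3), (1, 3)]
origins = {3: "203.0.113.0/24"}
print(propagate(links, origins)[1])   # AS 1 learns the prefix via the direct link to AS 3
```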

The main objective of an interdomain routing protocol is to provide reachability in a scalable manner without violating the autonomy (in terms of policies) of the ISPs concerned. In the existing Internet, BGP (a path-vector protocol) provides scalable, policy-based routing under these competitive circumstances. As pointed out in the lecture, BGP lacks security and trust and is prone to misconfiguration, implementation errors, and human errors; but in terms of connectivity, BGP is quite resilient and fault-tolerant. One thing to note here is that the first BGP draft came out in 1989, one year after the DARPA design paper, and security was not the biggest concern at that time.

BGP comes in two flavors: eBGP connects border routers of different ISPs, while iBGP internally connects the border routers of a single ISP in a full mesh to share reachability information. In a large ISP, iBGP faces scalability issues; techniques to mitigate them (e.g., route reflection, hierarchical organization) are addressed in the lecture. In addition, BGP attributes and their roles in route selection, filtering, and path ranking are also discussed. Finally, a brief overview of the biggest BGP challenges (e.g., prefix hijacking, convergence, and multi-homing issues) concludes the discussion.
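To give a flavor of how those attributes feed into path ranking, here is a much-simplified sketch of the usual lexicographic decision process; it is my own illustration, and real routers apply more tie-breakers (origin type, router ID) plus vendor-specific policy knobs:

```python
# Simplified sketch of BGP best-path selection as a lexicographic ranking.
# Only a few commonly cited tie-breakers are shown; this is not any vendor's
# actual implementation.

from dataclasses import dataclass, field

@dataclass
class Route:
    prefix: str
    as_path: list = field(default_factory=list)
    local_pref: int = 100       # set by import policy; higher is preferred
    med: int = 0                # multi-exit discriminator; lower is preferred
    learned_via_ebgp: bool = True
    igp_cost: int = 0           # cost to reach the next hop; lower is preferred

def rank(route):
    return (
        -route.local_pref,                    # 1. highest LOCAL_PREF (economics/policy first)
        len(route.as_path),                   # 2. shortest AS path
        route.med,                            # 3. lowest MED
        0 if route.learned_via_ebgp else 1,   # 4. prefer eBGP over iBGP
        route.igp_cost,                       # 5. lowest IGP cost to the next hop
    )

def best_route(candidates):
    return min(candidates, key=rank)

r1 = Route("10.0.0.0/8", as_path=[7018, 3356], local_pref=200)
r2 = Route("10.0.0.0/8", as_path=[174], local_pref=100)
print(best_route([r1, r2]).as_path)   # LOCAL_PREF wins over the shorter AS path
```

The fact that LOCAL_PREF sits at the top of the ranking is exactly where the economics enter: an ISP can prefer a customer route over a shorter path through a peer or provider.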

Critique

Most of the policies in an interdomain routing protocol are driven by economic objectives – be it route selection, exporting reachability information, or traffic engineering. The best thing about this lecture is that it drives the discussion from an economic perspective instead of diving deep into the technical details of BGP internal mechanisms.

One of the fascinating things about BGP is that even though each ISP is a black box to the rest of the world and there is little trust between ISPs, BGP keeps working and doing pretty well at that. Consequently, a significant literature in game theory and mechanism design has developed to try to understand and model the behavior and incentives of ISPs in the Internet. I am not really informed of its success though!

Security concerns and ample opportunities for human error are the two weakest points in BGP's design and implementation. Even though S-BGP is out there to add some security, it is not deployed, mostly for bureaucratic reasons. It's unfortunate, but true for any change to the core of the Internet. However, it is unclear why no vendor has yet created an expressive/declarative, type-safe language for BGP configuration. Keeping the underlying mechanism the same but providing a safer human interface might cut out a large number of misconfiguration errors.

To address various perceived deficiencies of BGP, tens (maybe even hundreds) of interdomain routing protocols have appeared over the years without any success. Does this prove BGP's success, or is everyone involved just too reluctant to spend the time, money, and effort to replace BGP?

Understanding BGP Misconfiguration

R. Mahajan, D. Wetherall, T. Anderson, “Understanding BGP Misconfiguration,” ACM SIGCOMM Conference, (August 2002).

Summary

Misconfigurations in BGP tables result in excessive routing load, connectivity disruption, and policy violations. Yet misconfigurations are widely prevalent in the Internet, and there has been no systematic study of their characteristics. This paper presents an empirical study of BGP misconfigurations with the aim of finding out how frequent they are and the root causes behind them. The authors analyzed routing table advertisements over a three-week period from 23 vantage points using the RouteViews listener to detect BGP misconfigurations, and they separately polled the concerned ISPs to verify the collected data. Their final verdict is that the perceived end-to-end connectivity of the Internet is surprisingly robust! (Does it sound familiar?) Apart from that, there are several numerical results, most of them providing conservative lower bounds on different configuration errors. They also propose several fixes to abate the sources of misconfigurations.

The authors consider two globally visible types of BGP misconfiguration (misconfigurations being inadvertent slips and design mistakes):

  1. Origin misconfiguration: Accidental injection of a prefix into the global tables due to (i) errors in aggregation rules, (ii) hijacks, and (iii) leakage of private routes. Causes of origin misconfigurations include: (i) self-deaggregation, (ii) related origin, and (iii) foreign origin.
  2. Export misconfiguration: AS-paths in violation of policies of one of the ASes in the path.

The basic assumption used for declaring a prefix to be an artifact of error is that misconfigurations appear and disappear really quickly, while policy changes happen on human timescales. To be exact, the authors only consider short-lived (less than a day) changes as candidates for misconfiguration, and they double-check their observations through email surveys of the concerned ISPs.
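In other words, the detection heuristic boils down to a duration filter over the observed announcements, roughly along these lines (my own sketch, not the authors' code):

```python
# Sketch of the short-lived-announcement heuristic described above (illustrative only).
# An announcement that appears and disappears within a day is treated as a
# misconfiguration candidate; longer-lived changes are assumed to be policy changes.

from datetime import datetime, timedelta

ONE_DAY = timedelta(days=1)

def misconfig_candidates(announcements):
    """announcements: iterable of (prefix, origin_as, start_time, end_time)."""
    candidates = []
    for prefix, origin_as, start, end in announcements:
        if end - start < ONE_DAY:
            candidates.append((prefix, origin_as))
    return candidates

observed = [
    ("192.0.2.0/24", 64500, datetime(2002, 1, 1, 10), datetime(2002, 1, 1, 12)),
    ("198.51.100.0/24", 64501, datetime(2002, 1, 1, 0), datetime(2002, 1, 20, 0)),
]
print(misconfig_candidates(observed))   # only the two-hour announcement is flagged
```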

From their empirical study the authors discovered several intriguing results:

  • Origin misconfigurations cause 13% of the incidents, but only 4% of them result in connectivity issues (i.e., disrupting connectivity is not easy). Moreover, 95% of the incidents are corrected in less than 10 hours, and 95% of the connectivity issues are resolved in less than an hour!
  • Export misconfigurations do not cause connectivity issues directly, but they bring in extra traffic that can cause congestion, leading to (negligible) indirect connectivity errors.
  • Overall, 75% of new prefix announcements are due to misconfigurations of one kind or another.
  • Apart from human error, which the authors set out to examine in the first place, software bugs play a significant role in BGP issues.

Based on the email survey results, the paper identifies several sources of both types of misconfiguration (e.g., bugs, filtering errors, stale configuration, hijacking, bad ACLs, etc.). To check these sources, the authors also propose solutions such as UI improvements, high-level language design, database consistency, deployment of secure protocols (e.g., S-BGP), etc.

Critique

One noticeable thing here is that they do not propose any automated monitoring and identification system that could make life easier for network operators. Maybe the results presented here will provide insight into solving some of the issues, but manual methods will probably not help in identifying unknown phenomena. (Just saw that Neil added some links to ML-based studies, possibly for automated identification.)

Somewhere it is said that temporary route injection could be used to fend off DDoS attacks; a better explanation during discussion might help in that respect. The authors also remark that only a few BGP features are responsible for most of the errors (does it follow the Pareto principle?), but they do not point out which features are actually responsible. Is it even possible to figure those out? One more thing mentioned in the paper is the problem of silent failures. How do we make sure that routers do not fail silently? Do we use heartbeat protocols, or just dump every action for later inspection?

Not much to say about the status of the inconsistent databases throughout the Internet and the inability to incorporate extensions (e.g., S-BGP) into the network. Some things never change.

Feedback

While this paper aims to untangle the mysteries behind BGP misconfigurations, at its heart it's an excellent network measurement paper. The way the authors arrange the bits and pieces to build up the whole paper is incredible. (An aside: Ratul was also involved in the RocketFuel paper, another excellent network measurement paper, which received the best student paper award at SIGCOMM 2002.)

The Design Philosophy of the DARPA Internet Protocols

Citation: D. D. Clark, “The Design Philosophy of the DARPA Internet Protocols,” ACM SIGCOMM Conference, (August 1988). [ACM]

Summary

This paper summarizes the rationale behind the design choices made for the Internet (initiated by DARPA) and explains how the Internet, its protocols, and its underlying mechanisms ended up being the way they were in 1988. The main objective of the paper is to provide the necessary context and background for designing extensions to the Internet. In addition, the author also discusses some of the evolutionary changes to the Internet architecture up to that point and the motivations/requirements behind those changes.

As pointed out in this paper, the primary goal of the DARPA Internet design was to interconnect its wired and radio networks, which can be generalized as the problem of interconnecting separately administered network domains through multiplexing of the underlying resources. In this respect, the DARPA designers used packet-switched store-and-forward networking technology because of their greater understanding of how it works and because the frequently used applications/tasks of the time would have benefited from this choice. The choice of unifying heterogeneous administrative domains through gateways, instead of trying to create a monolithic mega-network, turned out to be the biggest factor that allowed the Internet to scale so much in later years. Apart from the basic goal of interconnection, secondary objectives included survivability, heterogeneity in terms of technology and equipment, distributed management, and cost-effectiveness, among others. What stands out in this list is the complete absence of security, as well as the low priority given to accountability/attribution. The take-home message is that a system, in many cases, is highly coupled with and dictated by its requirements and premises; a list drawn up out of commercial interest could easily have put accountability as one of the topmost goals. However, it is not clear why a military network was not concerned about security, privacy, authentication, etc.!

Critique

One thing prominent in this paper is the influence of the end-to-end argument on the design of the Internet – the TCP/IP separation, fate-sharing, the layering mechanism, UDP – all these concepts are direct or indirect results of simplifying the core. One must notice that the end-to-end argument was not explicitly applied to come up with those concepts; rather, the way those concepts materialized naturally points to the principle of a simple, flexible core and comparatively complex endpoints, i.e., the end-to-end argument (Sections 5 and 6). The discussions of the evolutionary changes to the TCP acknowledgement scheme, the choice between byte and packet numbering, and the EOL- and PSH-related issues provide retrospective views on the choices made, their implications, and what could have been done better. The part on EOL and PSH is not very clear, though.

The paper also describes some of the problems/deficiencies/complexities that arose from the design choices and the inability to meet some of the goals, and it eventually brings up the issue of trade-offs that is ever-present in any system design. In this case, the trade-off is between simplifying the core and putting so much complexity in the endpoints that it might hinder performance. The question of where the endpoints really are in the end-to-end argument is left completely exposed here without any definitive answer.

End-To-End Arguments in System Design

Citation: J. H. Saltzer, D. P. Reed, D. D. Clark, “End-to-End Arguments in System Design,” 2nd International Conference on Distributed Computing Systems, Paris, (April 1981), pp. 509-512. [PDF]

Summary

The crux of the end-to-end argument is that in most cases the endpoints (e.g., applications) know best what they need, and the underlying system should provide only the minimal set of attributes/functionalities that are absolutely necessary to support those applications. The system must not force on everyone anything that is not useful to every single application, and it should avoid redundancy and cost inefficiency. The authors argue that simplicity is the key to successful system design; low-level mechanisms are justified only as performance enhancements, not as part of the core design. They explain, through several use cases, that the flexibility resulting from this minimalistic design enables the modular/layered system architectures that have empirically been found to be the key to success for many well-known distributed systems.

TCP/IP, and possibly the whole network stack, is the biggest example of applying the end-to-end argument: each layer has well-defined functionality, and one layer builds on top of another, growing up to the application layer where the actual applications have the last word. Several other examples from computer architecture (e.g., RISC) and database systems that successfully capitalized on this mantra are also presented in the paper.
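The paper's flagship example is the “careful file transfer”: no matter how reliable each individual hop is, only an end-to-end check at the application can catch corruption introduced anywhere along the way, including at the hosts themselves. Here is a minimal sketch of that idea; the transfer() helper is hypothetical, standing in for whatever (possibly unreliable) channel sits underneath:

```python
# Sketch of the "careful file transfer" example: the application verifies an
# end-to-end checksum and retries, regardless of any reliability the lower
# layers may provide. transfer() is a hypothetical, possibly unreliable channel.

import hashlib

def transfer(data: bytes) -> bytes:
    """Stand-in for the network path; in practice it may corrupt or drop data."""
    return data

def careful_file_transfer(data: bytes, max_attempts: int = 5) -> bytes:
    expected = hashlib.sha256(data).hexdigest()            # computed at the sender
    for _ in range(max_attempts):
        received = transfer(data)
        if hashlib.sha256(received).hexdigest() == expected:  # verified at the receiver
            return received
    raise IOError("end-to-end check failed after retries")

print(len(careful_file_transfer(b"some file contents")))
```

Per-hop reliability can still be worth adding as a performance optimization (fewer end-to-end retries), which is exactly the paper's caveat.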

Critique

It is not always clear where such endpoints lie. In a network, for example, OSes can be the endpoints, as can the applications. Throughout the paper, the authors keep suggesting that system designers must make intelligent and informed decisions to identify the endpoints in a system, without giving any definite answer or rule of thumb. The end-to-end argument can at best be used as a guideline (or warning!) against pushing mechanisms too far down a layered system, but there is no definite way of telling when you have moved far enough.

As a result, over the years the Internet has accumulated many functionalities/mechanisms in its core that can be considered to have gone against the end-to-end argument. Most of them, e.g., NATs and firewalls, are engineering fixes for particular problems or administrative/management requirements. It's not clear whether this is a flaw of the end-to-end argument itself or the classic case of engineering triumphing over scientific axioms.

However, in recent years, proposals for next-generation architectures to replace the Internet have put immense emphasis on a simplified, flexible core that will allow heterogeneous network architectures and services to coexist (in this case, each coexisting architecture can be considered an endpoint of the shared substrate). This brings us back full circle to the end-to-end principle, justifying its worth.

Overall, this oft-cited but seldom-read article provides a great perspective on the design of the Internet as we see it today and on systems development in general. Great read for opening the scorecard!

In Retrospect

A retrospective view on this paper and what has changed in the Internet since then can be found in the following paper.

M. S. Blumenthal and D. D. Clark, “Rethinking the Design of the Internet: The End-to-End Arguments vs. the Brave New World,” ACM Transactions on Internet Technology, 1(1): 70-109, ACM Press, August, 2001. [PDF]

A single sentence, pulled verbatim from the paper, can sum up the changes:

Of all the changes that are transforming the Internet, the loss of trust may be the most fundamental.

Unfortunately, this is probably true for everything :(

Par Lab Boot Camp 2009: Short Course on Parallel Programming

Note: Videos of the talks from the workshop are now available here.

As part of its summer school activities, the Par Lab at UC Berkeley is arranging a short course/workshop on parallel programming from the 19th to the 21st of August, 2009. The workshop will also act as the boot camp for incoming graduate students who want to pursue their PhD studies with an aim to unleash the power of parallelism.

Fortunately, it's free for Berkeley people. I thought it would be cool to know some techniques, or at least a few keywords, to understand what the parallel guys talk about when they don't talk parallelly! So here I am, waiting for the classes to start, participating in the boot camp, and trying to be hip by bagging some multi-core lingo.

Here goes the official program summary from the boot camp website:

We will provide an introduction to current parallel architectures and programming models. This will include a thorough exposure to languages and tools for shared-memory programming, including hands-on experience. We will provide an overview of programming on other important parallel architectures (GPUs, Clouds, and distributed memory machines). In addition, we will provide a unique Berkeley View on architecting parallel software using design patterns. We will go into depth on the mechanisms for structuring complex parallel code as well as into the core computations of complex applications. Finally, we will illustrate all these ideas with in-depth discussions of a variety of exciting parallel applications from image retrieval, speech recognition, computer music, and other areas.

To remain committed to the promise of parallelism, yours truly will also be experimenting with his multi-core capabilities as a live blogger while being a serious (!) workshop attendee on this very page.

Note: Each day is on a different page; day i being on page (i+1).

Feature Article on Network Virtualization in IEEE ComMag

“Network Virtualization: State of the Art and Research Challenges” has been featured in the July 2009 issue of IEEE Communications Magazine as part of its Network and Service Management Series. It is a quick overview of many things related to network virtualization: its past, present, and possible future. The main objective is to let the huge readership of the magazine (which includes all members of the IEEE ComSoc, among others) know about the ongoing saga of network virtualization and the next-generation Internet.

Please have a look at the abstract below; a copy of the (sort of) final draft can be found on my publications page.

Recently network virtualization has been pushed forward by its proponents as a long-term solution to the gradual ossification problem faced by the existing Internet and proposed to be an integral part of the next-generation networking paradigm. By allowing multiple heterogeneous network architectures to cohabit on a shared physical substrate, network virtualization provides flexibility, promotes diversity, and promises security and increased manageability. However, many technical issues stand in the way of its successful realization. This article investigates the past and the state of the art in network virtualization along with the future challenges that must be addressed to realize a viable network virtualization environment.

Acceptance rate for this issue was 14.3% (four out of 28 submissions).

Presenting at INFOCOM’2009 in Brazil

Finally, I am getting the opportunity to present my work in front of my peers, at arguably the biggest networking conference in size and impact. The slides are ready, the stage is set, and I am hoping I will be able to deliver. Since this is my first presentation at an international conference, I am really excited – maybe a bit too much :D I am also looking forward to meeting lots of researchers from all around the world and networking with them for possible future collaborations. The fact that I am crossing the equator for the first time is adding more fuel to my excitement.

Update (April 19): We (Andy and I) are at the conference site. I am loving Rio :D Hopefully the presentation will go alright on Wednesday.

Update (April 22): Finished presenting my paper :)

Signed, sealed, delivered I’m done (with Master’s)

After 633 days, reading a few hundred papers, writing a few, watching 100+ movies, reading 30+ works of fiction, eating lots of food, and doing many other things, my Master's is over :D I successfully presented/defended my thesis on January 21, 2009, and my thesis committee approved it without any changes (there were 3 typos :( ). To the best of my knowledge, this is the first thesis on network virtualization since its recent reincarnation. Officially, the thesis is available in UWSpace. You can also get it here along with the presentation slides. I would like to thank my two readers, Prof. Ashraf Aboulnaga and Prof. Johnny Wong, and especially my supervisor, Prof. Raouf Boutaba, who has been a pillar of support throughout the process.

The thesis focuses on identity management and resource allocation (through virtual network embedding) in the network virtualization environment. Since both projects had already been accepted at IM and INFOCOM, respectively, it was comparatively easy to put the thesis together. An overview of the thesis is given below:

Due to the existence of multiple stakeholders with conflicting goals and policies, alterations to the existing Internet architecture are now limited to simple incremental updates; deployment of any new, radically different technology is next to impossible. To fend off this ossification, network virtualization has been propounded as a diversifying attribute of the future inter-networking paradigm. In this talk, we provide an overview of the network virtualization environment (NVE) and address two basic problems in this emerging field of networking research.

The identity management problem is primarily concerned with ensuring interoperability across heterogeneous identifier spaces for locating and identifying end hosts in different virtual networks. We describe the architectural and the functional components of a novel identity management framework (iMark) that enables end-to-end connectivity across heterogeneous virtual networks in the NVE without revoking their autonomy.

The virtual network embedding problem deals with the mapping of virtual nodes and links onto physical network resources. We argue that the separation of the node mapping and the link mapping phases in the existing algorithms considerably reduces the solution space and degrades embedding quality. We propose coordinated node and link mapping to devise two algorithms (D-ViNE and R-ViNE) for the online version of the problem under realistic assumptions and compare their performance with the existing heuristics.
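To make the “separation of phases” criticism concrete, the sketch below shows the uncoordinated two-phase baseline: greedy node mapping first, shortest-path link mapping second. This is purely my illustration of the strawman; D-ViNE and R-ViNE themselves coordinate the two phases, which this toy code does not, and it also ignores bandwidth constraints:

```python
# Illustrative sketch of the two-phase (uncoordinated) virtual network embedding
# baseline: greedy node mapping followed by shortest-path link mapping.
# Bandwidth constraints are omitted; this is only the strawman being criticized.

from collections import deque

def bfs_path(adj, src, dst):
    """Shortest path (in hops) in the substrate graph."""
    prev, queue, seen = {src: None}, deque([src]), {src}
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                prev[v] = u
                queue.append(v)
    return None

def two_phase_embed(substrate_adj, substrate_cpu, vnodes_cpu, vlinks):
    # Phase 1: greedily map each virtual node to the substrate node with the
    # most remaining CPU, ignoring where its virtual neighbors will end up.
    node_map, remaining = {}, dict(substrate_cpu)
    for vnode, demand in sorted(vnodes_cpu.items(), key=lambda kv: -kv[1]):
        host = max(remaining, key=remaining.get)
        if remaining[host] < demand:
            return None  # embedding request rejected
        node_map[vnode] = host
        remaining[host] -= demand
    # Phase 2: map each virtual link onto a shortest substrate path
    # between the already-fixed endpoints.
    link_map = {}
    for (a, b) in vlinks:
        path = bfs_path(substrate_adj, node_map[a], node_map[b])
        if path is None:
            return None
        link_map[(a, b)] = path
    return node_map, link_map

substrate_adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
substrate_cpu = {"A": 10, "B": 5, "C": 8}
print(two_phase_embed(substrate_adj, substrate_cpu,
                      {"x": 4, "y": 3}, [("x", "y")]))
```

Because phase 1 never looks at the links, it can scatter adjacent virtual nodes far apart in the substrate and force long paths in phase 2; that is the solution-space shrinkage the thesis argues coordinated mapping avoids.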

Btw, you should listen to this wonderful song by Stevie Wonder.

I’ve been awarded the Cheriton Scholarship

A piece of very good news on the 2nd working day of the year… This is the time when you can say “off to a great start” :D I have been awarded the Cheriton Type I scholarship ($10,000 for two years) to continue my PhD at the David R. Cheriton School of Computer Science. I would like to take this opportunity to thank the award committee, the school, my supervisor Prof. Boutaba, and of course Prof. Cheriton (after whom the scholarship is named) for this honor. This scholarship will smooth out some rough edges in my lifestyle, and hopefully it will let me focus more on the real stuff.

This year the scholarship was awarded to eight ongoing and incoming students in total. I congratulate all my fellow winners on this achievement!!!

Following is an excerpt from the school website on this scholarship:

Dr. David Ross Cheriton is a distinguished Computer Science alumnus of the University of Waterloo Faculty of Mathematics. Dr. Cheriton received his MMath and PhD in Computer Science from Waterloo in 1974 and 1978, respectively. He is a Professor of Computer Science at Stanford University and heads the Stanford University Distributed Systems Group. He is widely known for his extraordinary research contributions in high-performance scalable distributed systems, Internet architecture and hardware-software interaction, and the successful commercialization of his research results.

The David R. Cheriton Graduate Scholarships are valued between $10,000 and $20,000 and will be awarded annually to forty to seventy full-time University of Waterloo graduate students in the David R. Cheriton School of Computer Science on the basis of scholastic excellence and a demonstrated interest in research that addresses problems associated with designing and implementing efficient and reliable computing systems along with their effective integration.

These scholarships are open to Canadian and International students holding a student visa.