J. Jung, E. Sit, H. Balakrishnan, “DNS Performance and the Effectiveness of Caching,” IEEE/ACM Transactions on Networking, vol. 10, no. 5, October 2002.
Hierarchical design and aggressive caching are considered the two main reasons behind the scalability of DNS. Both reduce the load on the root servers at the top of the namespace hierarchy; caching additionally aims to limit client-perceived delays and wide-area network bandwidth usage. This paper analyzes three network traces, collected at two different sites during different time periods, to study the effectiveness of these two factors.
The study is based on three separate traces (two from MIT and one from KAIST). The authors collected both the DNS traffic and its driving workload: outgoing DNS queries and the corresponding incoming answers (matched within a 60-second window), as well as the SYN, FIN, and RST packets of outgoing TCP connections. DNS lookups driven by outgoing non-TCP traffic, and lookups triggered by incoming TCP traffic, are excluded; these account for about 5% and 10% of overall lookups, respectively, and are treated as negligible.
By analyzing the collected data and running trace-driven simulations, the authors made the following observations:
- The distribution of name popularity follows Zipf’s law: a small fraction of names is queried a large fraction of the time. As a result, popular names can be cached effectively even with small TTL values, whereas unpopular names see little benefit from caching even with large TTLs.
- Sharing a cache among a group of clients yields little additional hit-rate improvement once the group grows beyond 20–25 clients. This, too, is a consequence of the Zipfian popularity distribution and the prevailing TTL values.
- Client-perceived latency grows with the number of referrals in a lookup, so caching NS records to reduce referrals decreases both latency and the load on the root servers.
- The distribution of names that elicit negative responses is also heavy-tailed; as a result, the hit rate of negative caching is likewise limited.
- Many DNS servers retry too persistently and overload the network: 63% of the traced DNS query packets were generated by lookups that obtained no answer at all.
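The saturation effect in the cache-sharing observation can be reproduced qualitatively with a toy trace-driven simulation. This sketch is not the authors’ methodology; the parameter values (name-space size, TTL, query rates) are illustrative assumptions, not figures from the paper. Queries from a group of clients are merged into one Poisson stream over a Zipf (1/rank) popularity distribution, and a shared cache holds each answer for a fixed TTL:

```python
import random
from itertools import accumulate

def simulate_hit_rate(group_size, num_names=5000, duration=2000.0,
                      ttl=60.0, per_client_rate=1.0, seed=0):
    """Hit rate of one cache shared by `group_size` clients.

    Queries arrive as a merged Poisson stream at rate
    group_size * per_client_rate; each query targets a name drawn
    from a Zipf (1/rank) popularity distribution; a cached answer
    lives for `ttl` time units. All parameters are illustrative.
    """
    rng = random.Random(seed)
    names = list(range(num_names))
    # Cumulative Zipf weights so each draw is O(log n) via bisection.
    cum = list(accumulate(1.0 / r for r in range(1, num_names + 1)))
    cache = {}  # name -> expiry time
    t, hits, total = 0.0, 0, 0
    rate = group_size * per_client_rate
    while t < duration:
        t += rng.expovariate(rate)
        name = rng.choices(names, cum_weights=cum)[0]
        total += 1
        if cache.get(name, 0.0) > t:
            hits += 1          # answer still cached
        else:
            cache[name] = t + ttl  # miss: fetch and cache for TTL
    return hits / total
```

Running this for group sizes 1, 25, and 100 shows the hit rate rising sharply at first and then flattening: the popular (Zipf head) names are already kept alive in the cache by a couple of dozen clients, so additional clients mostly add queries for tail names that expire before being asked again.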
The authors have done a commendable job of collecting and analyzing a large amount of data, yielding insights that run counter to common beliefs. There are, however, some concerns about the data. It is unclear why lookups driven by UDP traffic were excluded entirely. Several observations rest on how things changed over the 12 months between the two MIT traces, which effectively draws conclusions from just two data points. Moreover, the authors explain their observations in ways that contradict earlier findings by other researchers, but it is unclear whether the same patterns would appear in other people’s data. In short, the collected traces may not be varied enough to support conclusions about the Internet as a whole.