The sheer size of the database and frequency of updates suggest that it must be maintained in a distributed manner, with local caching to improve performance. The early DNS RFCs (in particular, [RFC1034] and [RFC1035]) primarily discuss caching in the context of what [RFC2308] calls "positive responses", that is, when the response includes the requested data. In this case, a TTL is associated with each Resource Record (RR) in the response. Resolvers can cache and reuse the data until the TTL expires. [RFC1034] describes negative response caching but notes that it is optional and only discusses name errors (NXDOMAIN). This is the origin of using the SOA MINIMUM field as a negative caching TTL. [RFC2308] updated [RFC1034] to specify new requirements for DNS negative caching, including making it mandatory for caching resolvers to cache name error (NXDOMAIN) and no data (NODATA) responses when an SOA record is available to provide a TTL. [RFC4697] further specified optional negative caching for two DNS resolution failure cases: server failure and dead/unreachable servers.

This document updates [RFC2308] to require negative caching of all DNS resolution failures and provides additional examples of resolution failures. It updates [RFC4035] to require caching for DNSSEC validation failures. It also expands the scope of the [RFC4697] prohibition on aggressive requerying for NS records at a failed zone's parent zone to all query types and to all ancestor zones.

Motivation

Operators of DNS services have known for some time that recursive resolvers become more aggressive when they experience resolution failures. A number of different anecdotes, experiments, and incidents support this claim.

In December 2009, a secondary server for a number of in-addr.arpa subdomains saw its traffic suddenly double, and queries of type DNSKEY in particular increase by approximately two orders of magnitude, coinciding with a DNSSEC key rollover by the zone operator.
This predated a signed root zone, and an operating system vendor was providing non-root trust anchors to the recursive resolver, which became out of date following the rollover. Unable to validate responses for the affected in-addr.arpa zones, recursive resolvers aggressively retried their queries.

In 2016, the Internet infrastructure company Dyn experienced a large attack that impacted many high-profile customers. As documented in a technical presentation detailing the attack, Dyn staff wrote:
   At this point we are now experiencing botnet attack traffic and
   what is best classified as a "retry storm"

   Looking at certain large recursive platforms > 10x normal volume

In 2018, the root zone Key Signing Key (KSK) was rolled over. Throughout the rollover period, the root servers experienced a significant increase in DNSKEY queries. Before the rollover, a.root-servers.net and j.root-servers.net together received about 15 million DNSKEY queries per day. At the end of the revocation period, they received 1.2 billion per day: an 80x increase. Removal of the revoked key from the zone caused DNSKEY queries to drop to post-rollover but pre-revocation levels, indicating that there is still a population of recursive resolvers using the previous root trust anchor and aggressively retrying DNSKEY queries.

In 2021, Verisign researchers used botnet query traffic to demonstrate that certain large public recursive DNS services exhibit very high query rates when all authoritative name servers for a zone return refused (REFUSED) or server failure (SERVFAIL) responses. When the authoritative servers were configured normally, query rates for a single botnet domain averaged approximately 50 queries per second. However, with the servers configured to return SERVFAIL, the query rate increased to 60,000 per second. Furthermore, increases were also observed at the root and Top-Level Domain (TLD) levels, even though delegations at those levels were unchanged and continued operating normally.

Later that same year, on October 4, Facebook experienced a widespread and well-publicized outage. During the 6-hour outage, none of Facebook's authoritative name servers were reachable or responded to queries. Recursive name servers attempting to resolve Facebook domains experienced timeouts. During this time, query traffic on the .COM/.NET infrastructure increased from 7,000 to 900,000 queries per second.
Related Work

[RFC2308] describes negative caching for four types of DNS queries and responses: name errors, no data, server failures, and dead/unreachable servers. It places the strongest requirements on negative caching for name errors and no data responses, while server failures and dead servers are left as optional.

[RFC4697] is a Best Current Practice that documents observed resolution misbehaviors. It describes a number of situations that can lead to excessive queries from recursive resolvers, including requerying for delegation data, lame servers, responses blocked by firewalls, and records with zero TTL. It makes a number of recommendations, ranging from "SHOULD" to "MUST".

The "DNS thundering herd problem" has been described as a situation arising when cached data expires at the same time for a large number of users. Although that discussion is not focused on negative caching, it does describe the benefits of combining multiple identical queries to upstream name servers. That is, when a recursive resolver receives multiple queries for the same name, class, and type that cannot be answered from cached data, it should combine or join them into a single upstream query rather than emit repeated identical upstream queries.

[RFC5452], "Measures for Making DNS More Resilient against Forged Answers", includes a section that describes the phenomenon known as "Birthday Attacks". Here, again, the problem arises when a recursive resolver emits multiple identical upstream queries. Multiple outstanding queries make it easier for an attacker to guess and correctly match some of the DNS message parameters, such as the port number and ID field. This situation is further exacerbated in the case of timeout-based resolution failures. Of course, DNSSEC is a suitable defense against spoofing attacks.

[RFC8767] describes "Serving Stale Data to Improve DNS Resiliency". This permits a recursive resolver to return possibly stale data when it is unable to refresh cached, expired data. It introduces the idea of a failure recheck timer and says:
   Attempts to refresh from non-responsive or otherwise failing
   authoritative nameservers are recommended to be done no more
   frequently than every 30 seconds.

Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 when, and only when, they appear in all capitals, as shown here.
   Of course, by the robustness principle, domain software should not
   fail when presented with CNAME chains or loops; CNAME chains should
   be followed and CNAME loops signalled as an error.

DNSSEC Validation Failures

For zones that are signed with DNSSEC, a resolution failure can occur when a security-aware resolver believes it should be able to establish a chain of trust for an RRset but is unable to do so, possibly after trying multiple authoritative name servers. DNSSEC validation failures may be due to signature mismatches, missing DNSKEY RRs, problems with denial-of-existence records, clock skew, or other reasons. [RFC4035] already discusses the requirements and reasons for caching validation failures. A later section of this document strengthens those requirements.

FORMERR Responses

A name server returns a message with the RCODE field set to FORMERR when it is unable to interpret the query. FORMERR responses are often associated with problems processing Extension Mechanisms for DNS (EDNS(0)). Authoritative servers may return FORMERR when they do not implement EDNS(0) or when EDNS(0) option fields are malformed, but not for unknown EDNS(0) options. Upon receipt of a FORMERR response, some recursive clients will retry their queries without EDNS(0), while others will not. Nonetheless, resolution failures from FORMERR responses are rare.

Requirements for Caching DNS Resolution Failures

Retries and Timeouts

A resolver MUST NOT retry a given query to a server address over a given DNS transport more than twice (i.e., three queries in total) before considering the server address unresponsive over that DNS transport for that query. A resolver MAY retry a given query over a different DNS transport to the same server if it has reason to believe the DNS transport is available for that server and is compatible with the resolver's security policies.
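The retry limit above can be sketched as follows. This is an illustrative sketch only, not a normative algorithm; the function and parameter names (query_with_retries, send_once) are invented for this example and do not come from the specification:

```python
MAX_ATTEMPTS = 3  # the initial query plus at most two retries


def query_with_retries(send_once, max_attempts=MAX_ATTEMPTS):
    """Attempt a query at most max_attempts times over one transport.

    send_once performs a single query attempt over a single DNS
    transport and returns the response, or None on timeout.  Returns
    (response, attempts_used); a None response means the server
    address is now considered unresponsive over this transport for
    this query, and the caller may fall back to another transport or
    another server address.
    """
    attempts = 0
    for _ in range(max_attempts):
        attempts += 1
        response = send_once()
        if response is not None:
            return response, attempts
    return None, attempts
```

A real resolver would wire send_once to a UDP or TCP exchange with its own timeout, and would record the unresponsive (server, transport) pair in its resolution failure cache so that subsequent queries do not repeat the attempts.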
This document does not place any requirements on how long an implementation should wait before retrying a query (i.e., a timeout value), which may be implementation or configuration dependent. It is generally expected that typical timeout values range from 3 to 30 seconds.

Caching

Resolvers MUST implement a cache for resolution failures. The purpose of this cache is to eliminate repeated upstream queries that cannot be resolved. When an incoming query matches a cached resolution failure, the resolver MUST NOT send any corresponding outgoing queries until after the cache entries expire.

Implementation details for such a cache are not specified in this document. The implementation might cache different resolution failure conditions differently. For example, DNSSEC validation failures might be cached according to the queried name, class, and type, whereas unresponsive servers might be cached only according to the server's IP address. Developers should document their implementation choices so that operators know what behaviors to expect when resolution failures are cached.

Resolvers MUST cache resolution failures for at least 1 second. Resolvers MAY cache different types of resolution failures for different (i.e., longer) amounts of time. Consistent with [RFC2308], resolution failures MUST NOT be cached for longer than 5 minutes. The minimum cache duration SHOULD be configurable by the operator.

A longer cache duration for resolution failures will reduce the processing burden from repeated queries but may also increase the time to recover from transitory issues. Resolvers SHOULD employ an exponential or linear backoff algorithm to increase the cache duration for persistent resolution failures. For example, the initial time for negatively caching a resolution failure might be set to 5 seconds and increased after each retry that results in another resolution failure, up to a configurable maximum, not to exceed the 5-minute upper limit.
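The backoff behavior described above can be illustrated with a small sketch. This is one possible interpretation, not a normative algorithm: the class name, its methods, and the exponential doubling policy are invented for this example. As noted above, keys are implementation-defined, e.g., (qname, qclass, qtype) tuples for DNSSEC validation failures or server IP addresses for unresponsive servers.

```python
import time

MIN_TTL = 1      # seconds; the required minimum cache duration
MAX_TTL = 300    # seconds; failures MUST NOT be cached longer than 5 minutes
INITIAL_TTL = 5  # seconds; example starting point for the backoff


class FailureCache:
    """Negative cache for resolution failures with exponential backoff."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._entries = {}  # key -> (expiry_time, current_ttl)

    def is_failed(self, key):
        """True if this key has a live cached failure (no upstream query)."""
        entry = self._entries.get(key)
        if entry is None:
            return False
        expiry, _ = entry
        if self._now() >= expiry:
            del self._entries[key]  # entry expired; allow a new attempt
            return False
        return True

    def record_failure(self, key):
        """Cache a failure; double the duration on each repeated failure,
        clamped between the 1-second floor and the 5-minute ceiling.
        Returns the cache duration applied."""
        _, prev_ttl = self._entries.get(key, (0, 0))
        ttl = INITIAL_TTL if prev_ttl == 0 else min(prev_ttl * 2, MAX_TTL)
        ttl = max(ttl, MIN_TTL)
        self._entries[key] = (self._now() + ttl, ttl)
        return ttl
```

A resolver would consult is_failed() before sending an upstream query and call record_failure() on each new resolution failure; bounding the number of entries (per the resource exhaustion discussion below) is omitted here for brevity.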
Notwithstanding the above, resolvers SHOULD implement measures to mitigate resource exhaustion attacks on the failed resolution cache. That is, the resolver should limit the amount of memory and/or processing time devoted to this cache.

Requerying Delegation Information

[RFC4697] identifies circumstances in which:
   ...every name server in a zone's NS RRSet is unreachable (e.g.,
   during a network outage), unavailable (e.g., the name server
   process is not running on the server host), or misconfigured
   (e.g., the name server is not authoritative for the given zone,
   also known as "lame").

It prohibits unnecessary "aggressive requerying" to the parent of a non-responsive zone by sending NS queries. The problem of aggressive requerying to parent zones is not limited to queries of type NS. This document updates the requirement from [RFC4697] to apply more generally:
   Upon encountering a zone whose name servers are all non-responsive,
   a resolver MUST cache the resolution failure. Furthermore, the
   resolver MUST limit queries to the non-responsive zone's parent
   zone (and to other ancestor zones) just as it would limit
   subsequent queries to the non-responsive zone.

DNSSEC Validation Failures

[RFC4035] states:
   To prevent such unnecessary DNS traffic, security-aware resolvers
   MAY cache data with invalid signatures, with some restrictions.

This document updates [RFC4035] with the following, stronger requirement:
   To prevent such unnecessary DNS traffic, security-aware resolvers
   MUST cache DNSSEC validation failures, with some restrictions.

One of the restrictions mentioned in [RFC4035] is to use a small TTL when caching data that fails DNSSEC validation. This is, in part, because the provided TTL cannot be trusted. The advice herein can be used as guidance on TTLs for caching DNSSEC validation failures.

IANA Considerations

This document has no IANA actions.

Security Considerations

As noted earlier, an attacker might attempt a resource exhaustion attack by sending queries for a large number of names and/or types that result in resolution failure. Resolvers SHOULD implement measures to protect themselves and bound the amount of memory devoted to caching resolution failures.

A cache poisoning attack resulting in denial of service may be possible because failure messages cannot be signed. An attacker might generate queries and send forged failure messages, causing the resolver to cease sending queries to the authoritative name server (similar attacks have been described as a "data corruption attack" and a "DNSDoS attack"). However, this would require continued spoofing throughout the backoff period and repeated attacks due to the 5-minute cache limit. As with those attacks, the effects would be "localized and of limited duration".

Privacy Considerations

This specification has no impact on user privacy.