How we made DNS both fast and private with ECS
TL;DR: In this article we describe why public DNS resolvers need EDNS0 Client Subnet (ECS) to work around performance issues with popular services that use DNS to steer their clients to a nearby server. We then show that ECS can hurt privacy and DNS cache efficiency.
We will then describe our approach to benefit from ECS without having to suffer from its downsides by building a subnet substitution map and a whitelist.
At NextDNS, we strive to make DNS both fast and private. A naive way to define a fast DNS is by measuring the time it takes for a DNS server to give back a response, taking into account the network latency and the server processing time.
A DNS response is rarely the information the end user actually wants. Most likely, DNS is a dependency of another higher-level service like HTTP. Often, those HTTP services use DNS to steer their traffic to the location closest to their users. This technique is commonly called GeoDNS or latency-based routing (depending on the source of data used).
DNS steering works on the premise that DNS resolvers are provided by ISPs, and thus live on the same network as the user and close to them. As all users of an ISP potentially share the same set of DNS resolver caches, DNS steering provides a relatively efficient way to steer traffic to the closest location without exposing too much about the user at the DNS level.
With the growth in popularity of public DNS resolvers like Google DNS and OpenDNS in the last decade, DNS steering stopped working efficiently for the users of those resolvers. Even with a faster DNS response time, those services would reply with IPs that are not close to users, making HTTP requests slower. As HTTP latency accounts for much more of the overall loading time than DNS, this is a bad tradeoff.
In order to work around this issue, EDNS0 Client Subnet (ECS) was invented. This extension to the DNS protocol provides a way for the DNS resolver to release information about the client IP requesting the domain to the authoritative DNS. The full client IP is not transmitted, only a part of it called a subnet.
A subnet represents one or more IPs sharing the same prefix. The shorter the prefix, the more IPs are contained in the subnet. For instance, the CIDR representation 198.51.100.0/24 represents the 256 IPs starting with 198.51.100.x, while 198.51.100.0/22 represents the 1024 IPs between 198.51.100.0 and 198.51.103.255.
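The arithmetic above can be checked with Python's standard `ipaddress` module. This is only an illustrative sketch; the helper name `subnet_summary` is made up for the example.

```python
import ipaddress

def subnet_summary(cidr: str):
    """Return (address count, first address, last address) for a CIDR block."""
    net = ipaddress.ip_network(cidr)
    return net.num_addresses, str(net[0]), str(net[-1])

print(subnet_summary("198.51.100.0/24"))  # (256, '198.51.100.0', '198.51.100.255')
print(subnet_summary("198.51.100.0/22"))  # (1024, '198.51.100.0', '198.51.103.255')
```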
An authoritative DNS (or auth DNS) is a DNS server that hosts the domain zones for which it has authority. Its clients are usually caching DNS resolvers, which perform recursive DNS queries on behalf of end clients running stub resolvers.
In response to a DNS query with ECS information, an authoritative DNS can choose to give a response that is valid for a subnet equal to or larger than the one provided. This way, the DNS resolver can cache the response for potentially more clients, improving their DNS response time. The prefix length for which the response can be cached is called the scope. A scope of 0 means the queried name does not have ECS enabled and thus can be cached globally.
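To make the role of the scope concrete, here is a toy scope-aware cache in Python. It is a simplified sketch, not a real resolver cache: the class `EcsCache` and its methods are hypothetical, TTLs are ignored, and lookups are a plain linear scan.

```python
import ipaddress

class EcsCache:
    """Toy ECS-aware cache: answers are stored per (qname, scope network)."""

    def __init__(self):
        self._entries = {}  # qname -> list of (network, answer)

    def store(self, qname, client_subnet, scope, answer):
        # The answer is valid for the client subnet widened to `scope` bits.
        base = ipaddress.ip_network(client_subnet)
        net = ipaddress.ip_network(f"{base.network_address}/{scope}", strict=False)
        self._entries.setdefault(qname, []).append((net, answer))

    def lookup(self, qname, client_ip):
        ip = ipaddress.ip_address(client_ip)
        for net, answer in self._entries.get(qname, []):
            if ip in net:
                return answer
        return None  # cache miss: the resolver has to ask the auth DNS

cache = EcsCache()
# Query sent with a /24, answered with a /22 scope: four /24s share the entry.
cache.store("example.com", "198.51.100.0/24", 22, "192.0.2.10")
# Scope 0: the answer does not depend on the client and is cached globally.
cache.store("static.example", "198.51.100.0/24", 0, "192.0.2.20")

print(cache.lookup("example.com", "198.51.101.7"))    # 192.0.2.10
print(cache.lookup("example.com", "203.0.113.5"))     # None
print(cache.lookup("static.example", "203.0.113.5"))  # 192.0.2.20
```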
In practice, 99% of ECS enhanced requests are performed with a /24 prefix in IPv4 and a /56 prefix in IPv6. For IPv4, a /24 is only 256 IPs. This is very narrow and can be concerning for privacy.
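As a rough illustration of what a resolver would disclose with those prefix lengths, the following Python sketch truncates a client IP to a /24 (IPv4) or /56 (IPv6). The function name `ecs_subnet` is hypothetical, and the IPv6 address uses the documentation range.

```python
import ipaddress

def ecs_subnet(client_ip: str, v4_bits: int = 24, v6_bits: int = 56) -> str:
    """Truncate a client IP to the subnet that would be sent in the ECS option."""
    ip = ipaddress.ip_address(client_ip)
    bits = v4_bits if ip.version == 4 else v6_bits
    # strict=False zeroes the host bits instead of raising an error.
    return str(ipaddress.ip_network(f"{ip}/{bits}", strict=False))

print(ecs_subnet("67.164.100.157"))         # 67.164.100.0/24
print(ecs_subnet("2001:db8:1234:56ff::1"))  # 2001:db8:1234:5600::/56
```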
Authoritative DNS are supposed to reply with a larger scope if possible to improve cacheability, but we measured that 45% of authoritative DNS supporting ECS and giving a non-zero scope reply with a /24 scope or smaller for IPv4. With such small subnets, the cache becomes highly fragmented, and the average cache hit ratio of a DNS resolver will go from something like 85% to only 25%, and close to 0% for /24 scoped domains, substantially slowing down average DNS resolution time.
For those reasons, few DNS resolvers deployed ECS. For the most part, only public resolvers like Google DNS and OpenDNS deployed it.
The trend these days for newer, privacy-oriented public DNS resolvers seems to be ditching ECS altogether. This is the choice Cloudflare DNS and Quad9 have made, for instance. Their reasoning is that with the many locations they have, their cache is sufficiently localized to offset the absence of ECS.
This statement can be true if the number of Points of Presence (PoPs) of the DNS provider is greater than or equal to the number of PoPs exposed by the content provider, with similar locality. In the case of Cloudflare, this is probably true in most cases.
However, some large content providers like Netflix, Facebook or Google, as well as CDNs like Akamai, have servers hosted directly inside ISP networks. Their DNS will only steer clients to those ISP-embedded servers if the IP of the resolver or the subnet provided through ECS is part of the ISP’s IP space. ECS is thus required for public DNS resolvers to benefit from those servers.
The main downsides of ECS are the loss of privacy and DNS cache fragmentation, both due to the narrow client subnet used for query and response. We can solve those two issues by substituting the client’s subnet with another subnet shared by all clients from the same rough location and ISP.
To achieve this, we use an aggregated BGP full table extract to associate all IP blocks with their associated ASN.
An Autonomous System Number (ASN) in a nutshell is an ID attributed to a network entity (like an ISP) to abstract their network assets (like IP blocks) when communicating their routing policy.
We then use a GeoIP database to split blocks by country. For large countries like the USA, we further split by metro code, which is still pretty large but local enough to give good results. The output is a list of IP blocks, each mapped to a key composed of ASN:Country[:Metro Code]:
194.55.44.0 - 194.55.47.255: 12874:IT
194.55.48.0 - 194.55.63.255: 20570:DE
194.55.64.0 - 194.55.79.255: 3320:DE
194.55.80.0 - 194.55.83.255: 48095:US:618
194.55.84.0 - 194.85.47.255: 12874:IT
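As an illustration, such a key could be assembled as in the Python sketch below. It is simplified rather than production code: the function name `location_key` and the set of countries that get split by metro code are assumptions made for the example.

```python
from typing import Optional

# Assumption for this sketch: only these countries are split by metro code.
SPLIT_BY_METRO = frozenset({"US"})

def location_key(asn: int, country: str, metro: Optional[int] = None) -> str:
    """Build the ASN:Country[:Metro Code] key for an IP block."""
    if country in SPLIT_BY_METRO and metro is not None:
        return f"{asn}:{country}:{metro}"
    return f"{asn}:{country}"

print(location_key(12874, "IT"))       # 12874:IT
print(location_key(48095, "US", 618))  # 48095:US:618
```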
For each key, we randomly pick one /24 subnet from all the IP blocks sharing the same key, and create the final map that defines how to substitute an IP block with the randomly picked /24 from the same ASN:Country[:Metro Code]:
12874:IT: 2.224.0.0/24
20570:DE: 91.207.93.0/24
3320:DE: 2.160.0.0/24
48095:US:618: 94.176.172.0/24
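The random pick can be sketched in Python as follows. This is a minimal illustration assuming IPv4 blocks aligned on /24 boundaries; the function name `pick_substitutes` and the input tuple layout are hypothetical. Blocks are weighted by the number of /24s they contain, so every /24 under a key is equally likely to be chosen.

```python
import ipaddress
import random

def pick_substitutes(blocks, rng=None):
    """blocks: iterable of (first_ip, last_ip, key). Returns {key: "/24 CIDR"}."""
    rng = rng or random.Random()
    candidates = {}
    for first, last, key in blocks:
        # Index each /24 by its upper 24 bits; ranges stay lazy, so large blocks are cheap.
        lo = int(ipaddress.IPv4Address(first)) >> 8
        hi = int(ipaddress.IPv4Address(last)) >> 8
        candidates.setdefault(key, []).append(range(lo, hi + 1))
    picked = {}
    for key, ranges in candidates.items():
        # Choose a block proportionally to its size, then one /24 inside it.
        chosen = rng.choices(ranges, weights=[len(r) for r in ranges])[0]
        base = rng.choice(chosen) << 8
        picked[key] = f"{ipaddress.IPv4Address(base)}/24"
    return picked

blocks = [
    ("194.55.44.0", "194.55.47.255", "12874:IT"),
    ("194.55.48.0", "194.55.63.255", "20570:DE"),
    ("194.55.64.0", "194.55.79.255", "3320:DE"),
]
print(pick_substitutes(blocks))  # one random /24 per key
```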
This map is then used by our resolver at runtime to map client IPs to those /24 subnets:
194.55.44.0 - 194.55.47.255: 2.224.0.0/24
194.55.48.0 - 194.55.63.255: 91.207.93.0/24
194.55.64.0 - 194.55.79.255: 2.160.0.0/24
194.55.80.0 - 194.55.83.255: 94.176.172.0/24
194.55.84.0 - 194.85.47.255: 2.224.0.0/24
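A runtime lookup over such a range map can be done with a binary search. The Python sketch below shows one way to do it, assuming non-overlapping IPv4 ranges; the class name `SubstitutionMap` is hypothetical, and returning `None` for an unmapped client is a choice made for the example.

```python
import bisect
import ipaddress

class SubstitutionMap:
    """Maps a client IP to its substitute /24 using sorted, non-overlapping ranges."""

    def __init__(self, entries):
        # entries: iterable of (first_ip, last_ip, substitute_cidr)
        self._rows = sorted(
            (int(ipaddress.IPv4Address(first)), int(ipaddress.IPv4Address(last)), sub)
            for first, last, sub in entries
        )
        self._starts = [row[0] for row in self._rows]

    def lookup(self, client_ip):
        ip = int(ipaddress.IPv4Address(client_ip))
        # Find the last range starting at or before the client IP.
        i = bisect.bisect_right(self._starts, ip) - 1
        if i >= 0 and ip <= self._rows[i][1]:
            return self._rows[i][2]
        return None  # no mapping known for this client

smap = SubstitutionMap([
    ("194.55.44.0", "194.55.47.255", "2.224.0.0/24"),
    ("194.55.48.0", "194.55.63.255", "91.207.93.0/24"),
    ("194.55.64.0", "194.55.79.255", "2.160.0.0/24"),
    ("194.55.80.0", "194.55.83.255", "94.176.172.0/24"),
])
print(smap.lookup("194.55.50.17"))  # 91.207.93.0/24
print(smap.lookup("10.0.0.1"))      # None
```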
Thanks to this technique, client subnets no longer leak, and all clients of a given ISP in the same rough location benefit from the same cache space. We noticed an improvement of the cache hit ratio of about 10%. This is lower than expected, but still a good improvement for privacy and tail latency.
While analyzing the results, we found that more than 50% of the top 1 million domains (qnames to be exact) supporting ECS give the same responses for subnets in very different locations. It seems that those authoritative DNS check the ECS feature checkbox but don’t actually need it, forcing DNS resolvers supporting ECS to store duplicate copies of the same result for each client subnet. We also noticed that those domains are likely the ones responding with a narrow ECS scope.
To further improve the cache hit ratio and avoid sending any kind of ECS information for domains that don’t use it to improve user experience, we decided to generate a whitelist. This whitelist is generated by selecting, from the top 10 queried domains, the ones that, when queried with ECS subnets from 3 different locations far from each other, are:
- showing ECS support
- giving a non-zero ECS scope
- returning a different set of IPs for each subnet
- getting a max delta RTT of 100ms or more between IPs to prove they are hosted in different parts of the globe
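The four criteria above can be expressed as a small predicate. The Python sketch below is an illustration under assumptions: the `Probe` record and `qualifies` function are hypothetical names, and the RTT to each returned IP is assumed to be measured from a single vantage point.

```python
from dataclasses import dataclass

@dataclass
class Probe:
    """Result of querying one domain with one ECS subnet (hypothetical structure)."""
    ecs_supported: bool
    scope: int
    rtt_ms_by_ip: dict  # returned IP -> RTT in ms, measured from one vantage point

def qualifies(probes, min_delta_rtt_ms=100.0):
    """Return True if a domain passes all four whitelist criteria."""
    # Criteria 1 and 2: ECS support and a non-zero scope for every probe.
    if not all(p.ecs_supported and p.scope > 0 for p in probes):
        return False
    # Criterion 3: a different set of IPs for each subnet.
    ip_sets = [frozenset(p.rtt_ms_by_ip) for p in probes]
    if len(set(ip_sets)) != len(ip_sets):
        return False
    # Criterion 4: the IPs are far apart, judged by the largest RTT difference.
    rtts = [rtt for p in probes for rtt in p.rtt_ms_by_ip.values()]
    return bool(rtts) and max(rtts) - min(rtts) >= min_delta_rtt_ms

probes = [
    Probe(True, 24, {"192.0.2.1": 12.0}),
    Probe(True, 24, {"198.51.100.1": 95.0}),
    Probe(True, 20, {"203.0.113.1": 180.0}),
]
print(qualifies(probes))  # True
```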
With those two solutions together, our cache hit ratio got back up to about 75%.
A little-known feature of NextDNS is the ability to get debug info for any query. It is a bit hacky, hence the dig parser warning. The idea is to make a CHAOS query instead of an IN one. NextDNS responds to such queries (only over TCP) with the IN class answer and some TXT debug fields in the CH class:
$ dig +tcp +nocomment chaos wikipedia.org @ecs-test.nextdns.io
;; Warning: Message parser reports malformed message packet.
; <<>> DiG 9.11.3-1ubuntu1.8-Ubuntu <<>> +tcp +nocomment chaos wikipedia.org @dns.nextdns.io
;; global options: +cmd
;wikipedia.org. CH A
wikipedia.org. 600 IN A 198.35.26.96
proto.nextdns.io. 0 CH TXT "TCP"
client.nextdns.io. 0 CH TXT "67.164.100.157"
conf.nextdns.io. 0 CH TXT "default"
smart-ecs.nextdns.io. 0 CH TXT "23.24.192.0/24"
;; Query time: 123 msec
;; SERVER: 2a07:a8c0::65:849d#53(45.90.28.0)
;; WHEN: Tue Aug 20 21:55:42 PDT 2019
;; MSG SIZE rcvd: 229
You can see that the smart-ecs TXT record shows the substituted subnet used for the client IP, which is totally different from the client’s own subnet but belongs to the same ASN/Metro. It also means wikipedia.org is in the ECS whitelist.
Now let’s try with another domain:
$ dig +tcp +nocomment chaos apple.com @ecs-test.nextdns.io
;; Warning: Message parser reports malformed message packet.
; <<>> DiG 9.11.3-1ubuntu1.8-Ubuntu <<>> +tcp +nocomment chaos apple.com @dns.nextdns.io
;; global options: +cmd
;apple.com. CH A
apple.com. 2545 IN A 17.172.224.47
apple.com. 2545 IN A 17.178.96.59
apple.com. 2545 IN A 17.142.160.59
proto.nextdns.io. 0 CH TXT "TCP"
client.nextdns.io. 0 CH TXT "67.164.100.175"
conf.nextdns.io. 0 CH TXT "default"
smart-ecs.nextdns.io. 0 CH TXT "not sent"
;; Query time: 73 msec
;; SERVER: 2a07:a8c0::65:849d#53(45.90.28.0)
;; WHEN: Tue Aug 20 21:56:36 PDT 2019
;; MSG SIZE rcvd: 265
We can see that ECS isn’t sent at all. This means the domain is not whitelisted, because it does not make use of ECS information to improve user experience.