DNS — I know what it is, but how does it work?

Tuấn Anh Phạm
Coccoc Engineering Blog
6 min read · Jan 7, 2022

The Domain Name System (DNS) is one of the oldest services on the internet, and also one of the most important: it is even described as “the very heart of the Internet network”. Everyone in the field knows what DNS is, a phone book for the internet that translates human-friendly hostnames into IP addresses, but not everyone knows how it works (or so I assume). In this blog post, I want to explain how DNS works and make it as easy to understand as possible.

DNS terms

First, there are some terms used in this post that need to be understood. As usual, most are shameless copies from somewhere on the internet:

  • RR (Resource record): the unit of information in DNS zone files, used to answer all DNS queries. There are dozens of RR types; the common ones you may work with every day are A, PTR and CNAME. For the complete list of RR types, refer to this wiki page.
  • Zone: a specific portion of the DNS namespace that is managed by a specific organization or administrator.
  • Authoritative server: a DNS server that gives answers to DNS queries for a specific zone. A DNS server can be authoritative for multiple zones.
  • Recursive server: a DNS server that finds the answer on behalf of the client, querying other servers as needed and caching the results; also called a caching server.
  • Iterative server: a DNS server that, instead of returning the final answer, tells the client which DNS server to ask next.
  • Resolver: used loosely in this post as a general name for authoritative, recursive and iterative servers. Strictly speaking, a resolver is the component that looks up answers on behalf of a client, and an authoritative server is not one.
  • Root servers: sit at the top of the DNS hierarchy. There are 13 root server names today (a.root-servers.net through m.root-servers.net), operated by 12 independent organizations; each name is served by many machines scattered around the globe.
  • TLD (Top level domain): the highest level in the DNS hierarchy after the root domain, for example com or vn.

Domain name

A domain name, for example www.coccoc.com, consists of multiple parts separated by dots. We humans read that domain name as two parts: www as the server name and coccoc.com as its domain.
But unlike humans, DNS clients and servers read a domain name from right to left: first the root (the implicit trailing dot), then the TLD com, then the second-level domain coccoc, and finally www.

Second-level domains are managed by organizations or individuals. Organizations can run their own DNS servers or delegate management to a provider.

A domain name can have up to 127 levels, but I rarely see one that exceeds 4 levels, because anything deeper than that is unfriendly to humans and therefore works against the original purpose of DNS.
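To make the right-to-left reading concrete, here is a minimal sketch that splits a name into labels in the order DNS walks them. The function name and constants are my own for illustration; the limits used (63 characters per label, 253 characters for a name in dotted text form) are the standard DNS limits, and they are where the 127-level ceiling comes from.

```python
MAX_LABEL_LEN = 63   # per-label limit in DNS
MAX_NAME_LEN = 253   # limit for a full name written in dotted text form

def labels_right_to_left(name: str) -> list[str]:
    """Return the labels of a domain name, TLD first (illustrative helper)."""
    name = name.rstrip(".")              # a trailing dot only marks the root
    if len(name) > MAX_NAME_LEN:
        raise ValueError("name too long")
    labels = name.split(".")
    for label in labels:
        if not 1 <= len(label) <= MAX_LABEL_LEN:
            raise ValueError(f"bad label length: {label!r}")
    return list(reversed(labels))

print(labels_right_to_left("www.coccoc.com"))  # ['com', 'coccoc', 'www']
```

With one-character labels, 127 levels take exactly 253 characters (127 letters plus 126 dots), so a 128th level no longer fits.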

How the DNS works from client side

An application or service usually doesn’t send DNS queries itself; it delegates the job to a DNS client. By default, the DNS client tries to find an answer in the following order:

  • First, it searches for the answer in the local hosts file (/etc/hosts on Linux).
  • Next, it tries to find the answer in the local DNS cache.
  • If the answer is not in the cache, the DNS resolvers are asked. We can register up to 3 nameservers as resolvers in /etc/resolv.conf.
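The lookup order above can be sketched in a few lines. This is only an illustration under assumptions: the file contents are inline placeholder strings rather than the real /etc/hosts and /etc/resolv.conf, the hostnames and 10.0.0.x addresses are made up, and the helper names are hypothetical.

```python
# Hypothetical file contents; a real client would read /etc/hosts and /etc/resolv.conf.
HOSTS_TEXT = """\
127.0.0.1  localhost
10.0.0.5   build.internal  # placeholder entry
"""
RESOLV_TEXT = """\
nameserver 127.0.0.1
nameserver 10.0.0.53
nameserver 10.0.0.54
nameserver 10.0.0.55
"""

def parse_hosts(text):
    """Map each hostname in hosts-file text to its address (first entry wins)."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop comments, split on whitespace
        for name in fields[1:]:
            table.setdefault(name, fields[0])
    return table

def parse_nameservers(text, limit=3):
    """Return nameserver addresses in file order, keeping only the first `limit`."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers[:limit]

def lookup(name, hosts, cache, nameservers):
    """Follow the default order: hosts file, then local cache, then resolvers."""
    if name in hosts:
        return "hosts", hosts[name]
    if name in cache:
        return "cache", cache[name]
    return "resolvers", nameservers   # to be queried in this order

hosts = parse_hosts(HOSTS_TEXT)
nameservers = parse_nameservers(RESOLV_TEXT)
print(lookup("build.internal", hosts, {}, nameservers))
print(lookup("coccoc.com", hosts, {"coccoc.com": "123.30.175.29"}, nameservers))
print(lookup("example.org", hosts, {}, nameservers))
```

Note that the fourth nameserver line is dropped, mirroring the 3-nameserver limit mentioned above.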

One thing you must be aware of is that the order of nameservers in resolv.conf matters: the topmost one is tried first. That means you can register 2 or 3 nameservers so that name resolution doesn’t fail when one resolver dies. However, if the top nameserver in your resolv.conf dies, it can still cause you trouble. I will demonstrate it with a simple test:

# time dig coccoc.com +short
123.30.175.29
real 0m0.018s

The time needed for the query is short, in milliseconds. But then I added a failed nameserver (one that doesn’t have a DNS service running) and put it at the top of /etc/resolv.conf:

# time dig coccoc.com +short
123.30.175.29
real 0m1.020s

It took more than 1 second for the query. Why 1 second, you ask? Because the DNS client timeout in this setup is 1 second (the timeout can be set with options timeout:n in /etc/resolv.conf; the glibc default is 5 seconds). When the client sends the query to the first nameserver, which is dead, it waits 1 second before querying the second one. This can add a lot of latency to services that depend on DNS, and it can take time to discover where the problem is.
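The arithmetic behind that result can be modelled with a tiny simulation. This is a sketch, not how the real client is implemented: the function name is hypothetical, the 10.0.0.x addresses are placeholders, and the 18 ms round trip and 1-second timeout are simply the numbers observed in the test above.

```python
def query_time(nameservers, alive, timeout=1.0, rtt=0.018):
    """Seconds until the first answer when nameservers are tried in file order.

    `alive` is the set of servers that respond; every dead server tried
    first costs one full timeout. Returns None if no server answers.
    """
    elapsed = 0.0
    for server in nameservers:
        if server in alive:
            return elapsed + rtt
        elapsed += timeout
    return None

servers = ["10.0.0.1", "10.0.0.2"]          # placeholder addresses
healthy = query_time(servers, alive={"10.0.0.1", "10.0.0.2"})
degraded = query_time(servers, alive={"10.0.0.2"})
print(f"{healthy:.3f}")   # 0.018
print(f"{degraded:.3f}")  # 1.018
```

Every lookup that goes to the resolvers pays the full timeout for each dead server listed ahead of a live one, which is why a single dead entry at the top hurts so much.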

At Coccoc, we avoid this situation by running unbound, a lightweight DNS service, on every physical server as a local recursive resolver. In /etc/resolv.conf, the first nameserver is 127.0.0.1, pointing to the local recursive. Unlike the client, the DNS service lets you register more than 3 nameservers, and it can choose the nameserver with the smallest response time instead of following a fixed order. We still fall back to the 1-second timeout scenario if the unbound service dies, but a dead local service is always easier to detect and handle.
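To illustrate the idea of preferring the fastest upstream over a fixed order, here is a minimal sketch that keeps a smoothed response time per server and picks the smallest. This is not unbound's actual selection logic; the class name, the smoothing factor and the server labels are all assumptions made for the example.

```python
class UpstreamPool:
    """Track a smoothed response time per upstream and prefer the fastest."""

    def __init__(self, servers, initial_rtt=0.1, alpha=0.5):
        self.rtt = {server: initial_rtt for server in servers}
        self.alpha = alpha   # weight given to the newest sample

    def record(self, server, sample):
        """Blend a new response-time sample (seconds) into the running estimate."""
        old = self.rtt[server]
        self.rtt[server] = (1 - self.alpha) * old + self.alpha * sample

    def pick(self):
        """Return the upstream with the smallest estimated response time."""
        return min(self.rtt, key=self.rtt.get)

pool = UpstreamPool(["upstream-a", "upstream-b"])   # hypothetical labels
pool.record("upstream-a", 0.020)
pool.record("upstream-b", 0.005)
print(pool.pick())                  # upstream-b
pool.record("upstream-b", 1.0)      # a timeout counts as a very slow sample
print(pool.pick())                  # upstream-a
```

A dead upstream quickly accumulates a large estimate and stops being chosen, so a single failure does not keep costing a timeout on every query.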

How the resolver works

If a query reaches a resolver that is the authoritative server for the name, the server simply answers the client. If the requested name does not exist in its zone, it returns an NXDOMAIN response.

But if the resolver is not authoritative, it follows these steps:
1. First, the resolver tries to find the answer in its cache, if it is recursive.
2. If the answer is not in the cache (or the resolver keeps no cache), the resolver asks a root server. Root servers are iterative, so instead of the final answer they return the addresses of the TLD servers.
3. The resolver then asks a TLD server and gets back the names of the requested domain’s authoritative nameservers (its NS records).
4. Finally, the resolver knows which servers manage the domain and sends the DNS query to one of them.
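The four steps can be sketched as a small in-memory simulation. Everything here is a stand-in: the server identifiers (root, tld-com, auth-coccoc) are placeholder labels rather than real hostnames, no network traffic is sent, and the only real data is the coccoc.com address taken from the dig output earlier in this post.

```python
# Toy delegation tree. Server identifiers are placeholders, not real hostnames.
SERVERS = {
    "root":        {"referrals": {"com.": "tld-com"}},
    "tld-com":     {"referrals": {"coccoc.com.": "auth-coccoc"}},
    "auth-coccoc": {"records": {"coccoc.com.": "123.30.175.29"}},
}

def ask(server, name):
    """Return ('answer', ip), ('referral', next_server) or ('nxdomain', None)."""
    data = SERVERS[server]
    if name in data.get("records", {}):
        return "answer", data["records"][name]
    for zone, next_server in data.get("referrals", {}).items():
        if name == zone or name.endswith("." + zone):
            return "referral", next_server
    return "nxdomain", None

def resolve(name, cache, max_hops=10):
    """Resolve `name`, returning (address or None, list of servers consulted)."""
    if name in cache:                        # step 1: answer from cache
        return cache[name], ["cache"]
    server, path = "root", []
    for _ in range(max_hops):
        path.append(server)
        kind, value = ask(server, name)
        if kind == "referral":               # steps 2 and 3: follow the referral down
            server = value
            continue
        if kind == "answer":                 # step 4: the authoritative server answers
            cache[name] = value
            return value, path
        return None, path                    # the name does not exist
    raise RuntimeError("too many referrals")

cache = {}
print(resolve("coccoc.com.", cache))  # ('123.30.175.29', ['root', 'tld-com', 'auth-coccoc'])
print(resolve("coccoc.com.", cache))  # ('123.30.175.29', ['cache'])
```

The second call never leaves the resolver, which is the whole point of the cache in step 1.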

(Figure: demonstration of steps 2 to 4)

There is one catch at step 3: if you manage the DNS yourself, the NS records point to names inside your own zone, which causes what we call a “circular reference”. That’s when we need glue records.

The glue records

Glue records are simply A or AAAA records that hold the addresses of the authoritative servers for a specific domain name. Glue records reside at the TLD, so we can avoid the circular reference.

For example, domain coccoc.vn has nameservers ns1.coccoc.vn and ns2.coccoc.vn. To resolve the domain name, as the previous section mentioned, the client queries in order: root, the vn. TLD, and then ns1.coccoc.vn. However, the A record of ns1.coccoc.vn is defined on ns1.coccoc.vn itself, so to find out who ns1.coccoc.vn is the client would have to start again from the root, creating a loop: a circular reference. To avoid this, the A record of ns1.coccoc.vn is also placed at the TLD, so the client learns the address of ns1.coccoc.vn together with the referral. Hence, the glue records.
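A small sketch shows why the lookup dead-ends without glue. It models only the case in this post, where every nameserver name lives inside the zone it serves. The nameserver names and addresses for coccoc.vn are the ones from the dig output in this post; the dictionaries and the function name are hypothetical.

```python
# What the vn. TLD publishes for the zone: NS names, and optionally glue addresses.
NS_AT_TLD = {"coccoc.vn.": ["ns1.coccoc.vn.", "ns2.coccoc.vn."]}
GLUE_AT_TLD = {
    "ns1.coccoc.vn.": "123.30.175.7",
    "ns2.coccoc.vn.": "123.30.175.113",
}

def first_ns_address(zone, glue):
    """Return (nameserver name, address) for the first nameserver we can reach.

    All nameserver names in this example are inside the zone itself, so
    without glue the only servers that know their addresses are the very
    servers we are trying to find.
    """
    for ns in NS_AT_TLD[zone]:
        if ns in glue:
            return ns, glue[ns]
    raise LookupError(f"circular reference: no address for nameservers of {zone}")

print(first_ns_address("coccoc.vn.", GLUE_AT_TLD))  # ('ns1.coccoc.vn.', '123.30.175.7')

try:
    first_ns_address("coccoc.vn.", glue={})
except LookupError as err:
    print(err)
```

With the glue dictionary empty, the function has nowhere to go, which is the loop described above.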

Checking whether a domain has glue records is quite simple:
1. Get the list of TLD servers using dig:

$ dig +short vn. NS
c.dns-servers.vn.
......

2. Pick one TLD server from the list and query it directly. The response should contain an ADDITIONAL section if the domain has glue records at the TLD.

$ dig @c.dns-servers.vn coccoc.vn
...
;; ADDITIONAL SECTION:
ns1.coccoc.vn. 43200 IN A 123.30.175.7
ns2.coccoc.vn. 43200 IN A 123.30.175.113
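If you want to run this check from a script, the ADDITIONAL section is easy to parse, since each line follows the usual "name TTL class type data" layout. The sketch below works on a pasted copy of the output above rather than calling dig itself; the function name is hypothetical and it assumes, as in dig's normal output, that a blank line ends the section.

```python
DIG_OUTPUT = """\
;; ADDITIONAL SECTION:
ns1.coccoc.vn. 43200 IN A 123.30.175.7
ns2.coccoc.vn. 43200 IN A 123.30.175.113
"""

def glue_records(dig_text):
    """Extract (name, ttl, type, address) tuples from dig's ADDITIONAL section."""
    records, in_additional = [], False
    for line in dig_text.splitlines():
        line = line.strip()
        if line.startswith(";; ADDITIONAL SECTION"):
            in_additional = True
            continue
        if not in_additional:
            continue
        if not line:                 # a blank line ends the section
            break
        if line.startswith(";"):     # skip comment lines
            continue
        fields = line.split()
        if len(fields) >= 5 and fields[3] in ("A", "AAAA"):
            records.append((fields[0], int(fields[1]), fields[3], fields[4]))
    return records

for record in glue_records(DIG_OUTPUT):
    print(record)
```

An empty result means the TLD server returned no glue for the domain.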

And how does knowing these things help you?

You may wonder why you need to understand all these complicated things when you only need to know how to create and modify simple A or PTR records. True, a driver doesn’t need to be a car engineer to be a good driver, but that driver can hire technicians to maintain and fix the car. You may not be that lucky: if you own the DNS service yourself, you must maintain it. Knowing how DNS works helps you debug faster, so you can find the problem and fix it quickly instead of blindly restarting everything (which still may not work).

To do: in the future, I may write about some case studies in debugging and fixing DNS problems.
