A Trip Down the DNS Rabbit Hole: Understanding the Role of Kubernetes, Golang, libc, systemd
Kubernetes at Box
Kubernetes powers much of the compute infrastructure at Box. As more and more applications and services have moved to our Kubernetes platform, we’ve invested in tools to make debugging, validating service correctness, and responding to incidents easier for Box developers.
Kuberhealthy
One such tool is Comcast’s Kuberhealthy, a framework for synthetic monitoring in Kubernetes. Kuberhealthy introduces a controller and a CRD to define a Pod spec and a schedule to periodically create the Pod. The Pod can run an arbitrary container image and code, so long as it reports the results of the check back to the Kuberhealthy controller’s API.
Kuberhealthy has example check functionality published in the public Docker registry. Our first step in bringing Kuberhealthy into our Kubernetes platform was to get it working in minikube, our tool of choice for local development. At Box, we use SmartStack for service discovery, so we had to include a SmartStack sidecar alongside the Kuberhealthy check container. This allows us to report the results of the check back to the Kuberhealthy controller.
When we started everything up in minikube, we quickly ran into issues. While the controller started cleanly and the check Pod launched successfully, it was unable to report the results of the check back to the Kuberhealthy controller. Inspecting the error message, we quickly realized we had a DNS issue on our hands.
Get http://kube-health.localhost: dial tcp: lookup kube-health.localhost on 10.0.2.3:53: no such host
DNS resolution with libc, Golang, systemd
To report results from the check Pod back to the Kuberhealthy controller, we defined the Kuberhealthy controller hostname as kube-health.localhost, which SmartStack would know how to resolve to the correct service.
The lookup process begins with a DNS query for kube-health.localhost from the Kuberhealthy check Pod. How DNS queries are performed depends on a combination of application code, GNU libc configuration, and operating system configuration. For most applications that link against libc, how libc performs DNS resolution is configured in /etc/nsswitch.conf. This applies to ping, curl, etc.
container$ cat /etc/nsswitch.conf
# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.
passwd: files
group: files
shadow: files
gshadow: files
hosts: files dns
networks: files
protocols: db files
services: db files
ethers: db files
rpc: db files
netgroup: nis
The hosts: files dns line means that getaddrinfo, gethostbyname, and friends will first read from /etc/hosts before performing a DNS network query.
Looking at /etc/hosts, we can see there are no entries that match kube-health.localhost, so DNS lookups for that name should result in a DNS query on the network.
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
172.17.0.7 kh-7fbbd69ffc-8tr6d
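The files-then-dns ordering from /etc/nsswitch.conf can be sketched in Go. This is just an illustration of the lookup order, not the real libc implementation; queryDNS is a hypothetical stand-in for an actual network query.

```go
package main

import (
	"errors"
	"fmt"
)

// hosts mimics entries parsed from /etc/hosts ("files").
var hosts = map[string]string{
	"localhost":           "127.0.0.1",
	"kh-7fbbd69ffc-8tr6d": "172.17.0.7",
}

// queryDNS is a hypothetical stand-in for a real DNS network query;
// here it always fails, like our minikube setup does.
func queryDNS(name string) (string, error) {
	return "", errors.New("no such host")
}

// lookup follows the "hosts: files dns" order: consult /etc/hosts
// first, and only query DNS on a miss.
func lookup(name string) (string, error) {
	if addr, ok := hosts[name]; ok {
		return addr, nil
	}
	return queryDNS(name)
}

func main() {
	fmt.Println(lookup("kh-7fbbd69ffc-8tr6d")) // hit in "files", no DNS query
	fmt.Println(lookup("kube-health.localhost")) // miss, falls through to "dns"
}
```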
This story is a bit more complicated on systemd operating systems. The example Kuberhealthy deployment-check container is built FROM scratch, so it is not using systemd.
If we rebuild it on a CentOS-based image, FROM centos:latest, we get the following /etc/nsswitch.conf:
container$ cat /etc/nsswitch.conf
passwd: sss files systemd
shadow: files sss
group: sss files systemd
hosts: files dns myhostname
services: files sss
netgroup: sss
automount: files sss
aliases: files
ethers: files
gshadow: files
networks: files dns
protocols: files
publickey: files
rpc: files
We see that the hosts line has a new entry: hosts: files dns myhostname.
myhostname is an NSS plugin that extends how hostname lookup works. It injects the following rules into libc-using programs:
• The local, configured hostname is resolved to all locally configured IP addresses ordered by their scope, or — if none are configured — the IPv4 address 127.0.0.2 (which is on the local loopback) and the IPv6 address ::1 (which is the local host).
• The hostnames "localhost" and "localhost.localdomain" (as well as any hostname ending in ".localhost" or ".localhost.localdomain") are resolved to the IP addresses 127.0.0.1 and ::1.
• The hostname "_gateway" is resolved to all current default routing gateway addresses, ordered by their metric. This assigns a stable hostname to the current gateway, useful for referencing it independently of the current network configuration state.
Note the very important part: any hostname ending in ".localhost" or ".localhost.localdomain" is resolved to the IP addresses 127.0.0.1 and ::1. We will return to this.
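That ".localhost" suffix rule can be sketched as a small Go predicate. This is only an illustration of the rule quoted above, not the actual nss-myhostname implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// resolvesToLoopback mimics the nss-myhostname rule: "localhost",
// "localhost.localdomain", and any name ending in ".localhost" or
// ".localhost.localdomain" map to 127.0.0.1 and ::1.
func resolvesToLoopback(name string) bool {
	// DNS names are case-insensitive and may carry a trailing dot.
	name = strings.TrimSuffix(strings.ToLower(name), ".")
	return name == "localhost" ||
		name == "localhost.localdomain" ||
		strings.HasSuffix(name, ".localhost") ||
		strings.HasSuffix(name, ".localhost.localdomain")
}

func main() {
	fmt.Println(resolvesToLoopback("kube-health.localhost")) // true
	fmt.Println(resolvesToLoopback("example.com"))           // false
}
```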
Golang programs, on the other hand, do not use the libc DNS resolver automatically. Instead, DNS resolution is implemented natively in Go. The determination of which resolver to use happens at runtime, for every resolve query.
From https://golang.org/pkg/net/#hdr-Name_Resolution:
By default the pure Go resolver is used, because a blocked DNS request consumes only a goroutine, while a blocked C call consumes an operating system thread. When cgo is available, the cgo-based resolver is used instead under a variety of conditions: on systems that do not let programs make direct DNS requests (OS X), when the LOCALDOMAIN environment variable is present (even if empty), when the RES_OPTIONS or HOSTALIASES environment variable is non-empty, when the ASR_CONFIG environment variable is non-empty (OpenBSD only), when /etc/resolv.conf or /etc/nsswitch.conf specify the use of features that the Go resolver does not implement, and when the name being looked up ends in .local or is an mDNS name.
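The choice can also be pinned per lookup in code via the net package's Resolver type and its PreferGo field. A minimal sketch (the 2-second timeout is an arbitrary choice for illustration):

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// resolveWithGoResolver forces the pure Go resolver for a single
// lookup, regardless of cgo availability or what /etc/nsswitch.conf
// says.
func resolveWithGoResolver(name string) ([]string, error) {
	r := &net.Resolver{PreferGo: true}
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return r.LookupHost(ctx, name)
}

func main() {
	// "localhost" is answered from /etc/hosts, so this works offline.
	addrs, err := resolveWithGoResolver("localhost")
	fmt.Println(addrs, err)
}
```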
Inspecting the environment on the Kuberhealthy check Pod container:
container$ env
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_SERVICE_PORT=443
HOSTNAME=kh-7fbbd69ffc-8tr6d
PWD=/app
HOME=/home/kuberhealthy
KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
GOLANG_VERSION=1.13.3
TERM=xterm
SHLVL=1
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
KUBERNETES_SERVICE_HOST=10.96.0.1
KUBERNETES_PORT=tcp://10.96.0.1:443
KUBERNETES_PORT_443_TCP_PORT=443
PATH=/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
GOPATH=/go
_=/usr/bin/env
So none of the environment variables are set, and nothing in the /etc/nsswitch.conf hosts line looks special. So we expect that the Kuberhealthy code will use the Golang DNS resolver, which will use /etc/resolv.conf to determine where and how to send the DNS query.
coredns and minikube DNS query routing
Looking at /etc/resolv.conf, we can see how DNS queries that hit the network will be processed.
container$ cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Kubernetes manages this file, having the container runtime configure it when the container launches. kubelet's command line arguments --cluster-dns and --cluster-domain configure the nameserver and search lines respectively.
A lookup of kube-health.localhost has fewer than 5 "dots", so the actual DNS queries made will be:
kube-health.localhost.default.svc.cluster.local
kube-health.localhost.svc.cluster.local
kube-health.localhost.cluster.local
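This search-list expansion can be sketched as a small Go function. It is a simplification of what real resolvers do, but it reproduces the candidate list above (the resolver also tries the bare name as a final attempt):

```go
package main

import (
	"fmt"
	"strings"
)

// expand builds the candidate query list the way resolv.conf's search
// and ndots options interact: a name with fewer than ndots dots is
// tried with each search suffix first, then as-is; otherwise the bare
// name is tried first.
func expand(name string, search []string, ndots int) []string {
	var out []string
	if strings.Count(name, ".") < ndots {
		for _, s := range search {
			out = append(out, name+"."+s)
		}
		out = append(out, name)
	} else {
		out = append(out, name)
		for _, s := range search {
			out = append(out, name+"."+s)
		}
	}
	return out
}

func main() {
	search := []string{"default.svc.cluster.local", "svc.cluster.local", "cluster.local"}
	for _, q := range expand("kube-health.localhost", search, 5) {
		fmt.Println(q)
	}
}
```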
All of these queries will be sent to nameserver 10.96.0.10. So what is 10.96.0.10?
$ kubectl get services -n=kube-system
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns 10.96.0.10 <none> 53/UDP,53/TCP 1h
That is a service cluster-ip, so behind the scenes it's a set of iptables or ipvs rules on the underlying Kubernetes Node. So the actual queries to 10.96.0.10 will be round-robined to the Pod IP addresses that back the kube-dns service. These IP addresses are owned by coredns Pods.
$ kubectl get services kube-dns -n=kube-system -o json | jq .spec.selector
{
"k8s-app": "kube-dns"
}
$ kubectl get pods --all-namespaces -l k8s-app=kube-dns -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system coredns-54ff9cd656-8s8v9 1/1 Running 1 1h 172.17.0.3 minikube
kube-system coredns-54ff9cd656-g78p5 1/1 Running 1 1h 172.17.0.2 minikube
So the queries will land on one of the coredns Pods. When the query is received, coredns will consult its configuration to decide how to perform the lookup. The coredns configuration is refreshingly straightforward.
In Kubernetes, coredns configuration is provided to the coredns Pods via a ConfigMap.
$ kubectl get cm coredns -n=kube-system -o json | jq .data.Corefile -r
.:53 {
errors
health
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
upstream
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
proxy . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}
The main component here is the kubernetes plugin block. This is a coredns plugin that knows how to watch the Kubernetes apiserver for Services and Endpoints and bind the service names to the Pod Endpoint IP addresses. It will answer any queries that end in cluster.local (and some reverse queries, too).
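The zone matching that routes a query to this plugin block can be sketched as follows. This is a simplification for illustration, not coredns's real API:

```go
package main

import (
	"fmt"
	"strings"
)

// matchesZone reports whether a query name falls under any of the
// zones a plugin block is authoritative for, e.g. the
// "cluster.local in-addr.arpa ip6.arpa" list in the Corefile above.
func matchesZone(qname string, zones []string) bool {
	qname = strings.TrimSuffix(qname, ".") // drop any trailing root dot
	for _, z := range zones {
		if qname == z || strings.HasSuffix(qname, "."+z) {
			return true
		}
	}
	return false
}

func main() {
	zones := []string{"cluster.local", "in-addr.arpa", "ip6.arpa"}
	fmt.Println(matchesZone("kube-health.localhost.cluster.local", zones)) // true
	fmt.Println(matchesZone("kube-health.localhost", zones))               // false
}
```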
Recall that our queries will look like this:
kube-health.localhost.default.svc.cluster.local
kube-health.localhost.svc.cluster.local
kube-health.localhost.cluster.local
So, they match this plugin. coredns will not have any record for kube-health.localhost, so it will fall through out of this plugin (and the next plugin) to the proxy plugin.
proxy . /etc/resolv.conf states that all queries should be forwarded to the nameservers specified in /etc/resolv.conf.
The contents of /etc/resolv.conf inside any container are controlled by the Pod's dnsPolicy. Normal Pods use the standard value of ClusterFirst, which causes the flow we've been examining thus far (see the beginning of this note for a review of what /etc/resolv.conf looks like in this case). But for coredns, the dnsPolicy is set to Default.
$ kubectl get deploy coredns -n=kube-system -o json | jq .spec.template.spec.dnsPolicy
"Default"
This is called Default because the default behavior of Docker is to use the /etc/resolv.conf (along with /etc/hosts) from the host when Docker launches containers. For coredns containers, the contents of /etc/resolv.conf will match those of the host, minikube. Inspecting that file we see a simple, single nameserver entry. This is how DNS queries for non-Kubernetes services from inside Pods will be routed.
minikube$ cat /etc/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 10.0.2.3
So, what is 10.0.2.3? Looking at configured routes on minikube:
minikube$ ip route
default via 10.0.2.2 dev eth1 proto dhcp src 10.0.2.15 metric 1024
10.0.2.0/24 dev eth1 proto kernel scope link src 10.0.2.15
10.0.2.2 dev eth1 proto dhcp scope link src 10.0.2.15 metric 1024
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
Packets for 10.0.2.3 will be routed via eth1 to the default gateway of 10.0.2.2. 10.0.2.2 is the virtual NAT gateway created by VirtualBox when configuring a NAT-type NIC. Likewise, 10.0.2.3 is the nameserver that VirtualBox creates for NAT interfaces.
From the VirtualBox docs:
The NAT engine by default offers the same DNS servers to the guest that are configured on the host.
So our DNS query packet will be sent to the VirtualBox host's nameserver. Sure enough, there's nothing registered for the name kube-health.localhost or any of the variants, and we get an NXDOMAIN reply:
$ nslookup kube-health.localhost.cluster.local
Server: 127.0.0.1
Address: 127.0.0.1#53
** server can't find kube-health.localhost.cluster.local: NXDOMAIN
Golang DNS resolution in depth
If you’ve read this far, this is consistent with the original problem: lookups of kube-health.localhost don't work.
But the odd thing was that when running on a CentOS image, ping did work. As did manually compiling a Golang test program that connects to that name.
container$ cat dns_test.go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	res, err := http.Get("http://kube-health.localhost")
	if err != nil {
		fmt.Println(err)
	} else {
		bodyBytes, _ := ioutil.ReadAll(res.Body)
		fmt.Println(string(bodyBytes))
	}
}
container$ go run dns_test.go
{
"OK": false,
"Errors": [
"Check execution error: kube-health/daemonset-check: timed out waiting for checker pod to report in",
"Check execution error: kube-health/deployment-check: timed out waiting for checker pod to report in"
],
"CheckDetails": {
"kube-health/daemonset-check": {
"OK": false,
"Errors": [
"Check execution error: kube-health/daemonset-check: timed out waiting for checker pod to report in"
],
"RunDuration": "",
"Namespace": "kube-health",
"AuthoritativePod": "kube-health-b8f9ff5fb-7d629",
"uuid": "32f736c7-3698-4671-8276-6cc8a8cb8b35"
},
"kube-health/deployment-check": {
"OK": false,
"Errors": [
"Check execution error: kube-health/deployment-check: timed out waiting for checker pod to report in"
],
"RunDuration": "",
"Namespace": "kube-health",
"AuthoritativePod": "kube-health-b8f9ff5fb-7d629",
"uuid": "d2ffdb29-de2c-40f9-8e00-62636336c332"
}
},
"CurrentMaster": ""
}
Recall that our CentOS image uses systemd with an /etc/nsswitch.conf that includes the myhostname plugin for hosts.
This causes libc to resolve any hostname ending in ".localhost" or ".localhost.localdomain" to the IP addresses 127.0.0.1 and ::1.
This explains why ping, curl, and other libc-linked programs work. After the /etc/hosts lookup fails and the DNS query to coredns results in NXDOMAIN, libc finally falls back to the myhostname plugin, which resolves the name to 127.0.0.1.
So why does our test program work, but the Kuberhealthy checker does not?
Recall that Golang attempts to use the Go resolver, and does not use the cgo, libc-linked resolver by default. So we'd expect it to fail in both the test program and Kuberhealthy.
Given certain features in /etc/nsswitch.conf, Golang will fall back to using the libc-linked resolver. It turns out that myhostname is one of these features: it will cause resolution using cgo if the hostname being resolved is "localhost"-like.
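A simplified sketch of that decision follows. The real logic lives in the net package's internal hostLookupOrder and covers many more cases; this only models the two conditions relevant to our story:

```go
package main

import (
	"fmt"
	"strings"
)

// chooseResolver is a toy model of Go's resolver selection: with cgo
// disabled, the pure Go resolver is the only option; with cgo
// available, an nsswitch.conf hosts source the Go resolver can't
// handle (such as myhostname for a localhost-like name) forces the
// cgo resolver.
func chooseResolver(cgoAvailable bool, nsswitchHosts []string, name string) string {
	if !cgoAvailable {
		return "go"
	}
	for _, src := range nsswitchHosts {
		if src == "myhostname" && strings.HasSuffix(name, ".localhost") {
			return "cgo"
		}
	}
	return "go"
}

func main() {
	hosts := []string{"files", "dns", "myhostname"}
	fmt.Println(chooseResolver(true, hosts, "kube-health.localhost"))  // cgo
	fmt.Println(chooseResolver(false, hosts, "kube-health.localhost")) // go
}
```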
So, on our CentOS image, with the myhostname NSS plugin present, we would now expect that the cgo resolver would be used and that our queries for kube-health.localhost would be handled by the myhostname NSS plugin and would succeed. Indeed, this is what we observe for the test app. But how do we explain the failure for the Kuberhealthy check Pod?
Well, it turns out that you can force Go to always use the Go resolver and never use the cgo resolver by disabling cgo at build time.
Kuberhealthy does this in their image build: https://github.com/Comcast/kuberhealthy/blob/master/cmd/deployment-check/Dockerfile#L8
ENV CGO_ENABLED=0
This forces use of the Go resolver, and the DNS query follows the path in the first part of this article. Removing that line causes Go to revert to dynamic resolver selection and fall back to the myhostname NSS plugin on CentOS.
We can confirm this behavior by enabling the very helpful debug statements in the Go DNS package with GODEBUG=netdns=2. With this environment variable present, the DNS package will log which resolver is used (and why) and how the DNS lookup will be performed.
Sure enough, with CGO_ENABLED=0, we see the same error as the original problem: Get http://kube-health.localhost: dial tcp: lookup kube-health.localhost on 10.0.2.3:53: no such host.
container$ CGO_ENABLED=0 GODEBUG=netdns=2 go run dns_test.go
go package net: built with netgo build tag; using Go's DNS resolver
go package net: hostLookupOrder(kube-health.localhost) = files,dns
Get http://kube-health.localhost: dial tcp: lookup kube-health.localhost on 10.0.2.3:53: no such host
container$ GODEBUG=netdns=2 go run dns_test.go
go package net: dynamic selection of DNS resolver
go package net: hostLookupOrder(kube-health.localhost) = cgo
Get http://kube-health.localhost: dial tcp 127.0.0.1:80: connect: connection refused
container$ CGO_ENABLED=1 GODEBUG=netdns=2 go run dns_test.go
go package net: dynamic selection of DNS resolver
go package net: hostLookupOrder(kube-health.localhost) = cgo
Get http://kube-health.localhost: dial tcp 127.0.0.1:80: connect: connection refused
An easy fix
Now that we understand the problem, the fix is trivial. Since our Docker image is based on CentOS, we have a full libc available and can simply remove CGO_ENABLED=0 from the build. Problem solved, all systems go for Kuberhealthy.
If you’re interested in joining us, check out our open opportunities.