A Trip Down the DNS Rabbit Hole: Understanding the Role of Kubernetes, Golang, libc, systemd
Kubernetes at Box
Kubernetes powers much of the compute infrastructure at Box. As more and more applications and services have moved to our Kubernetes platform, we’ve invested in tools to make debugging, validating service correctness, and responding to incidents easier for Box developers.
Kuberhealthy
One such tool is Comcast’s Kuberhealthy, a framework for synthetic monitoring in Kubernetes. Kuberhealthy introduces a controller and a CRD to define a Pod spec and a schedule to periodically create the Pod. The Pod can run an arbitrary container image and code, so long as it reports the results of the check back to the Kuberhealthy controller’s API.
Kuberhealthy has example check functionality published in the public Docker registry. Our first step in bringing Kuberhealthy into our Kubernetes platform was to get it working in minikube, our tool of choice for local development. At Box, we use SmartStack for service discovery, so we had to include a SmartStack sidecar alongside the Kuberhealthy check container. This allows us to report the results of the check back to the Kuberhealthy controller.
When we started everything up in minikube, we quickly ran into issues. While the controller started cleanly and the check Pod launched successfully, it was unable to report the results of the check back to the Kuberhealthy controller. Inspecting the error message, we quickly realized we had a DNS issue on our hands.
Get http://kube-health.localhost: dial tcp: lookup kube-health.localhost on 10.0.2.3:53: no such host
DNS resolution with libc, Golang, systemd
To report results from the check Pod back to the Kuberhealthy controller, we defined the Kuberhealthy controller hostname as kube-health.localhost, which SmartStack would know how to resolve to the correct service.
The lookup process begins with a DNS query for kube-health.localhost from the Kuberhealthy check Pod. How DNS queries are performed depends on a combination of application code, GNU libc configuration, and operating system configuration. For most applications that link against libc, how libc performs DNS resolution is configured in /etc/nsswitch.conf. This applies to ping, curl, etc.
container$ cat /etc/nsswitch.conf
# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.
passwd: files
group: files
shadow: files
gshadow: files
hosts: files dns
networks: files
protocols: db files
services: db files
ethers: db files
rpc: db files
netgroup: nis
The hosts: files dns line means that getaddrinfo, gethostbyname, and friends will first read from /etc/hosts before performing a DNS network query.
Looking at /etc/hosts, we can see there are no entries that match kube-health.localhost, so DNS lookups for that name should result in a DNS query on the network.
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
172.17.0.7 kh-7fbbd69ffc-8tr6d
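The files-then-dns ordering from /etc/nsswitch.conf can be sketched in Go. This is just an illustration of the lookup order, not the real libc implementation; queryDNS is a hypothetical stand-in for an actual network query.

```go
package main

import (
	"errors"
	"fmt"
)

// hosts mimics entries parsed from /etc/hosts ("files").
var hosts = map[string]string{
	"localhost":           "127.0.0.1",
	"kh-7fbbd69ffc-8tr6d": "172.17.0.7",
}

// queryDNS is a hypothetical stand-in for a real DNS network query;
// here it always fails, like our minikube setup does.
func queryDNS(name string) (string, error) {
	return "", errors.New("no such host")
}

// lookup follows the "hosts: files dns" order: consult /etc/hosts
// first, and only query DNS on a miss.
func lookup(name string) (string, error) {
	if addr, ok := hosts[name]; ok {
		return addr, nil
	}
	return queryDNS(name)
}

func main() {
	fmt.Println(lookup("kh-7fbbd69ffc-8tr6d")) // hit in "files", no DNS query
	fmt.Println(lookup("kube-health.localhost")) // miss, falls through to "dns"
}
```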
This story is a bit more complicated on systemd operating systems. The example Kuberhealthy deployment-check container is built FROM scratch, so it is not using systemd.
If we rebuild it on a CentOS-based image, FROM centos:latest, we get the following /etc/nsswitch.conf:
container$ cat /etc/nsswitch.conf
passwd: sss files systemd
shadow: files sss
group: sss files systemd
hosts: files dns myhostname
services: files sss
netgroup: sss
automount: files sss
aliases: files
ethers: files
gshadow: files
networks: files dns
protocols: files
publickey: files
rpc: files
We see that the hosts line has a new entry: hosts: files dns myhostname.
myhostname is an NSS plugin that extends how hostname lookup works. It injects the following rules into libc-using programs:
• The local, configured hostname is resolved to all locally configured IP addresses ordered by their scope, or — if none are configured — the IPv4 address 127.0.0.2 (which is on the local loopback) and the IPv6 address ::1 (which is the local host).
• The hostnames "localhost" and "localhost.localdomain" (as well as any hostname ending in ".localhost" or ".localhost.localdomain") are resolved to the IP addresses 127.0.0.1 and ::1.
• The hostname "_gateway" is resolved to all current default routing gateway addresses, ordered by their metric. This assigns a stable hostname to the current gateway, useful for referencing it independently of the current network configuration state.
Note the very important part: any hostname ending in ".localhost" or ".localhost.localdomain" is resolved to the IP addresses 127.0.0.1 and ::1. We will return to this.
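That ".localhost" suffix rule can be sketched as a small Go predicate. This is only an illustration of the rule quoted above, not the actual nss-myhostname implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// resolvesToLoopback mimics the nss-myhostname rule: "localhost",
// "localhost.localdomain", and any name ending in ".localhost" or
// ".localhost.localdomain" map to 127.0.0.1 and ::1.
func resolvesToLoopback(name string) bool {
	// DNS names are case-insensitive and may carry a trailing dot.
	name = strings.TrimSuffix(strings.ToLower(name), ".")
	return name == "localhost" ||
		name == "localhost.localdomain" ||
		strings.HasSuffix(name, ".localhost") ||
		strings.HasSuffix(name, ".localhost.localdomain")
}

func main() {
	fmt.Println(resolvesToLoopback("kube-health.localhost")) // true
	fmt.Println(resolvesToLoopback("example.com"))           // false
}
```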
Golang programs, on the other hand, do not use the libc DNS resolver automatically. Instead, DNS resolution is implemented natively in Go. The determination of which resolver to use happens at runtime, for every resolve query.
From https://golang.org/pkg/net/#hdr-Name_Resolution:
By default the pure Go resolver is used, because a blocked DNS request consumes only a goroutine, while a blocked C call consumes an operating system thread. When cgo is available, the cgo-based resolver is used instead under a variety of conditions: on systems that do not let programs make direct DNS requests (OS X), when the LOCALDOMAIN environment variable is present (even if empty), when the RES_OPTIONS or HOSTALIASES environment variable is non-empty, when the ASR_CONFIG environment variable is non-empty (OpenBSD only), when /etc/resolv.conf or /etc/nsswitch.conf specify the use of features that the Go resolver does not implement, and when the name being looked up ends in .local or is an mDNS name.
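The choice can also be pinned per lookup in code via the net package's Resolver type and its PreferGo field. A minimal sketch (the 2-second timeout is an arbitrary choice for illustration):

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// resolveWithGoResolver forces the pure Go resolver for a single
// lookup, regardless of cgo availability or what /etc/nsswitch.conf
// says.
func resolveWithGoResolver(name string) ([]string, error) {
	r := &net.Resolver{PreferGo: true}
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return r.LookupHost(ctx, name)
}

func main() {
	// "localhost" is answered from /etc/hosts, so this works offline.
	addrs, err := resolveWithGoResolver("localhost")
	fmt.Println(addrs, err)
}
```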
Inspecting the environment on the Kuberhealthy check Pod container:
container$ env
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_SERVICE_PORT=443
HOSTNAME=kh-7fbbd69ffc-8tr6d
PWD=/app
HOME=/home/kuberhealthy
KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
GOLANG_VERSION=1.13.3
TERM=xterm
SHLVL=1
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
KUBERNETES_SERVICE_HOST=10.96.0.1
KUBERNETES_PORT=tcp://10.96.0.1:443
KUBERNETES_PORT_443_TCP_PORT=443
PATH=/go/bin:/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
GOPATH=/go
_=/usr/bin/env
So none of the environment variables are set, and nothing in the /etc/nsswitch.conf hosts line looks special. So we expect that the Kuberhealthy code will use the Golang DNS resolver, which will use /etc/resolv.conf to determine where and how to send the DNS query.
coredns and minikube DNS query routing
Looking at /etc/resolv.conf, we can see how DNS queries that hit the network will be processed.
container$ cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Kubernetes manages this file, having the container runtime configure it when the container launches. kubelet's command line arguments --cluster-dns and --cluster-domain configure the nameserver and search lines respectively.
A lookup of kube-health.localhost has fewer than 5 "dots", so the actual DNS queries made will be:
kube-health.localhost.default.svc.cluster.local
kube-health.localhost.svc.cluster.local
kube-health.localhost.cluster.local
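This search-list expansion can be sketched as a small Go function. It is a simplification of what real resolvers do, but it reproduces the candidate list above (the resolver also tries the bare name as a final attempt):

```go
package main

import (
	"fmt"
	"strings"
)

// expand builds the candidate query list the way resolv.conf's search
// and ndots options interact: a name with fewer than ndots dots is
// tried with each search suffix first, then as-is; otherwise the bare
// name is tried first.
func expand(name string, search []string, ndots int) []string {
	var out []string
	if strings.Count(name, ".") < ndots {
		for _, s := range search {
			out = append(out, name+"."+s)
		}
		out = append(out, name)
	} else {
		out = append(out, name)
		for _, s := range search {
			out = append(out, name+"."+s)
		}
	}
	return out
}

func main() {
	search := []string{"default.svc.cluster.local", "svc.cluster.local", "cluster.local"}
	for _, q := range expand("kube-health.localhost", search, 5) {
		fmt.Println(q)
	}
}
```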
All of these queries will be sent to nameserver 10.96.0.10. So what is 10.96.0.10?
$ kubectl get services -n=kube-system
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns 10.96.0.10 <none> 53/UDP,53/TCP 1h
That is a service cluster-ip, so behind the scenes it's a set of iptables or ipvs rules on the underlying Kubernetes Node. So the actual queries to 10.96.0.10 will be round-robined to the Pod IP addresses that back the kube-dns service. These IP addresses are owned by coredns Pods.
$ kubectl get services kube-dns -n=kube-system -o json | jq .spec.selector
{
"k8s-app": "kube-dns"
}
$ kubectl get pods --all-namespaces -l k8s-app=kube-dns -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system coredns-54ff9cd656-8s8v9 1/1 Running 1 1h 172.17.0.3 minikube
kube-system coredns-54ff9cd656-g78p5 1/1 Running 1 1h 172.17.0.2 minikube
So the queries will land on one of the coredns Pods. When the query is received, coredns will consult its configuration to decide how to perform the lookup. The coredns configuration is refreshingly straightforward.
In Kubernetes, coredns configuration is provided to the coredns Pods via a ConfigMap.
$ kubectl get cm coredns -n=kube-system -o json | jq .data.Corefile -r
.:53 {
errors
health
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
upstream
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
proxy . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}
The main component here is the kubernetes plugin block. This is a coredns plugin that knows how to watch the Kubernetes apiserver for Services and Endpoints and bind the service names to the Pod Endpoint IP addresses. It will answer any queries that end in cluster.local (and some reverse queries, too).
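The zone matching that routes a query to this plugin block can be sketched as follows. This is a simplification for illustration, not coredns's real API:

```go
package main

import (
	"fmt"
	"strings"
)

// matchesZone reports whether a query name falls under any of the
// zones a plugin block is authoritative for, e.g. the
// "cluster.local in-addr.arpa ip6.arpa" list in the Corefile above.
func matchesZone(qname string, zones []string) bool {
	qname = strings.TrimSuffix(qname, ".") // drop any trailing root dot
	for _, z := range zones {
		if qname == z || strings.HasSuffix(qname, "."+z) {
			return true
		}
	}
	return false
}

func main() {
	zones := []string{"cluster.local", "in-addr.arpa", "ip6.arpa"}
	fmt.Println(matchesZone("kube-health.localhost.cluster.local", zones)) // true
	fmt.Println(matchesZone("kube-health.localhost", zones))               // false
}
```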
Recall that our queries will look like this:
kube-health.localhost.default.svc.cluster.local
kube-health.localhost.svc.cluster.local
kube-health.localhost.cluster.local
So, they match this plugin. coredns will not have any record for kube-health.localhost, so it will fall through out of this plugin (and the next plugin) to the proxy plugin.
proxy . /etc/resolv.conf states that all queries should be forwarded to the nameservers specified in /etc/resolv.conf.
The contents of /etc/resolv.conf inside any container are controlled by the Pod's dnsPolicy. Normal Pods use the standard value of ClusterFirst, which causes the flow we've been examining thus far (see the beginning of this note for a review of what /etc/resolv.conf looks like in this case). But for coredns, the dnsPolicy is set to Default.
$ kubectl get deploy coredns -n=kube-system -o json | jq .spec.template.spec.dnsPolicy
"Default"
This is called Default because the default behavior of Docker is to use the /etc/resolv.conf (along with /etc/hosts) from the host when Docker launches containers. For coredns containers, the contents of /etc/resolv.conf will match those of the host, minikube. Inspecting that file we see a simple, single nameserver entry. This is how DNS queries for non-Kubernetes services from inside Pods will be routed.
minikube$ cat /etc/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 10.0.2.3
So, what is 10.0.2.3? Looking at configured routes on minikube:
minikube$ ip route
default via 10.0.2.2 dev eth1 proto dhcp src 10.0.2.15 metric 1024
10.0.2.0/24 dev eth1 proto kernel scope link src 10.0.2.15
10.0.2.2 dev eth1 proto dhcp scope link src 10.0.2.15 metric 1024
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
Packets for 10.0.2.3 will be routed via eth1 to the default gateway of 10.0.2.2. 10.0.2.2 is the virtual NAT gateway created by VirtualBox when configuring a NAT-type NIC. Likewise, 10.0.2.3 is the nameserver that VirtualBox creates for NAT interfaces.
From the VirtualBox docs:
The NAT engine by default offers the same DNS servers to the guest that are configured on the host.
So our DNS query packet will be sent to the VirtualBox host's nameserver. Sure enough, there's nothing registered for the name kube-health.localhost or any of the variants, and we get an NXDOMAIN reply:
$ nslookup kube-health.localhost.cluster.local
Server: 127.0.0.1
Address: 127.0.0.1#53
** server can't find kube-health.localhost.cluster.local: NXDOMAIN
Golang DNS resolution in depth
If you’ve read this far, this is consistent with the original problem: lookups of kube-health.localhost don't work.
But the odd thing was that when running on a CentOS image, ping did work. As did manually compiling a Golang test program that connects to that name.
container$ cat dns_test.go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	res, err := http.Get("http://kube-health.localhost")
	if err != nil {
		fmt.Println(err)
	} else {
		bodyBytes, _ := ioutil.ReadAll(res.Body)
		fmt.Println(string(bodyBytes))
	}
}
container$ go run dns_test.go
{
"OK": false,
"Errors": [
"Check execution error: kube-health/daemonset-check: timed out waiting for checker pod to report in",
"Check execution error: kube-health/deployment-check: timed out waiting for checker pod to report in"
],
"CheckDetails": {
"kube-health/daemonset-check": {
"OK": false,
"Errors": [
"Check execution error: kube-health/daemonset-check: timed out waiting for checker pod to report in"
],
"RunDuration": "",
"Namespace": "kube-health",
"AuthoritativePod": "kube-health-b8f9ff5fb-7d629",
"uuid": "32f736c7-3698-4671-8276-6cc8a8cb8b35"
},
"kube-health/deployment-check": {
"OK": false,
"Errors": [
"Check execution error: kube-health/deployment-check: timed out waiting for checker pod to report in"
],
"RunDuration": "",
"Namespace": "kube-health",
"AuthoritativePod": "kube-health-b8f9ff5fb-7d629",
"uuid": "d2ffdb29-de2c-40f9-8e00-62636336c332"
}
},
"CurrentMaster": ""
}
Recall that our CentOS image uses systemd with an /etc/nsswitch.conf that includes the myhostname plugin for hosts.
This causes libc to resolve any hostname ending in ".localhost" or ".localhost.localdomain" to the IP addresses 127.0.0.1 and ::1.
This explains why ping, curl, and other libc-linked programs work. After the /etc/hosts lookup fails and the DNS query to coredns results in NXDOMAIN, libc finally falls back to the myhostname plugin, which resolves the name to 127.0.0.1.
So why does our test program work, but the Kuberhealthy checker does not?
Recall that Golang attempts to use the Go resolver, and does not use the cgo, libc-linked resolver by default. So we'd expect it to fail in both the test program and Kuberhealthy.
Given certain features in /etc/nsswitch.conf, Golang will fall back to using the libc-linked resolver. It turns out that myhostname is one of these features: it will cause resolution using cgo if the hostname being resolved is "localhost"-like.
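A simplified sketch of that decision follows. The real logic lives in the net package's internal hostLookupOrder and covers many more cases; this only models the two conditions relevant to our story:

```go
package main

import (
	"fmt"
	"strings"
)

// chooseResolver is a toy model of Go's resolver selection: with cgo
// disabled, the pure Go resolver is the only option; with cgo
// available, an nsswitch.conf hosts source the Go resolver can't
// handle (such as myhostname for a localhost-like name) forces the
// cgo resolver.
func chooseResolver(cgoAvailable bool, nsswitchHosts []string, name string) string {
	if !cgoAvailable {
		return "go"
	}
	for _, src := range nsswitchHosts {
		if src == "myhostname" && strings.HasSuffix(name, ".localhost") {
			return "cgo"
		}
	}
	return "go"
}

func main() {
	hosts := []string{"files", "dns", "myhostname"}
	fmt.Println(chooseResolver(true, hosts, "kube-health.localhost"))  // cgo
	fmt.Println(chooseResolver(false, hosts, "kube-health.localhost")) // go
}
```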
So, on our CentOS image, with the myhostname NSS plugin present, we would now expect that the cgo resolver would be used and that our queries for kube-health.localhost would be handled by the myhostname NSS plugin and would succeed. Indeed, this is what we observe for the test app. But how do we explain the failure for the Kuberhealthy check Pod?
Well, it turns out that you can force Go to always use the Go resolver and never use the cgo resolver by disabling cgo at build time.
Kuberhealthy does this in their image build: https://github.com/Comcast/kuberhealthy/blob/master/cmd/deployment-check/Dockerfile#L8
ENV CGO_ENABLED=0
This forces use of the Go resolver, and the DNS query follows the path in the first part of this article. Removing that line causes Go to revert to dynamic resolver selection and fall back to the myhostname NSS plugin on CentOS.
We can confirm this behavior by enabling the very helpful debug statements in the Go DNS package with GODEBUG=netdns=2. With this environment variable present, the DNS package will log which resolver is used (and why) and how the DNS lookup will be performed.
Sure enough, with CGO_ENABLED=0, we see the same error as the original problem: Get http://kube-health.localhost: dial tcp: lookup kube-health.localhost on 10.0.2.3:53: no such host.
container$ CGO_ENABLED=0 GODEBUG=netdns=2 go run dns_test.go
go package net: built with netgo build tag; using Go's DNS resolver
go package net: hostLookupOrder(kube-health.localhost) = files,dns
Get http://kube-health.localhost: dial tcp: lookup kube-health.localhost on 10.0.2.3:53: no such host
container$ GODEBUG=netdns=2 go run dns_test.go
go package net: dynamic selection of DNS resolver
go package net: hostLookupOrder(kube-health.localhost) = cgo
Get http://kube-health.localhost: dial tcp 127.0.0.1:80: connect: connection refused
container$ CGO_ENABLED=1 GODEBUG=netdns=2 go run dns_test.go
go package net: dynamic selection of DNS resolver
go package net: hostLookupOrder(kube-health.localhost) = cgo
Get http://kube-health.localhost: dial tcp 127.0.0.1:80: connect: connection refused
An easy fix
Now that we understand the problem, the fix is trivial. Since our Docker image is based on CentOS, we have a full libc available and can simply remove CGO_ENABLED=0 from the build. Problem solved, all systems go for Kuberhealthy.
If you’re interested in joining us, check out our open opportunities.