[CVE-2020–15257] Don’t use --net=host . Don’t use spec.hostNetwork .

Akihiro Suda
nttlabs
Nov 30, 2020 · 5 min read

TL;DR: Running containers in the host network namespace is insecure. Don’t run Docker containers with docker run --net=host . Don’t run Kubernetes Pods with .spec.hostNetwork: true .

containerd CVE-2020–15257

CVE-2020–15257, disclosed on November 30, 2020, is a vulnerability that allowed containerd containers running in the host network namespace with UID 0 to gain full root privileges on the host, via containerd’s abstract sockets exposed in the host network namespace.

An abstract socket is a kind of UNIX socket whose path string starts with a NUL character ( \0 ) and which does not show up as a file in the user-visible file system. containerd was exposing its abstract sockets without any authentication other than checking the peer’s UID. So, any process running in the host network namespace with UID 0 could connect to these sockets and call arbitrary API methods, including ones that execute arbitrary commands with full root privileges.
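To make this concrete, here is a minimal sketch (the socket name demo-abstract-socket is made up for illustration) showing that an abstract socket appears in /proc/net/unix with a @ prefix but never as a file on disk:

```python
import os
import socket

# Bind an abstract socket: the address starts with a NUL byte,
# so no file is ever created in the filesystem.
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.bind("\0demo-abstract-socket")

# The kernel lists it in /proc/net/unix with a leading "@".
with open("/proc/net/unix") as f:
    print("@demo-abstract-socket" in f.read())  # True (on Linux)

# Nothing shows up in the filesystem, so there are no file
# permission bits to restrict who may connect.
print(os.path.exists("/demo-abstract-socket"))  # False

s.close()
```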

Note that most users are not actually affected by this CVE. Docker users are affected if running docker run --net=host without specifying --user . Kubernetes users are affected if they are using containerd as the CRI runtime and running pods with .spec.hostNetwork: true without setting .spec.securityContext.runAsUser .

Abstract sockets used in containerd v1.4.2 / v1.3.8

The CVE was fixed in containerd v1.4.3/v1.3.9 by switching from abstract sockets to plain old file-based UNIX sockets under /run/containerd .

File sockets used in containerd v1.4.3 / v1.3.9

How to see the list of containers running in the host network

Docker:

$ docker ps -a --filter 'network=host' 

Kubernetes:

$ kubectl get pods -A -o json |
jq -c '.items[] | select(.spec.hostNetwork==true) |[.metadata.namespace, .metadata.name]'

“I updated containerd, so now I can use the host network safely”

Not really, because aside from containerd, lots of other daemons use abstract sockets as well.

To see the actual abstract sockets present on your host, run grep -ao '@.*' /proc/net/unix :

$ grep -ao '@.*' /proc/net/unix 
@/org/kernel/linux/storage/multipathd
@/tmp/dbus-ihrEYFlKyT
@/containerd-shim/moby/d0f4f5dd326d505f79e20ca891ad35516656353bc7974378237826b3456bff86/shim.sock
@ISCSIADM_ABSTRACT_NAMESPACE
@/containerd-shim/moby/d0f4f5dd326d505f79e20ca891ad35516656353bc7974378237826b3456bff86/shim.sock

“So… lots of new CVEs??”

No; abstract sockets used by other daemons are unlikely to be considered worth assigning CVEs to.

Actually, even containerd’s CVE-2020–15257 had already been known to (a small portion of) developers and users for years, and yet it had not been considered a vulnerability, because using the host network namespace is insecure regardless of the existence of containerd’s sockets. While the containerd project changed its vulnerability policy considering the impact and obscurity of the attack vector, other software projects may not consider the use of abstract sockets a vulnerability, and may continue using abstract sockets without assigning CVEs.

“I’m not using containerd, so I’m unaffected”

You are potentially affected via other processes’ sockets. Regardless of your container runtime, your host probably has several abstract sockets as explained above, and these sockets are accessible from containers running in the host network namespace. It should also be noted that several runtimes, including Singularity, launch containers in the host network namespace by default.

However, if you are running containers with SELinux, you are probably unaffected. (See below.)

“But I really need to use the host network!”

Think twice before using the host network namespace. If you want to use the host network namespace just because you don’t want to mess around with docker run -p for connecting to containers from the host, consider connecting to containers by IP:

$ docker inspect -f '{{.NetworkSettings.IPAddress}}' nginx 
172.17.0.2
$ curl http://172.17.0.2
...
<title>Welcome to nginx!</title>
...

For Kubernetes, use kubectl get pods -o wide to see IPs. Also consider using kubectl port-forward or NoRouter.io to forward ports to the client. (NoRouter.io is our new project for connecting containers across hosts without a mess. I’ll post an article about NoRouter.io soon.)

Or if you don’t want to use docker run -p due to a performance issue, try disabling the userland proxy (docker-proxy):

# cat <<EOF > /etc/docker/daemon.json
{"userland-proxy": false}
EOF
# systemctl restart docker

The userland proxy has been enabled by default mostly due to compatibility and stability issues in the past, but it can be safely disabled on recent kernels (for IPv4, at least). So it will probably be disabled by default in a future release of Docker.

Still if you really need to use the host network namespace, consider adopting the following solutions:

  1. Run containers as non-root users
  2. AppArmor
  3. SELinux

Solution 1: Run containers as non-root users

Whenever possible, run containers as non-root users. This simple trick can protect host abstract sockets in most cases.
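Abstract sockets have no file permission bits, so the only gate a daemon can apply is a check on the peer’s credentials; containerd’s sole authentication was exactly such a UID check. A sketch of how a daemon reads the peer’s UID via SO_PEERCRED (the socket name is made up for illustration):

```python
import os
import socket
import struct
import threading

ADDR = "\0demo-peercred-socket"  # hypothetical abstract-socket name

ready = threading.Event()

def serve_once():
    # Daemon side: accept one connection and inspect the peer.
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(ADDR)
    srv.listen(1)
    ready.set()
    conn, _ = srv.accept()
    # SO_PEERCRED yields the connecting process's (pid, uid, gid);
    # checking uid here is the only gate an abstract socket offers,
    # which is why running containers as non-root helps.
    data = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                           struct.calcsize("3i"))
    pid, uid, gid = struct.unpack("3i", data)
    print(f"peer pid={pid} uid={uid} gid={gid}")
    conn.close()
    srv.close()

t = threading.Thread(target=serve_once)
t.start()
ready.wait()

# Client side: anyone in the same network namespace can connect;
# there is no file permission check for abstract sockets.
cli = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
cli.connect(ADDR)
t.join()
cli.close()
```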

For Docker, run docker run --net=host --user 12345 --security-opt no-new-privileges . Make sure to choose a UID number that doesn’t conflict with the existing user accounts on the host. Specifying no-new-privileges is not strictly necessary, but it is recommended to prohibit privilege escalation via setuid binaries such as sudo .

For Kubernetes, specify .spec.containers[].securityContext like this:

hostNetwork: true
containers:
- name: foo
  securityContext:
    runAsUser: 12345
    allowPrivilegeEscalation: false
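For context, these fields sit in a complete Pod manifest like the following (the pod name and image are made-up placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: foo
spec:
  hostNetwork: true
  containers:
  - name: foo
    image: nginx
    securityContext:
      runAsUser: 12345
      allowPrivilegeEscalation: false
```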

To use port numbers below 1024 with a non-root user, you might need to configure net.ipv4.ip_unprivileged_port_start as follows:

# echo 'net.ipv4.ip_unprivileged_port_start=0' > /etc/sysctl.d/99-user.conf
# sysctl --system

Solution 2: AppArmor

AppArmor is a Linux Security Module used by several distros including Ubuntu, Debian, SUSE, and Google COS.

The following AppArmor profile can be used to disallow containers from using abstract sockets:

AppArmor profile: “docker-no-abstract-socket”
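The profile itself is embedded as a gist in the original post. A minimal sketch of such a profile (not the author’s exact version; it combines Docker’s default-style permissive rules with a single rule denying abstract addresses) might look like:

```
#include <tunables/global>

profile docker-no-abstract-socket flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/base>

  network,
  capability,
  file,
  umount,

  # Deny every abstract UNIX socket address (the "@" namespace).
  deny unix addr=@**,
}
```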

This profile can be applied to Docker containers as follows:

$ sudo apparmor_parser -r docker-no-abstract-socket
$ docker run --net=host --security-opt apparmor=docker-no-abstract-socket ...

To apply the profile to Kubernetes, see https://kubernetes.io/docs/tutorials/clusters/apparmor/ .

Solution 3: SELinux

Recent versions of RHEL/CentOS and Fedora ship with an SELinux policy that protects abstract sockets on the host:

$ getenforce
Enforcing
$ socat abstract-listen:foo,fork stdio &
$ sudo podman run -it --net=host alpine
/ # cat /proc/self/attr/current
system_u:system_r:container_t:s0:c83,c1019
/ # apk add -q socat
/ # echo test | socat stdio abstract-connect:foo
2020/11/27 15:42:08 socat[7] E connect(5, AF=1 "\0foo", 6): Permission denied

SELinux is enabled by default for Podman and OpenShift. To enable SELinux for Docker, configure /etc/docker/daemon.json as follows:

# cat <<EOF > /etc/docker/daemon.json
{"selinux-enabled": true}
EOF
# systemctl restart docker

Recap

Running containers in the host network namespace is insecure, even if you are using the patched version of containerd, and even if you aren’t using containerd.

Don’t run Docker containers with docker run --net=host . Don’t run Kubernetes Pods with .spec.hostNetwork: true .

If you really need to use the host network, at least consider running containers with docker run --user or .spec.securityContext.runAsUser .

NTT is hiring!

NTT is looking for engineers who work in open source communities such as containerd, Docker, Kubernetes, and related projects. Visit https://www.rd.ntt/e/sic/recruit/ to see how to join us.

We at NTT are looking for people to work with us in open source communities such as containerd, Docker, and Kubernetes. Please see our recruitment page: https://www.rd.ntt/sic/recruit/
