Published in nttlabs

ubuntu:21.10 and fedora:35 do not work on the latest Docker (20.10.9)

If you try to run ubuntu:21.10 on the latest Docker (20.10.9), commands inside the container fail in confusing ways.

And you can’t run fedora:35 either.

Old versions of containerd/CRI (before v1.5.6/v1.4.10), Podman, and CRI-O were affected by the same issue, too.

Why?

This is because the default seccomp profile of Docker 20.10.9 is not adjusted to support the clone() syscall wrapper of glibc 2.34 adopted in Ubuntu 21.10 and Fedora 35.

The new clone() syscall wrapper introduced in glibc 2.34 tries the clone3() syscall before calling the real clone() syscall. If clone3() fails with ENOSYS (“Function not implemented”), glibc falls back to the legacy behavior of calling the real clone() syscall. However, if clone3() fails with any other error, glibc fails immediately without calling the real clone() syscall.

The problem is that Docker 20.10.9 is not aware of clone3(), and Docker applies the SCMP_ACT_ERRNO(EPERM) rule to every syscall it does not know. So when glibc attempts to call clone3(), the call fails with EPERM (“Operation not permitted”) according to Docker’s seccomp profile. Because EPERM is not ENOSYS, glibc never falls back to clone() and fails.

Figure: The internals of clone() in glibc (LGPL-2.1)
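
The failure mode described above can be modelled in a few lines of Python. This is only a simplified sketch of the logic, not the actual glibc or Docker code: KNOWN_SYSCALLS, seccomp_filter, and glibc_clone_wrapper are hypothetical names, and the tiny allow-list stands in for a real seccomp profile.

```python
import errno

# Hypothetical allow-list: the syscalls this imaginary profile knows about.
# clone3 is deliberately missing, as in the Docker 20.10.9 default profile.
KNOWN_SYSCALLS = {"clone", "read", "write"}


def seccomp_filter(syscall, default_errno):
    """Return 0 if the syscall is allowed, else the errno of the default rule."""
    return 0 if syscall in KNOWN_SYSCALLS else default_errno


def glibc_clone_wrapper(default_errno):
    """Model the glibc 2.34 wrapper: try clone3(), fall back only on ENOSYS."""
    err = seccomp_filter("clone3", default_errno)
    if err == 0:
        return "clone3 succeeded"
    if err != errno.ENOSYS:
        # Any error other than ENOSYS is treated as final: no fallback.
        return f"failed: {errno.errorcode[err]}"
    err = seccomp_filter("clone", default_errno)
    if err == 0:
        return "clone succeeded (fallback)"
    return f"failed: {errno.errorcode[err]}"


print(glibc_clone_wrapper(errno.EPERM))   # failed: EPERM
print(glibc_clone_wrapper(errno.ENOSYS))  # clone succeeded (fallback)
```

The first call mirrors Docker 20.10.9, where the unknown clone3() is rejected with EPERM. The second shows that the same wrapper works when the unknown syscall is reported as ENOSYS instead.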

The right solution

The right solution is to upgrade Docker to 20.10.10 or later.

Docker 20.10.10 is NOT released yet as of the time of writing (October 18, 2021), but it will probably be released within a couple of days.

The fix has already been committed to the master branch and the 20.10 branch of the upstream github.com/moby/moby repo, so you can compile it yourself if you can’t wait for the 20.10.10 release.

Also, some distribution vendors have already cherry-picked the fix into their packages ahead of the 20.10.10 release. For example, the Ubuntu package docker.io/20.10.7 has already been patched to fix the issue.

Update (Oct 26, 2021)
Docker 20.10.10 is now available ( https://get.docker.com/ )
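
If you want to check a version string against the fixed release in a script, note that plain string comparison gets it wrong ("20.10.9" sorts after "20.10.10"). The following is a minimal sketch that compares numeric components instead; parse_version and has_clone3_fix are hypothetical helper names, and the sketch assumes the version string starts with three dot-separated numbers.

```python
import re

# First upstream Docker release that contains the clone3 fix.
FIXED_VERSION = (20, 10, 10)


def parse_version(text):
    """Extract the leading MAJOR.MINOR.PATCH numbers as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def has_clone3_fix(version_text):
    """True if the upstream version is 20.10.10 or later."""
    return parse_version(version_text) >= FIXED_VERSION


print(has_clone3_fix("20.10.9"))   # False
print(has_clone3_fix("20.10.10"))  # True
```

The input can be taken from `docker version --format '{{.Server.Version}}'`. A False result is not conclusive for distribution packages, since some vendors backported the fix to older version numbers.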

Workaround 1: `--security-opt seccomp=unconfined`

A workaround that does not require updating Docker is to disable seccomp by passing --security-opt seccomp=unconfined to docker run.

However, this workaround has several drawbacks:

  • Insecure
  • Does not work when you are not allowed to modify the --security-opt flags
  • Does not work for docker build

Workaround 2: `SHELL ["/clone3-workaround", …]`

I wrote https://github.com/AkihiroSuda/clone3-workaround to provide a workaround that is free from the drawbacks of Workaround 1.

This program loads an additional seccomp profile that hides the existence of the clone3() syscall from glibc by injecting an SCMP_ACT_ERRNO(ENOSYS) rule, so that the clone() wrapper of glibc works in the legacy-compatible mode.
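
To make the rule concrete, the following sketch writes the same idea as an entry in Docker's JSON seccomp profile format. It is an illustration only and not how clone3-workaround is implemented: the tool loads its filter from inside the container rather than through a profile file. The helper name clone3_enosys_rule is hypothetical, and the sketch assumes a Docker version that honors the errnoRet field.

```python
import json

# ENOSYS is 38 on x86_64 and arm64 Linux; other architectures may differ.
ENOSYS_LINUX = 38


def clone3_enosys_rule():
    """Build a profile entry that makes clone3() fail with ENOSYS."""
    return {
        "names": ["clone3"],
        "action": "SCMP_ACT_ERRNO",
        "errnoRet": ENOSYS_LINUX,
    }


print(json.dumps(clone3_enosys_rule(), indent=2))
```

On its own this object is not a valid profile. It would need to go into the syscalls array of a complete profile passed with --security-opt seccomp=<file>.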

Usage is easy: download (or compile) the binary, mount it into the container, and run /clone3-workaround COMMAND [ARGUMENTS...].

To use it with docker build, set SHELL ["/clone3-workaround","/bin/sh","-c"] in your Dockerfile, and run docker build as usual.

How can we prevent this from happening again?

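If we could change the default rule of the seccomp profile from SCMP_ACT_ERRNO(EPERM) to SCMP_ACT_ERRNO(ENOSYS), we could avoid this kind of issue: a new syscall unknown to Docker would look unimplemented to glibc, which would then fall back to the older syscall.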

Several folks including Aleksa Sarai of SUSE have been proposing this change to the Docker/Moby community, but it may take some time to land: https://github.com/moby/moby/issues/42871

NTT is hiring!

We at NTT are looking for engineers who work in open source communities such as Docker/Moby, containerd, Kubernetes, and related projects. Visit https://www.rd.ntt/e/sic/recruit/ to see how to join us.

The recruitment page is also available in Japanese: https://www.rd.ntt/sic/recruit/
