New features in containerd 1.4

Akihiro Suda
nttlabs
Aug 17, 2020

containerd 1.4 was released on August 17, 2020, with many new features, including support for “lazy-pulling”, SELinux MCS, cgroup v2, and Windows CRI.

Lazy-pulling

Lazy-pulling means starting a container without waiting for the image pull to complete. This wasn’t possible before, because the OCI-standard tar.gz format doesn’t support seeking to an arbitrary offset in the gzipped archive.

containerd 1.4 supports lazy-pulling by using Stargz: a seekable tar.gz format proposed by Google. A Stargz archive contains an offset table (stargz.index.json) at the tail of the archive, which allows accessing file entries without scanning the whole archive. The data structure of Stargz is somewhat similar to the ZIP format; unlike ZIP, however, Stargz is fully compatible with the legacy tar.gz format.
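
The idea of a seekable gzip archive can be illustrated with a toy sketch. The Python below is not the Stargz format itself: it only relies on the fact that concatenated gzip members still form a valid gzip stream, and it keeps the offset table in a plain dict rather than in a stargz.index.json entry. The function names and the sample files are made up for illustration.

```python
import gzip


def build_archive(files):
    """Compress each file as its own gzip member and record where it lands."""
    blob = bytearray()
    index = {}
    for name, data in files.items():
        member = gzip.compress(data)
        index[name] = (len(blob), len(member))  # (offset, compressed size)
        blob += member
    return bytes(blob), index


def read_entry(blob, index, name):
    """Decompress a single entry without touching the rest of the archive."""
    offset, size = index[name]
    return gzip.decompress(blob[offset:offset + size])


# Hypothetical sample content.
files = {"etc/hostname": b"demo\n", "usr/bin/app": b"\x00" * 4096}
blob, index = build_archive(files)

# Random access: only the bytes of one member are decompressed.
hostname = read_entry(blob, index, "etc/hostname")

# A legacy reader can still decompress the whole stream from the start.
everything = gzip.decompress(blob)
```

With a remote registry, the slice `blob[offset:offset + size]` would correspond to a ranged read of the layer blob, which is what makes starting a container before the full download feasible.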

legacy tar.gz vs Stargz

The containerd implementation of Stargz also supports our extended version of Stargz called eStargz. eStargz reorders and squashes relevant files in an archive so that they can be prefetched in a single HTTP request.
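
As a rough sketch of why reordering helps, the Python below moves a set of prioritized files to the front of an archive listing and computes the single byte range that covers them. It is an illustration under simplifying assumptions, not the eStargz algorithm: `plan_prefetch`, the file names, and the sizes are hypothetical, archive headers are ignored, and how the relevant files are chosen is out of scope.

```python
def plan_prefetch(entries, prioritized):
    """Move prioritized files to the front and return the range covering them.

    entries: list of (name, compressed_size) in original archive order.
    prioritized: names of files expected to be needed at startup, in order.
    """
    sizes = dict(entries)
    front = [(name, sizes[name]) for name in prioritized if name in sizes]
    chosen = {name for name, _ in front}
    back = [(name, size) for name, size in entries if name not in chosen]
    prefetch_bytes = sum(size for _, size in front)
    # HTTP byte ranges are inclusive, so the last byte is length - 1.
    byte_range = f"bytes=0-{prefetch_bytes - 1}" if prefetch_bytes else None
    return front + back, byte_range


# Hypothetical archive listing and startup file set.
entries = [
    ("usr/share/doc/README", 900),
    ("bin/sh", 300),
    ("etc/app.conf", 100),
    ("usr/bin/app", 600),
]
ordered, byte_range = plan_prefetch(entries, ["usr/bin/app", "etc/app.conf"])
```

Because the prioritized files become contiguous at the start of the archive, one ranged request can fetch all of them instead of one request per file.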

Stargz vs eStargz

The benchmark result on an Azure VM instance (eastus2) with Docker Hub shows that Stargz/eStargz can shorten container startup latency from several tens of seconds to a few seconds in the best cases.

Benchmark result. Lower is better.

To see how to configure containerd to enable lazy-pulling with Stargz/eStargz, visit https://github.com/containerd/stargz-snapshotter .

See also my colleague Kohei Tokunaga’s article “Startup Containers in Lightning Speed with Lazy Image Distribution on Containerd”.

SELinux MCS on CRI

The CRI plugin now supports automatically setting up SELinux MCS (Multi-Category Security) to strictly isolate containers. MCS itself has existed for a long time and has been supported by Docker, but it wasn’t supported in previous versions of the CRI plugin.

MCS prohibits interference between processes and files with different SELinux categories, which appear in /proc/$PID/attr/current and ls -Z:

[root@container]# whoami
root
[root@container]# cat /proc/self/attr/current
system_u:system_r:container_t:s0:c496,c687
  • Linux user: root
  • SELinux user: system_u
  • SELinux role: system_r
  • SELinux domain (aka type): container_t
  • SELinux sensitivity (aka level): s0
  • SELinux categories: c496,c687
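
The field breakdown above can be expressed as a small parser. This is a minimal sketch that handles only the simple `user:role:type:sensitivity[:categories]` form shown in the transcript; labels with level ranges such as `s0-s0:c0.c1023` are rejected. The `SELinuxContext` type and `parse_context` function are hypothetical names, not part of any SELinux library.

```python
from typing import NamedTuple, Tuple


class SELinuxContext(NamedTuple):
    user: str
    role: str
    type: str
    sensitivity: str
    categories: Tuple[str, ...]


def parse_context(label):
    """Parse 'user:role:type:sensitivity[:cat,cat,...]' into its fields."""
    parts = label.strip().split(":")
    if len(parts) not in (4, 5):
        raise ValueError(f"unsupported SELinux label: {label!r}")
    user, role, type_, sensitivity = parts[:4]
    # The category list is optional; a label may carry only a sensitivity.
    categories = tuple(parts[4].split(",")) if len(parts) == 5 else ()
    return SELinuxContext(user, role, type_, sensitivity, categories)


ctx = parse_context("system_u:system_r:container_t:s0:c496,c687")
```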

A process running with categories c496,c687 can access files whose categories are drawn from c496 and c687:

[root@container]# ls -Z foo
system_u:object_r:container_file_t:s0:c496,c687 foo
[root@container]# cat foo
hello

However, it cannot access files with different categories:

[root@container]# ls -Z foo
system_u:object_r:container_file_t:s0:c136,c954 foo
[root@container]# cat foo
cat: foo: Permission denied
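
The two transcripts above can be modeled as a set comparison. The sketch below assumes the usual MCS rule that a process may access a file only when the process's categories cover the file's categories; it ignores type enforcement, sensitivity levels, and everything else in the real policy. `categories` and `mcs_allows` are hypothetical helper names.

```python
def categories(label):
    """Return the category set of a simple 'user:role:type:s0[:c1,c2]' label."""
    parts = label.split(":")
    return frozenset(parts[4].split(",")) if len(parts) > 4 else frozenset()


def mcs_allows(process_label, file_label):
    """Simplified MCS check: the process categories must cover the file's."""
    return categories(file_label) <= categories(process_label)


# Labels copied from the transcripts above.
process = "system_u:system_r:container_t:s0:c496,c687"
own_file = "system_u:object_r:container_file_t:s0:c496,c687"
other_file = "system_u:object_r:container_file_t:s0:c136,c954"

allowed = mcs_allows(process, own_file)     # same categories
denied = mcs_allows(process, other_file)    # disjoint categories
```

Giving each container its own random category pair is what keeps one container's processes away from another container's files, even though both run as container_t.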

cgroup v2

Previously, containerd didn’t work on recent versions of Fedora without adding systemd.unified_cgroup_hierarchy=0 to the kernel cmdline, because containerd didn’t support cgroup v2, which is the default cgroup version on Fedora (since Fedora 31).

Now containerd supports cgroup v2, and works on Fedora without tweaking the kernel cmdline.
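
To check which cgroup version a host is using, one common heuristic is to look for the cgroup.controllers file at the root of the cgroup mount, which exists only on the unified (v2) hierarchy. The helper below is a minimal sketch of that check; the function name is hypothetical and the default mount point /sys/fs/cgroup is an assumption that holds on typical systemd-based distributions.

```python
import os


def is_cgroup_v2(root="/sys/fs/cgroup"):
    """Return True if `root` looks like a cgroup v2 (unified) hierarchy."""
    # cgroup v2 exposes the available controllers in this file at the root.
    return os.path.isfile(os.path.join(root, "cgroup.controllers"))
```

Calling `is_cgroup_v2()` on a Fedora 31 or later host with the default kernel cmdline should return True under these assumptions.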

The most significant advantage of cgroup v2 compared to v1 is that it can be safely configured without root privileges. This allows Rootless Docker users to set up container resource limits:

$ export DOCKER_HOST=unix:///run/user/1000/docker.sock
$ docker info
...
Cgroup Driver: systemd
Cgroup Version: 2
...
Security Options:
...
rootless
...
$ docker run -it --cpus 0.5 --memory 512m --pids-limit 100 alpine
/ # cat /sys/fs/cgroup/cpu.max
50000 100000
/ # cat /sys/fs/cgroup/memory.max
536870912
/ # cat /sys/fs/cgroup/pids.max
100
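
The values in the transcript follow directly from the flags. The sketch below reproduces the arithmetic only: `cpu_max` and `memory_max` are hypothetical helpers, not Docker's implementation, and the 100000 microsecond period is taken from the cpu.max output above.

```python
CPU_PERIOD_US = 100_000  # period seen in the cpu.max output above

_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def cpu_max(cpus, period_us=CPU_PERIOD_US):
    """Format a fractional CPU count as a cgroup v2 cpu.max value."""
    quota_us = int(round(cpus * period_us))
    return f"{quota_us} {period_us}"


def memory_max(limit):
    """Convert a size such as '512m' into bytes for memory.max."""
    text = limit.strip().lower()
    if text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # no suffix: already a byte count
```

Here `cpu_max(0.5)` gives the quota and period pair for half a CPU, and `memory_max("512m")` gives 512 × 1024 × 1024 bytes, matching the cpu.max and memory.max files in the container.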

Note that Docker with the cgroup v2 support (Docker 20.XX) hasn’t been officially released yet as of August 2020, but binary snapshots are available on my GitHub repo: https://github.com/AkihiroSuda/moby-snapshot

Docker 20.XX is likely to be officially released in the next couple of months.

Other changes in containerd 1.4

The CRI plugin now supports Windows containers. See also the Kubernetes Enhancement Proposal (KEP) for the current implementation status.

containerd 1.4 also adds the support for the following features:

containerd is gaining adoption in the market

containerd is gaining adoption in the Cloud Native market day by day.

In addition to Google Kubernetes Engine (GKE) and IBM Cloud Kubernetes Service (IKS), Azure Kubernetes Service (AKS) began supporting containerd nodes in July 2020. containerd has also been used by the Fargate integration of Amazon EKS since December 2019.

Aside from Kubernetes services, containerd is now even included in VMware Fusion to run Linux containers on macOS (Project Nautilus). See VMware’s blog for further information.

Plan for containerd 1.5

The containerd community is already discussing the plan for the next version.

  • NRI (Node Resource Interface): The new common interface for controlling node resources such as cgroups. Heavily inspired by CNI.
  • Sandbox API: CRI sandbox containers will be supported as first-class objects. The /pause processes will no longer be needed.
  • Snapshot quota: Setting quotas for filesystem snapshots will finally be supported.

containerd 1.5 is likely to be released around Q4 2020 or Q1 2021.

Don’t miss the containerd sessions at KubeCon EU

KubeCon EU 2020 (Aug 17–20) will have the following sessions presented by containerd maintainers and contributors:

Tue, Aug 18, 13:00–13:35 (CEST)

Startup Containers in Lightning Speed with Lazy Image Distribution
Kohei Tokunaga, NTT

PDT: 04:00 / UTC: 11:00 / JST: 20:00

Tue, Aug 18, 17:45–18:20 (CEST)

Introduction to containerd
Phil Estes, IBM & Derek McGowan, Docker

PDT: 08:45 / UTC: 15:45 / JST: Wed 00:45

Wed, Aug 19, 13:00–13:35 (CEST)

containerd Deep Dive
Akihiro Suda (me), NTT & Wei Fu, Alibaba Cloud

PDT: 04:00 / UTC: 11:00 / JST: 20:00

NTT is hiring!

For containerd 1.4, NTT made significant contributions including the support for lazy-pulling and cgroup v2.

NTT is looking for engineers who work in Open Source communities like containerd, Kubernetes, and their relevant projects. Visit our recruitment page to see how to join us: [English][Japanese (日本語)].
