Why Kata Containers doesn’t replace Kubernetes:
A Kata Containers explainer
The Kata Containers project, launched in December 2017, builds lightweight virtual machines that plug seamlessly into the container ecosystem. Kata Containers combines technology from Intel® Clear Containers and Hyper runV to provide the speed of containers with the security of virtual machines (VMs).
Kata Containers addresses the security drawbacks of traditional containers
Kata Containers bridges the gap between the security of traditional VMs and the lightweight benefits of traditional Linux containers. Traditional containers share the same underlying Linux kernel. Known exploits allow a malicious user to “escape” a container and gain access to the kernel and, through it, to the other containers that share it. In a multi-tenant environment where workloads run with unknown levels of trust, significant effort is required to keep the system secure.
Traditional containers use Linux control groups (cgroups) to manage and allocate resources, and namespaces to provide container isolation. Further security isolation comes from dropping Linux capabilities, using read-only mount points, applying mandatory access control (MAC) measures such as those in SELinux and AppArmor, and filtering syscalls with seccomp. It is difficult, if not impossible, to apply these security policies effectively to complex applications.
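To give a sense of how many separate knobs this involves, here is a minimal Python sketch that assembles a hardened docker run command line. The flags (--cap-drop, --cap-add, --read-only, --security-opt) are standard Docker CLI options; the helper name hardened_docker_run, the image and the profile names are hypothetical and chosen only for illustration.

```python
from typing import List, Optional


def hardened_docker_run(
    image: str,
    keep_caps: Optional[List[str]] = None,
    seccomp_profile: Optional[str] = None,
    apparmor_profile: Optional[str] = None,
) -> List[str]:
    """Build a `docker run` argument list that layers common hardening options.

    Illustrative helper only: it builds the command, it does not run it.
    """
    args = ["docker", "run", "--rm"]
    # Drop every Linux capability, then add back only what the workload needs.
    args.append("--cap-drop=ALL")
    for cap in keep_caps or []:
        args.append(f"--cap-add={cap}")
    # Mount the container's root filesystem read-only.
    args.append("--read-only")
    # Restrict the allowed syscalls with a seccomp profile (a JSON file on the host).
    if seccomp_profile:
        args += ["--security-opt", f"seccomp={seccomp_profile}"]
    # Apply a mandatory access control profile (AppArmor in this sketch).
    if apparmor_profile:
        args += ["--security-opt", f"apparmor={apparmor_profile}"]
    args.append(image)
    return args
```

Each option has to be tuned to the application: which capabilities it really needs, which paths it writes to, which syscalls it makes. That tuning is the part that becomes hard for complex workloads.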
As a result, containers most often end up being provisioned inside their own full VMs, negating the performance promises of containers. Protecting against security breaches in these environments is one of the drivers behind the Kata Containers project.
Kata Containers provides container isolation by using hardware virtualization. In the case of Docker, kata-runtime provides VM isolation at the container level. In the case of Kubernetes, VM isolation is provided at the pod level. Through the rest of this post, when we say container/pod, we mean container in the case of Docker and pod in the case of Kubernetes.
For Kata Containers, each container/pod is booted as a lightweight VM with its own unique kernel instance. Because each container/pod now runs in its own VM, it no longer has access to the host kernel and gets the full security benefits of a VM. This simplifies the security policies you need to put in place to protect the host kernel against container exploits.
Kata Containers also makes it possible for container-as-a-service (CaaS) providers to offer containers running on bare metal. Kata Containers allows mutually untrusting tenants to use the same cluster due to the hardware isolation between containers. This assumes there are also network security policies in place to provide network isolation between tenants within the cluster.
How Kata Containers fits into the container ecosystem
A container runtime is the component that handles the lifecycle of a container, implementing basic concepts such as creating, starting, stopping and removing a container workload. The Open Container Initiative (OCI) created a runtime specification that details the API for an OCI-compatible runtime.
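The lifecycle can be pictured as a small state machine. The following Python sketch is a toy model, not a real runtime: the status names (created, running, stopped) and operation names (state, create, start, kill, delete) follow the OCI runtime specification, while the ToyRuntime class and its error type are hypothetical. It also simplifies kill, which in a real runtime sends a signal and only leads to the stopped state once the process exits.

```python
from typing import Dict


class LifecycleError(Exception):
    """Raised when an operation is not valid for the container's current status."""


class ToyRuntime:
    """Toy, in-memory model of the container lifecycle (no real containers)."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}

    def state(self, cid: str) -> str:
        if cid not in self._status:
            raise LifecycleError(f"no such container: {cid}")
        return self._status[cid]

    def _expect(self, cid: str, *allowed: str) -> None:
        current = self.state(cid)
        if current not in allowed:
            raise LifecycleError(f"{cid} is {current}, expected one of {allowed}")

    def create(self, cid: str) -> None:
        # Container IDs must be unique within a runtime.
        if cid in self._status:
            raise LifecycleError(f"container already exists: {cid}")
        self._status[cid] = "created"

    def start(self, cid: str) -> None:
        # Only a created container can be started.
        self._expect(cid, "created")
        self._status[cid] = "running"

    def kill(self, cid: str) -> None:
        # Simplification: the workload is assumed to exit immediately.
        self._expect(cid, "created", "running")
        self._status[cid] = "stopped"

    def delete(self, cid: str) -> None:
        # Only a stopped container can be removed.
        self._expect(cid, "stopped")
        del self._status[cid]
```

Any OCI-compatible runtime, whether it isolates with namespaces or with a VM, exposes this same set of operations to the layer above it.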
runC is the canonical OCI runtime solution, which is described as a “CLI tool for spawning and running containers according to the OCI specification.” runC uses Linux cgroups and namespaces to provide isolation.
Kata Containers is a member of OCI, and the Kata Containers runtime, kata-runtime, will be OCI-compatible.
Another place the term runtime is used is in the Container Runtime Interface (CRI) provided in Kubernetes. CRI runtimes are at a higher level of abstraction and should not be confused with an OCI-compatible runtime.
Interacting with Docker Engine
For Docker, kata-runtime is just another OCI-compatible runtime option that can be used.
In a default configuration, if you install and run Docker, the following happens:
- The Docker engine creates a container configuration.
- The Docker engine passes this configuration to runC.
- runC creates a container based on the configuration and workload provided by the Docker engine.
If you install the Kata Containers runtime, kata-runtime, you can configure Docker to be aware of both container runtimes, letting users choose which one to use on a per-container basis. kata-runtime complements runC and enhances the solution provided by Docker. See Docker’s runtime documentation for more details. When using kata-runtime, each Docker container will run within its own lightweight VM.
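As a rough sketch of what that configuration looks like, the Python snippet below generates a Docker daemon configuration that registers an extra runtime, and builds a docker run command that selects it. The runtimes and default-runtime keys and the --runtime flag are part of Docker; the install path /usr/bin/kata-runtime and the helper names are assumptions and will depend on how kata-runtime is packaged on your system.

```python
import json
from typing import Dict, List, Optional


def docker_daemon_config(
    runtimes: Dict[str, str], default_runtime: Optional[str] = None
) -> str:
    """Return daemon.json content registering additional OCI runtimes.

    `runtimes` maps a runtime name to the path of its binary. runC is
    built in, so it does not need to be listed.
    """
    config: Dict[str, object] = {
        "runtimes": {name: {"path": path} for name, path in runtimes.items()}
    }
    if default_runtime is not None:
        config["default-runtime"] = default_runtime
    return json.dumps(config, indent=2, sort_keys=True)


def run_with_runtime(image: str, runtime: str) -> List[str]:
    """Build a `docker run` command that picks a runtime for one container."""
    return ["docker", "run", "--runtime", runtime, image]


if __name__ == "__main__":
    # Assumed install path; adjust to your distribution.
    print(docker_daemon_config({"kata-runtime": "/usr/bin/kata-runtime"}))
    print(" ".join(run_with_runtime("busybox", "kata-runtime")))
```

Leaving default-runtime unset keeps runC as the default, so only containers started with an explicit --runtime option run inside a VM.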
Kata Containers and Kubernetes
Kubernetes 1.5 introduced the CRI (Container Runtime Interface), which enables a variety of container runtimes to be plugged in easily. Prior to this, Kubernetes only made use of the default Docker image repository and its default OCI-compatible runtime, runC. Since “runtime” continues to be an overloaded term, in this discussion we’ll call the CRI runtime a CRI shim and use “runtime” to describe an OCI-compatible runtime.
Since the introduction of CRI, a number of CRI shims have been introduced, including cri-containerd, CRI-O, dockershim, and frakti. Some of these call into an OCI-based runtime, while others are monolithic solutions. A high-level overview of how these implement a solution via CRI is shown below. Of note, dockershim currently only supports runC, not kata-runtime.
Kata Containers provides two interfaces for CRI shims to manage hardware-virtualization-based Kubernetes pods:
- An OCI-compatible runtime, kata-runtime. This is currently usable with the CRI solutions cri-containerd and CRI-O.
- A hardware virtualization runtime library API for CRI shims to consume and provide a more CRI-native implementation. Frakti is an example CRI shim here.
While the work of defining the concept of a secure sandbox continues at the Kubernetes level, some of the CRI implementations already support the concept of multiple runtimes running on a single node. For example, CRI-O supports the concept of a trusted and an untrusted sandbox. Based on pod annotations and default CRI-O configuration, you can run a mix of VM and namespace-based pods. This article goes into depth on how this is achieved today with CRI-O.
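The selection logic can be sketched roughly as follows. This is a simplified Python model, not CRI-O's actual code: the annotation key io.kubernetes.cri-o.TrustedSandbox and the rule that privileged pods always use the trusted runtime are stated here as assumptions about CRI-O's behavior at the time of writing, and the function name and default values are hypothetical.

```python
from typing import Dict

# Assumed annotation key; check the CRI-O documentation for your version.
TRUSTED_SANDBOX_ANNOTATION = "io.kubernetes.cri-o.TrustedSandbox"


def pick_runtime(
    annotations: Dict[str, str],
    privileged: bool = False,
    default_trust: str = "trusted",
    trusted_runtime: str = "runc",
    untrusted_runtime: str = "kata-runtime",
) -> str:
    """Choose a runtime for a pod from its annotations and the node defaults."""
    # Assumption: privileged pods need host access, so they stay on the trusted runtime.
    if privileged:
        return trusted_runtime
    # An explicit annotation overrides the node-wide default trust level.
    value = annotations.get(TRUSTED_SANDBOX_ANNOTATION)
    if value is not None:
        trusted = value.strip().lower() == "true"
    else:
        trusted = default_trust == "trusted"
    return trusted_runtime if trusted else untrusted_runtime
```

With these defaults, a pod annotated with the key set to "false" would be started by kata-runtime in its own VM, while unannotated pods would keep using runC.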
VM isolation is provided at the pod level for kata-runtime. Containers running inside a Kata Containers pod are isolated and managed via namespaces and cgroups, similar to what is done by runC.
You can try Kata Containers
Kata Containers 1.0 has not yet been released — contributors are busily working to complete the kata-runtime feature — but you can try a preview of Kata Containers by using the runV or Clear Containers runtimes. Check out this Developer Guide to get started.
Kata Containers is a fully open source project — check out Kata Containers on GitHub and join the channels below to find out how you can contribute.
IRC: #kata-dev on Freenode
Mailing list: http://lists.katacontainers.io/cgi-bin/mailman/listinfo