Kata Containers: Virtualization for Cloud-Native
This is the second post in my blog series following the Shanghai Summit:
- The Two Years of Kata Containers
- Kata Containers: Virtualization for Cloud-Native
- The Blueprint of Kata 2.0
In the previous article, the author summarized two years of development in the Kata Containers project. What we have done comes down to:
- Make containers run in a virtualized sandbox transparently.
- Make virtualization more lightweight and container-friendly (kata-agent, vsock, virtio-fs, etc.).
On the other hand, we have seen that users want Kata Containers to isolate sandboxes better, reduce overhead further, and become more container-friendly.
In this post, the author will try to analyze what we need to do in the future based on the current status of the ecosystem.
Sandboxing means more than security
When we created the project in 2017, we described it as:
“The speed of containers, the security of VMs”
Similar technologies have been classified as Secure Containers. However, we believe that isolation means more than security, and our users' practice has borne this out.
Sandboxing improves node efficiency. For example, with a VMM as the sandbox, the processes and threads in a Pod are scheduled by the guest kernel inside the sandbox, so the host kernel doesn't need to schedule all of those tasks itself. In high-density scenarios, this two-level scheduling takes much of the load off the host kernel and helps keep the host reliable.
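To make the two-level scheduling argument concrete, here is a rough back-of-the-envelope sketch. The function name and all the numbers (100 Pods, 50 tasks per Pod, 2 vCPUs per sandbox) are assumptions chosen for illustration, and the model ignores the VMM's own helper threads.

```python
def host_schedulable_entities(pods, tasks_per_pod, vcpus_per_sandbox, sandboxed):
    """Rough count of Pod-related threads the host scheduler has to handle."""
    if sandboxed:
        # The host only sees the vCPU threads of each sandbox;
        # the guest kernel schedules the Pod's tasks onto those vCPUs.
        return pods * vcpus_per_sandbox
    # With plain Linux containers every container task is a host task.
    return pods * tasks_per_pod


# Illustrative numbers only, not measurements.
plain = host_schedulable_entities(100, 50, 2, sandboxed=False)
boxed = host_schedulable_entities(100, 50, 2, sandboxed=True)
print(plain, boxed)  # 5000 200
```

Under these assumed numbers the host scheduler deals with 200 threads instead of 5000, and the remaining scheduling work happens inside each guest.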
On the other hand, sandboxing may help operations as well. With traditional Linux containers, all container processes are host processes, so any operation on the host has to be done carefully. With Kata Containers, the view of host processes is much cleaner. Beyond processes, the attendees at the Shanghai PTG agreed that we should move the image into the sandbox as well and keep all container resources well isolated, which could make operating the host much simpler. Moving image operations into the sandbox may have other “side-effects”:
Better sandboxing may help with billing, accounting, and QoS enforcement. If we move all application-centric operations into the sandbox, then all the related CPU cycles, network traffic, and storage operations can be accounted for, managed, and throttled. The logic becomes clearer, and it becomes harder to DoS the host.
Sandboxing may help protect end-user data privacy. Traditional containers store their rootfs on the host, and every container thread is a host thread as well, so it is hard to prevent the host administrator from accessing application data. This is not acceptable for a cloud provider, which should not touch any user data without prior authorization.
Cloud Native infrastructure clearly needs better isolation methods, and we should help it isolate applications better.
Could we keep the agility of containers in virtualization?
Users love the agility of containers, which can be scheduled and launched instantly and consume resources elastically. By contrast, virtual machines are thought of as fixed-size boxes that launch slowly. The question, then, is to what extent we can keep the agility of containers in virtualization.
As for launch time, we have brought it down to the sub-second level over the past years. By adopting technologies such as DAX, VM templating, and lighter VMMs, we can reduce the sandbox launch time significantly. Moreover, we have VM-cache technology, which keeps warm, paused, empty sandboxes in a cache; launching a Pod from the cache can be even faster.
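The VM-cache idea can be sketched as a small pool of pre-booted sandboxes. This is a minimal illustration, not the Kata Containers implementation: the `Sandbox` and `SandboxCache` classes and their methods are hypothetical, and booting is reduced to creating an object.

```python
from collections import deque


class Sandbox:
    """Hypothetical stand-in for a booted, paused, empty sandbox VM."""

    def __init__(self, sandbox_id):
        self.sandbox_id = sandbox_id
        self.pod = None


class SandboxCache:
    """Keeps warm sandboxes ready so a Pod launch can skip the boot step."""

    def __init__(self, size):
        self._next_id = 0
        self._warm = deque()
        self.cold_boots = 0  # boots that happened on the request path
        self.refill(size)

    def _boot(self):
        # A real implementation would start a VMM and pause the guest here.
        sandbox = Sandbox(self._next_id)
        self._next_id += 1
        return sandbox

    def refill(self, size):
        # Done in the background, off the Pod launch path.
        while len(self._warm) < size:
            self._warm.append(self._boot())

    def acquire(self, pod_name):
        if self._warm:
            sandbox = self._warm.popleft()  # fast path: resume a warm sandbox
        else:
            self.cold_boots += 1  # slow path: boot on demand
            sandbox = self._boot()
        sandbox.pod = pod_name
        return sandbox


cache = SandboxCache(size=2)
for name in ["web", "db", "batch"]:
    cache.acquire(name)
print(cache.cold_boots)  # 1
```

With two warm sandboxes and three Pod launches, only the third launch pays for a cold boot; refilling the cache in the background keeps later launches on the fast path.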
However, the launch time that matters in practice is not only the runtime boot time but the end-to-end time, which also includes scheduling, image pulling, and container storage and network setup. For example, at Ant Financial and Alibaba Cloud, we developed an image provisioning system based on emerging technologies including virtio-fs and the OCI artifacts spec, with which we can reduce almost all image pulling time to about 0.2 seconds no matter how big the image is.
On the other hand, we believe the resource flexibility problem is on its way to being solved. virtio-mem is under development and could be introduced into Kata Containers once it is ready, just as virtio-fs was. With virtio-mem, we may securely add and remove memory at per-page granularity without having to care about DIMMs or ACPI.
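Why granularity matters for resizing can be shown with a little arithmetic. The sketch below is an assumption-laden illustration: `resize` is a hypothetical helper, and the granule sizes (4 MiB for fine-grained resizing, 1024 MiB for a DIMM-like unit) are placeholders, not values taken from virtio-mem or any VMM.

```python
def resize(current_mib, target_mib, granule_mib):
    """Memory size actually reached when memory moves only in whole granules."""
    delta = target_mib - current_mib
    if delta >= 0:
        # Growing: round up so the target is fully covered.
        granules = -(-delta // granule_mib)
    else:
        # Shrinking: give back only whole granules that fit in the surplus.
        granules = -((-delta) // granule_mib)
    return current_mib + granules * granule_mib


# Grow a 1024 MiB sandbox to 1100 MiB.
print(resize(1024, 1100, granule_mib=4))     # 1100
print(resize(1024, 1100, granule_mib=1024))  # 2048

# Shrink a 2048 MiB sandbox to 1500 MiB.
print(resize(2048, 1500, granule_mib=4))     # 1500
print(resize(2048, 1500, granule_mib=1024))  # 2048
```

With the coarse unit, the sandbox either overshoots by almost a gigabyte or cannot shrink at all; with the fine unit it lands on the requested size, which is the kind of elasticity containers are expected to have.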
What we want to say here is that virtualization is not inherently slow and inflexible. On the contrary, we can make it as agile as containerization without hurting isolation.
Cut even more overhead of virtualization
In the past year, the Firecracker VMM was introduced into Kata Containers, which reduced the memory overhead of the VMM to the 10 MB level. The rust-agent was merged in October, which reduced the overhead of the agent from the 10 MB level to the 1 MB level. Memory overhead still exists, and I don't think it can go down to zero. However, ongoing development may cut the overhead further.
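To see what these per-sandbox figures mean at the node level, here is a rough calculation. It uses the order-of-magnitude numbers above (about 10 MB for the VMM, about 10 MB versus about 1 MB for the agent); the density of 100 Pods per node and the function name are assumptions for illustration, and guest kernel memory is left out.

```python
def node_overhead_mb(pods, vmm_mb, agent_mb):
    """Total VMM + agent memory overhead on one node, in MB."""
    return pods * (vmm_mb + agent_mb)


# 100 Pods per node is an assumed density, not a measured one.
before = node_overhead_mb(pods=100, vmm_mb=10, agent_mb=10)
after = node_overhead_mb(pods=100, vmm_mb=10, agent_mb=1)
print(before, after, before - after)  # 2000 1100 900
```

Under these assumptions, switching agents alone frees roughly 900 MB per node, which is why even megabyte-level savings per sandbox are worth pursuing in high-density deployments.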
At the Shanghai PTG, developers discussed this topic:
- If we change the RPC scheme of the agent protocol from gRPC to ttRPC, we could save a lot of memory. This is already being tested: Ant Financial developers have written a Rust implementation of ttRPC for the tests.
- An Intel developer pointed out that, since the VM itself is a sandbox, we do not need to set up a sandbox container inside the VM, i.e. we may remove sandbox operations from the agent protocol altogether.
Another related topic was offloading part of the guest containers' functionality (such as infrastructure sidecars) to the VMM or to user-space daemons on the host, which would avoid any performance penalty that virtualization introduces on the data plane.
The future of Kata: the virtualization for Cloud-Native
In short, we have done a lot to support containers with virtualization, and we can do even more. At the same time, isolation by virtualization can help Cloud-Native infrastructure run better. Our mission is to develop virtualization technologies for Cloud Native, which differ greatly from those built for traditional virtual machines:
- Share resources across sandboxes but keep the boundary of sandboxes clear.
- Provision resources to sandboxes on demand and promptly, instead of through fixed, partition-like provisioning.
- The host user-space tool, VMM, and the guest kernel jointly serve the applications in sandboxes.
In the next article, we will talk about what we should do first in the Kata 2.0 cycle.