Securing your CaaS using Google’s gVisor
For the uninitiated
Linux container technology has existed for over 10 years in the form of LXC, but Docker has been responsible for popularising it with its easy interface for building and managing all aspects of a containerised application's life-cycle. Similar technologies have existed in other operating systems in the form of FreeBSD jails, AIX Workload Partitions and Solaris Containers.
In 2015 CoreOS introduced rkt, a new container runtime with its own App Container image format, to rival Docker's container specification. It looked quite likely that the container movement would start to fragment, and to curb this from happening the Open Container Initiative (OCI), formerly the Open Container Project, was established. OCI is now run by the Linux Foundation and aims to standardise container formats and runtimes. In this article, I'll be talking about Docker and securing an AWS ECS cluster, but the same should be applicable to any container system that relies on containerd.
One of the biggest challenges that modern software development faces is “it works on my machine ¯\_(ツ)_/¯”, i.e. achieving reliable parity between dev, test and production environments. Modern DevOps tools and practices solve this problem to a great extent, but not all organisations have the same DevOps maturity, and building immutable infrastructure for a polyglot runtime environment is time-consuming and expensive to maintain.
Docker containers solve this problem by bundling the entire runtime environment (the application, language runtime, dependencies, other binaries and configuration files) into a single artefact called a Docker image. The only thing that is not bundled into a container is the kernel: the host kernel is shared among all the containers.
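As a sketch, a minimal Dockerfile for a hypothetical Node.js service shows each of those layers being baked into one image (the image name, file names and paths here are illustrative, not from the original article):

```dockerfile
FROM node:18-slim                      # base image providing the language runtime
WORKDIR /app
COPY package*.json ./                  # dependency manifest
RUN npm install --production           # dependencies and other binaries
COPY config/ ./config/                 # configuration files
COPY . .                               # the application itself
CMD ["node", "server.js"]
```

Everything an environment needs is layered into the image at build time; only the kernel is left out, which is exactly why kernel-level isolation becomes the remaining concern.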
It is quite common to see multiple containers running on the same host. A node managed by the Kubernetes container orchestrator can run around 100 pods per host (the default Kubernetes limit is 110). This is made possible using Linux cgroups to segment and allocate memory and CPU resources to each container.
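As a rough sketch of how those limits are expressed, the docker CLI exposes cgroup controls directly (the image name below is hypothetical); the flags translate into cgroup memory and CPU-quota values along these lines:

```shell
# Illustrative: cap a container at 512 MiB of RAM and 1.5 CPUs.
#   docker run --memory=512m --cpus=1.5 my-app:latest
# Docker writes the equivalent cgroup values for the container; conceptually:
MEMORY_LIMIT_BYTES=$((512 * 1024 * 1024))  # memory limit, in bytes
CPU_QUOTA_US=150000                        # CPU quota per 100000us period = 1.5 CPUs
echo "memory=${MEMORY_LIMIT_BYTES} cpu_quota=${CPU_QUOTA_US}"
```

The kernel then enforces these budgets per container, which is what lets many containers share one host safely from a resource standpoint.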
What containers are not
Containers provide isolated runtime environments for your application, but isolation doesn't necessarily mean secure or sandboxed. If the necessary security controls are not implemented, a container could be used in a privilege-escalation attack to compromise the host machine and all the other applications running on it.
One such vulnerability discovered in the recent past was Dirty COW (CVE-2016-5195). In short, Dirty COW is a local privilege-escalation attack that allows a process inside a guest container to break out to the host machine and modify files on disk.
These vulnerabilities are generally exploitable because the guest container has access to the host kernel's syscalls. seccomp and AppArmor are generally used to mitigate these threats, but they require a certain skill set and expertise to configure and manage.
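To make that concrete, Docker lets you pass a custom seccomp profile that whitelists syscalls. A minimal sketch of the profile format follows; the allowed-syscall list here is illustrative and far too small for any real application:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["read", "write", "open", "close", "exit", "exit_group"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

You would apply it with `docker run --security-opt seccomp=profile.json …`. Deciding which of the several hundred Linux syscalls a given application actually needs is precisely the expertise burden mentioned above.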
Are you scared yet?
You should be, if you are:
1) A hosting provider — In this case, you don’t necessarily trust the code that is running on your infrastructure (think Heroku). Your customers may have malicious intent and could be running exploits on your service to take control of the infrastructure.
2) A cluster admin — You may know your customers in this case, but if you belong to a large enterprise with a mature CI/CD pipeline, code likely gets pushed to production several times a day, and you may not be able to check whether vulnerable code is reaching your containerised environment. There are several code-scanning tools on the market, but none will guarantee a 100% detection rate.
3) Required to run a vulnerable application — This seems like it would never happen, but I’ve witnessed organisations running media libraries and OCR software with known buffer-overflow vulnerabilities that were never patched yet continued to be used for lack of an alternative.
4) Affected by a zero-day vulnerability — Vendors generally take time to release security patches for newly discovered security flaws. During this window, you are pretty much a sitting duck.
gVisor to the Rescue
gVisor aims to provide a sandboxed environment for the containers.
gVisor is a user-space kernel, written in Go, that implements a substantial portion of the Linux system surface. It includes an Open Container Initiative (OCI) runtime called runsc that provides an isolation boundary between the application and the host kernel. The runsc runtime integrates with Docker and Kubernetes, making it simple to run sandboxed containers.
Since gVisor is OCI compliant, runsc is a drop-in replacement for runc. The runtime is further divided into two processes, Sentry and Gofer. Sentry is responsible for intercepting syscalls made by the user code and servicing them in the user-space kernel. Gofer handles file-system access that extends beyond the sandboxed environment.
Alright this is awesome — What’s the catch?
gVisor is still at a very nascent stage and is being actively developed; it currently implements approximately 200 system calls. Not all Linux kernel syscalls are implemented in gVisor, which means not every application that runs on Linux will necessarily run under the gVisor runtime. But you are in luck if you are trying to secure one of these use cases:
This for me is already an impressive list, and most people could get started with it, but I would move with caution before taking this to production.
The added security and strong isolation boundaries come with a small performance cost: I’ve noticed my containers generally take about 100–200 ms longer to launch. This was not measured with any scientific rigour, so treat it as anecdotal.
Can I get all this goodness on AWS ECS (EC2 Launch type)?
Yes, you can, and we at Momenton have you covered. You will need to build an ECS-optimised AMI that configures Docker to use runsc.
Head on over to Momenton’s GitHub, fork the repo, bake a new AMI, and launch your ECS cluster’s EC2 instances with it.
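One detail worth noting: on ECS you don't control the `docker run` invocation for each task, so rather than passing `--runtime=runsc` per container, the AMI can make runsc the daemon-wide default. A sketch of the daemon.json such an AMI might bake in (the runsc path is an assumption about where the AMI installs it):

```json
{
  "default-runtime": "runsc",
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}
```

With `default-runtime` set, every task the ECS agent schedules on that instance is launched inside the gVisor sandbox without any task-definition changes.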