On the Security of Containers

or why it’s “containers always” and “VMs sometimes”

Over the past year, as Linux containers have come to the forefront of the technology hype cycle, there has been much rumbling about the security of containers. Most of the discussions, articles, and blog posts published on the subject inevitably compare containers to VMs, as if this were a comparison that mattered. It is not.

Now, saying it doesn’t matter could be a bit strong. The story is closer to “containers always, VMs sometimes”. There are certainly users that, upon seeing Docker, decide to reevaluate their need for VMs. Principally, those users who far prefer performance to security for their single-tenant workloads may wish for VM-like lifecycle management without the overhead imposed by virtualization.

However, for those that depend on VMs for security, Docker is not an alternative, but a complement.

I liken the difference between bare metal, VMs, and containers to the separation of buildings, apartments, and rooms. There are buildings, such as warehouses, that have neither apartments nor rooms; those that simply have a few rooms inside; and there are buildings with apartments with rooms inside of them. There are also studio apartments, which lack rooms. All rooms and apartments exist inside of buildings, but the number of rooms and apartments per building is highly variable.

To take the analogy further, buildings are usually made of strong materials such as concrete, brick, or steel. The separation of apartments might use thicker, sound-dampening materials. Finally, the separation of rooms may be anything from concrete to glass.

If you haven’t yet figured out the analogy, our rooms are Linux containers. Linux containers may be used inside of VMs, but they may also be used on bare-metal hardware. It’s possible to share an apartment with friends with each friend taking a room, but it’s not the same as each friend having their own apartment. It just isn’t. If you want to share an apartment, it’s safer to share it with people you really trust, or those that you’re going to live with anyway. Sometimes, but not always, that’s okay.

My wife and I have been married for 12 years. We have two children. You might guess that we live together with our children. We share a house, my wife and I share a room and our children share another. I also have a third room for my home office.

It wouldn’t make much sense to put the children into their own apartment, just as it would not make sense to put an OpenSSH daemon into its own VM apart from your application.

No, instead, we keep our children in our house, but they’re given a room. We’ve decided that while moving them into their own apartment might prevent them from trying to sleep in Mommy and Daddy’s bed, it would simply be impractical (and would probably trigger a CPS call if we moved our 3-year-old into her own apartment).

My home office is a “no entry” zone. I have effectively firewalled that room. My children still come in, on occasion, but only infrequently. Even the two-year-old knows it is Dad’s space and to respect it.

Containers, like rooms, are invaluable in keeping close, but separate, those processes that for any number of reasons must be collocated inside the same machine or VM. Those processes are already collocated in such a fashion, so in the context of the relationship between those processes, you are not gaining the security benefits of VMs.

Containers are like firewalls between processes

The loose separation between containers is not a bug, but an asset. If we presume that the processes we run inside of containers would already be living within the same VM, imposing tighter, more granular access controls on those processes is only a good thing.

In fact, even on a host that runs a single process, running containers is additive to your security story. Let’s presume we won’t run any of the common processes that typically get run on a host, such as syslogd, sshd, etc. No, for this exercise, let us presume we have a VM that directly loads a web server as its init process (PID 1).

What are the capabilities of that process? What can it do? May it send raw network frames? Can it elevate to root? If there is a setuid-root binary on the system, can it execute it and elevate? If it elevates, what is it allowed to do? Would it gain DAC override, the ability to mount filesystems, or any of the other privileges typically granted to root users?
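These questions have concrete answers: on Linux, the kernel tracks them as capability bitmasks, visible in /proc/&lt;pid&gt;/status. As a sketch of how to read them (the bit numbers come from linux/capability.h, and only a subset is listed; the example mask is the value widely reported as Docker’s default bounding set, so treat it as illustrative):

```python
# Decode a Linux capability bitmask, as found in /proc/<pid>/status
# (e.g. "CapBnd: 00000000a80425fb"), into human-readable names.
# Bit numbers are from linux/capability.h; only a subset is listed here.
CAP_NAMES = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    3: "CAP_FOWNER",
    4: "CAP_FSETID",
    5: "CAP_KILL",
    6: "CAP_SETGID",
    7: "CAP_SETUID",
    8: "CAP_SETPCAP",
    10: "CAP_NET_BIND_SERVICE",
    13: "CAP_NET_RAW",
    18: "CAP_SYS_CHROOT",
    21: "CAP_SYS_ADMIN",
    27: "CAP_MKNOD",
    29: "CAP_AUDIT_WRITE",
    31: "CAP_SETFCAP",
}

def decode_caps(mask: int) -> set:
    """Return the names of all (known) capabilities whose bit is set in mask."""
    return {name for bit, name in CAP_NAMES.items() if mask & (1 << bit)}

# 0xa80425fb is the commonly cited Docker default bounding set: root inside
# the container keeps CAP_NET_RAW and CAP_CHOWN, but loses CAP_SYS_ADMIN
# (and with it, e.g., the ability to mount filesystems).
docker_default = decode_caps(0x00000000A80425FB)
print(sorted(docker_default))
```

A bare-metal or VM process running as root, by contrast, typically starts with every one of these bits set, which is exactly the problem the rest of this section describes.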

Yes, SELinux can be used to secure your processes. Yet, I’d like to see the number of hands that get raised if you ask a room full of “DevOps Engineers” if they’ve ever disabled SELinux. Then, ask how many have ever written their own SELinux policies, or even *read* SELinux policies. What percentage of Linux systems today use SELinux in a way that protects as well as a default Docker policy? What is the barrier to entry?

Also, yes, your application can explicitly drop these privileges, but you’re relying on your application and its developers to do the right thing. Alternatively, you could use a wrapper… which, I suppose, is what Docker is.
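As one sketch of what “doing the right thing” in the application looks like: a process can set the kernel’s no_new_privs flag on itself, after which neither it nor its children can ever gain privileges through setuid binaries or file capabilities. This is a Linux-only illustration via ctypes; PR_SET_NO_NEW_PRIVS is prctl option 38 (see prctl(2)):

```python
# A process voluntarily renouncing privilege escalation via setuid/setgid
# binaries and file capabilities, using prctl(PR_SET_NO_NEW_PRIVS).
import ctypes

PR_SET_NO_NEW_PRIVS = 38  # from linux/prctl.h

def drop_new_privs() -> None:
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_NO_NEW_PRIVS) failed")

def no_new_privs_flag() -> int:
    # Modern kernels report the flag in /proc/self/status as "NoNewPrivs: 0|1".
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("NoNewPrivs:"):
                return int(line.split()[1])
    return -1  # kernel too old to report the field

drop_new_privs()
print(no_new_privs_flag())  # 1 on modern kernels: setuid binaries can no longer elevate us
```

The point stands, though: this only protects you if every developer remembers to write it, which is exactly the gap a wrapper like Docker fills.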

Another way of saying it is: just because you use a VM doesn’t mean you should run ‘chmod -R 777 /’ on your hosts. Your OS offers security tunables for your processes, and you’re probably not using them. Docker is the tool that makes these usable, largely out of the box.

Critical to this story has been the success and failure of DevOps, both in tooling and in culture. I have begun to realize that for many organizations, “DevOps” is really “soft ops”. Developers are more empowered than ever to build and deploy their application, usually into a VM. This has provided a separation of concerns where the hard-ops folks will manage the hardware and networking infrastructure underneath. For those running on AWS or GCE, you’ve outsourced the hard-ops work to Amazon and Google.

The hard-ops folks (your traditional systems administrators and security engineers) are those who used to be involved in deploying your applications and helping assess matters of security. In some shops, they may still be, but I suspect the number of shops where developers use self-provisioning-in-a-bubble is only increasing.

Unfortunately, what has not existed as a strong movement is DevSecOps (or SecDevOps). As DevOps practices lean heavily into the purview of “No Ops”, a focus on security may be lost.

Docker changes this by being the DevOps tool that imposes a security model on its users and makes tweaking that model easy and convenient. It is the tool that puts security in the hands of the developer without letting them shoot themselves in the foot, especially if the alternative is use of the far more permissive defaults of Linux processes.

Using Docker, a stricter-than-average set of capabilities is applied by default. An even stricter set of capabilities is accessible with a few command-line arguments. It isn’t a matter of reading through the manpages of Linux syscalls and hoping you’ve done things right, but simply of setting flags at runtime (and possibly, in the future, setting these in a Dockerfile).
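Concretely, the runtime flags in question are --cap-drop and --cap-add. A hedged sketch (the images and commands are illustrative; busybox ping here needs CAP_NET_RAW to open a raw socket):

```shell
# Default: Docker already drops most capabilities; root in the container
# cannot mount filesystems or load kernel modules, but ping still works
# because CAP_NET_RAW is kept by default.
docker run --rm alpine ping -c 1 8.8.8.8

# Drop a single capability you know you don't need; ping now fails
# with "operation not permitted".
docker run --rm --cap-drop=NET_RAW alpine ping -c 1 8.8.8.8

# Or go allowlist-style: drop everything, add back only what's needed
# (here, binding to port 80 as a low port).
docker run --rm --cap-drop=ALL --cap-add=NET_BIND_SERVICE nginx
```

Compare that one-flag workflow with writing and loading an SELinux policy module, and the barrier-to-entry argument above makes itself.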

I agree that the security of a container isn’t any better than that of a well-secured application using capset(), a custom suite of SELinux labels, and a roll-your-own use of Linux namespaces. However, that’s precisely what Linux containers are. Containers are not contradictory to other, existing best practices. They’re not contradictory to VMs, but work well with them. They’re not contradictory to SELinux or AppArmor, but work with them. In fact, when you come down to it, once you start tweaking and configuring all of the security tunables in Linux to secure your application as much as possible, you’ll realize that you’ve simply rolled your own container solution.

Interested in reading more? Check out “User namespaces might not be secure enough”. Also check out more of Erica’s articles.

Erica Windisch is the CTO and founder of IOpipe, Inc. where she is working on tooling and services for developing and operating serverless applications.