Best Practices for Securing Containers

Alex Baretto
8 min read · Jun 6, 2017

Recently, Gartner has advised in favor of container-based app delivery models, claiming that container technology is more secure than running apps directly on a bare operating system. Gartner analyst Joerg Fritsch has said that “Gartner asserts that applications deployed in containers are more secure than applications deployed on the bare OS”. This is because even if containers are somehow compromised, “they greatly limit the damage of a successful compromise, because applications and users are isolated on a per-container basis so that they cannot compromise other containers, or the host OS.”

Although containers offer a higher degree of isolation and security than installing multiple apps on bare metal, all containers on a host share a common kernel, which leaves them vulnerable to kernel privilege escalation attacks. This means they are not the right tool when high-assurance isolation is required. Full virtual machines, by contrast, provide a stronger isolation boundary, but carry a heavier burden in terms of performance, footprint and portability.

Organizations must understand that although containers offer some degree of isolation, they are not in themselves a comprehensive security solution. In this post we will provide some basic tips and best practices for securing your containers. Although most of these are relevant to other container technologies as well, we will focus specifically on Docker, the leading container platform on the market today.

Kernel Namespaces Basics

To understand how to secure Docker, we must first examine the fundamental characteristics of container isolation, specifically namespaces and cgroups. Basically, containers and the underlying host share a kernel, but each container is assigned its own, mostly independent, runtime environment. Each container receives its own network stack and process space, as well as its own instance of a file system. When you start a container, behind the scenes Docker creates a set of namespaces and control groups for it. These namespaces provide the initial form of isolation: when configured properly, processes running within a container cannot see, much less affect, processes running in another container or on the host system.
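To see this in practice, you can compare the process view inside and outside a container. This is just an illustrative sketch, assuming an image (such as ubuntu) that ships with the ps utility:

$ docker run -d --name isolated ubuntu sleep 1000
$ docker exec isolated ps -ef
$ ps -ef | grep "sleep 1000"

Inside the container, ps shows only the sleep process (running as PID 1) and the ps command itself; on the host, the very same process appears alongside every other host process.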

Until recently, Docker did not offer user namespace isolation. This meant that a process running in a container had the same privileges on the underlying host as well. For example, a process running as “root” in a container would have root-level privileges on the underlying host when interacting with the kernel.
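Docker now supports user namespace remapping, which maps root inside a container to an unprivileged user on the host. A minimal sketch, assuming you control how the daemon is started (the same setting can also be placed in the daemon’s configuration file):

$ dockerd --userns-remap="default"

With this enabled, a process running as UID 0 inside the container is mapped to a high, unprivileged UID on the host, so escaping the container no longer lands an attacker in a root shell on the host.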

Control Groups Basics

Control groups implement resource accounting and utilization limits. Besides providing important metrics and statistics on resource utilization, they help ensure that each container gets the resources, such as memory, CPU and disk I/O, that it needs to perform its tasks. Another aspect of control groups is preventing DoS (Denial of Service) conditions by limiting the amount of resources a single container can use, so that no single container can bring the system down by exhausting those resources.
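The accounting side is easy to see with docker stats, which reads these cgroup counters and reports live per-container usage (the container name here is just an example):

$ docker stats web

This prints the CPU percentage, memory usage against its limit, and network and block I/O for the web container, and is a quick way to spot a container consuming far more than it should.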

Tip #1 — A different network stack for each application

By default, Docker gives each container its own network stack, attached to a private bridge on the host. However, it is possible to run a container with networking set to “host” (--network="host"). In this case the container shares the host’s network stack, and all interfaces from the host are available to the container. In addition, the container’s hostname will match the hostname of the host system. For this reason, you should avoid using --network="host": it means different containers can end up sharing the same host network stack, and thereby the same sockets and interfaces. To maintain isolation between containers, it is important to ensure that each container gets its own network stack and that applications in different containers communicate with each other via APIs only. Note also that when you specify public ports for your containers or use links, IP traffic is allowed between containers: just like physical machines connected through a common Ethernet switch, they can ping each other, send and receive UDP packets, and establish TCP connections.

To enforce this type of isolation, start the Docker daemon with the inter-container communication parameter set to --icc=false. This allows iptables to protect other containers and the host itself from having arbitrary ports probed or accessed by a container that gets compromised. If you need containers to provide services to each other, use the --link=CONTAINER_NAME_or_ID:ALIAS option. The value CONTAINER_NAME in --link= must be either an auto-assigned Docker name like stupefied_pare or the name you assigned with --name= when you ran docker run. It cannot be a hostname, which Docker will not recognize in the context of the --link= option.
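Putting the two settings together, a minimal sketch might look like this (the container names and the training/postgres and training/webapp images are simply the examples used later in this post):

$ dockerd --icc=false --iptables=true
$ docker run -d --name dbstore training/postgres
$ docker run -d --name web --link=dbstore:db training/webapp python app.py

With inter-container communication disabled globally, only the web container can reach dbstore, and only on the ports that dbstore exposes.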

If the Docker daemon is running with both --icc=false and --iptables=true, then when it sees docker run invoked with the --link= option, the Docker server inserts a pair of iptables ACCEPT rules. This allows the new container to connect to the ports exposed by the other container, that is, the ports mentioned in the EXPOSE lines of its Dockerfile.

As noted above, when networking is set to “host”, the container shares the host’s network stack, all interfaces from the host are available to it, and the container’s hostname matches the hostname of the host system. Avoid --network="host" also because it gives the container full access to local system services such as D-Bus, and it is therefore considered insecure.
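The difference is easy to demonstrate. As a hypothetical sketch, assuming a small image such as alpine whose busybox ip applet is available:

$ docker run --rm alpine ip addr
$ docker run --rm --network="host" alpine ip addr

The first command lists only the loopback interface and the container’s own eth0; the second lists every interface on the host.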

Tip #2 — avoid using mount to host

In addition to creating a volume, the -v flag lets you mount a directory from your Docker daemon’s host into a container, making that host directory available inside the container. Mounting a host directory can be useful for testing. For example, you can mount source code inside a container, change the source code, and see its effect on the application in real time.
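As a sketch of that workflow, here is a read-write mount of a source directory (the host path and the training/webapp image are simply the ones reused in the read-only example below):

$ docker run -d -P --name webdev -v /src/webapp:/opt/webapp training/webapp python app.py

Edits made to /src/webapp on the host are immediately visible to the application inside the container.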

However, using mount to host is risky in terms of security, as it can potentially provide access to the host’s file system. For example, it has been documented that moving subdirectories within the host’s source directory can give the container access to the host’s file system. According to the patch review, the rename() call can be used to move a file or directory outside of a bind mount. This allowed programs with paths below the renamed directory to traverse up their directory tree to the real root of the filesystem, instead of just the root of their bind mount.

If you have sensitive data on the host’s file system to which you do not wish to provide any access, avoid mounting to the host as much as possible. The next tip suggests a solution for sharing data without using mount to host. If read-only access is acceptable, however, you can restrict the mount: host directories are mounted read-write by default, but you can also mount them read-only.

$ docker run -d -P --name web -v /src/webapp:/opt/webapp:ro training/webapp python app.py
The :ro option indicates that the mount should be read-only.

Tip #3 — Define data/volume containers and mount these containers for sharing data

As mentioned in the previous tip, mounting to the host is not recommended, as it can potentially provide access to the host’s file system. If you want to share data between containers, or between containers and the host, it is better to create a named data volume container and mount the data from it. A data volume container is an excellent way to share data between containers or to make it available to non-persistent containers.

The following example illustrates this process. While this container doesn’t run an application, it reuses the training/postgres image so that all containers are using layers in common, saving disk space.

$ docker create -v /dbdata --name dbstore training/postgres /bin/true

You can then use the --volumes-from flag to mount the /dbdata volume in another container.

$ docker run -d --volumes-from dbstore --name db1 training/postgres

And another:

$ docker run -d --volumes-from dbstore --name db2 training/postgres

In this case, if the postgres image contained a directory called /dbdata, then mounting the volumes from the dbstore container hides the /dbdata files from the postgres image. The result is that only the files from the dbstore container are visible.

You can use multiple --volumes-from parameters in one command to combine data volumes from several dedicated data containers. You can also extend the chain by mounting the volume that came from the dbstore container in yet another container via the db1 or db2 containers, rather than from the original data container itself.

$ docker run -d --name db3 --volumes-from db1 training/postgres

If you remove containers that mount volumes, including the initial dbstore container or the subsequent containers db1 and db2, the volumes will not be deleted. To delete a volume from disk, you must explicitly call docker rm -v against the last container with a reference to the volume. This allows you to upgrade, or effectively migrate, data volumes between containers.

To fully delete a volume from the file system, you must run:

$ docker rm -v <container name>

Tip #4 — avoid running containers with high-level permissions

By default, Docker containers are “unprivileged” and cannot, for example, run a Docker daemon inside a Docker container. However, some functions require elevated permissions, and it is therefore common to see admins running containers with flags such as --privileged.

A “privileged” container is given access to all devices. When the administrator executes docker run --privileged, Docker enables access to all devices on the host and adjusts the AppArmor or SELinux configuration, giving the container nearly the same access to the host as processes running outside containers on the host.

If you only need access to a specific device or devices, use the --device flag instead. It allows you to specify one or more devices that will be accessible within the container.

By default, the container will be able to read, write, and mknod these devices. This can be overridden using a third :rwm set of options to each --device flag.
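For example, a sketch that exposes only the host’s sound device to a container (the device path and the ubuntu image are just examples):

$ docker run --device=/dev/snd:/dev/snd -it ubuntu bash
$ docker run --device=/dev/snd:/dev/snd:r -it ubuntu bash

The first command grants the default read, write and mknod permissions on that single device; the second uses the :r suffix to make the device read-only inside the container.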

Tip #5 — define restrictions on resource consumption of the containers

Another mechanism that can be used to secure containers and the host environment is control groups, also known as cgroups. As mentioned above, cgroups limit the amount of resources that a given container can use. These resources include CPU, memory, disk I/O, and more. In this way we can prevent a rogue container from bringing the system down due to a bug or even malicious code (e.g. a DoS attack). This is particularly important on multi-tenant platforms, like public and private PaaS, to guarantee consistent uptime (and performance) even when some applications start to misbehave.

Not all cgroup capabilities have been ported to libcontainer yet (or at least exposed to Docker). When using the LXC driver, you simply pass LXC arguments through directly. With libcontainer, explicit cgroup policy arguments are exposed through Docker itself. You need to explicitly set a driver when you launch the Docker daemon, so you cannot run the two drivers simultaneously.

Using the --cgroup-parent flag, you can pass a specific cgroup to run a container in. This allows you to create and manage cgroups on your own: you can define custom resources for those cgroups and put containers under a common parent group.
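A minimal sketch, assuming a parent cgroup that you have already created and configured on the host (the group name here is hypothetical):

$ docker run -d --cgroup-parent="/tenant-a" --name web training/webapp python app.py

The container’s cgroups are then created under /tenant-a, so any limits defined on that parent apply to every container placed beneath it.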

In libcontainer, the basic cgroup policies are already exposed, so if, for example, you want to lock down a Docker container to the first CPU core, you simply append --cpuset-cpus=0 to your docker run command. With the LXC driver for Docker, you will first need to enable the driver. The method for doing so differs between Linux distributions, and the downside is that you may lose the ability to use a number of features. So unless you really need functionality that is not yet exposed in Docker through libcontainer, you should use the default driver. If you do decide to use the LXC driver, all you need to do is add the --lxc-conf argument and pass in the cgroup policy that you’d like to set.
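Tying this back to the DoS point, here is a sketch of a run with explicit limits using the libcontainer-exposed flags (the values and the training/webapp image are only examples):

$ docker run -d --name web --cpuset-cpus=0 --memory=512m --cpu-shares=512 training/webapp python app.py

This pins the container to the first CPU core, caps its memory at 512 MB, and gives it a reduced share of CPU time when the host is under contention.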
