Master the Container Security — Security Aspect

Jun 22, 2022

If you are new to containers, you can read my previous blog post for an introduction to container basics.

Organizations are shifting from monolithic architectures to microservices, and from there to containers. When you deploy a microservice, the production environment should match the one you develop in. Otherwise, differences in dependencies and system architecture can break the application. Containers solve this by packaging an application together with its dependencies. As a result, many organizations now run their production applications in containers, and those applications often work with sensitive resources such as databases and AWS S3 buckets. That makes container security very important. You can achieve it with the following techniques:

Control Groups (CGroups)
Docker Rootless
Docker Namespace
Kernel Capabilities
AppArmor
Seccomp profiles

Let's get started …

Control Groups (CGroups)

By default, a container has no resource constraints and can use as much of a given resource as the host's kernel scheduler allows. CGroups implement resource accounting and limiting. They provide useful metrics and ensure that each container gets its fair share of resources (CPU, memory, I/O, etc.), so that a single container cannot bring down the whole system by exhausting them.

CGroups are a Linux kernel feature, and several of the limits Docker offers depend on your kernel supporting them. You can check what your kernel supports by running the following command:

docker info

If a feature is disabled in your kernel, docker info prints a warning at the end of its output, for example: WARNING: No swap limit support

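You can use -m (or --memory) to cap a container's memory and --cpus to limit how much of the host's CPU capacity it can use. For example, --cpus=1.5 allows at most one and a half CPUs' worth of processing time. To know more, read this.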
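One way to confirm that limits such as -m and --cpus were applied is to read them from inside the container. The script below is a minimal sketch. It assumes a cgroup v2 host, where the unified hierarchy is mounted at /sys/fs/cgroup and the limits live in memory.max and cpu.max. On cgroup v1 hosts, the file names differ and the script only reports that the files were not found.

```shell
#!/usr/bin/env bash
# Report the memory and CPU limits that cgroup v2 applies to the current process.
# Assumes the cgroup v2 unified hierarchy is mounted at /sys/fs/cgroup.

# On cgroup v2, /proc/self/cgroup contains a "0::<path>" line; inside a container
# with its own cgroup namespace the path is usually just "/".
cg_path=$(awk -F: '$1 == "0" { print $3 }' /proc/self/cgroup 2>/dev/null)
cg_dir="/sys/fs/cgroup${cg_path%/}"

show_limit() {
  local file="$cg_dir/$1"
  if [ -r "$file" ]; then
    echo "$1: $(cat "$file")"
  else
    echo "$1: not found (cgroup v1 host, or a cgroup without this limit file)"
  fi
}

# "max" means unlimited; otherwise the value is the limit in bytes.
show_limit memory.max
# Format is "<quota> <period>" in microseconds; "max 100000" means unlimited.
show_limit cpu.max
```

For example, inside a container started with docker run -m 512m --cpus=1.5, this should report memory.max: 536870912 (512 MiB in bytes) and cpu.max: 150000 100000. That is, the container gets 150 ms of CPU time per 100 ms period.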

Docker Rootless

Running containers as a non-root user

You can create a group and a user in the Dockerfile and switch to that user with the USER instruction. This matters because even if an attacker gains control of the container, they cannot run commands as root. Keep in mind that even though the container runs as non-root, the Docker daemon itself may still be running as root.

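# Create an unprivileged group and user (GID/UID 1000) without creating a home directory
RUN groupadd -g 1000 someuser && \
    useradd -u 1000 -g 1000 -d /usr/share/someuser -M someuser
# Run the remaining build steps and the container's process as that user
USER someuser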
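To guard against an image that accidentally runs as root, an entrypoint script can check the UID before starting the real workload. This is a minimal sketch. The function name require_non_root and the wrapper pattern are illustrative, not part of any Docker tooling.

```shell
#!/usr/bin/env bash
# Hypothetical entrypoint guard: refuse to start the workload as root (UID 0).
require_non_root() {
  if [ "$(id -u)" -eq 0 ]; then
    echo "refusing to run as root (UID 0); add a USER instruction to the Dockerfile" >&2
    return 1
  fi
  echo "running as UID $(id -u)"
}

# When executed as an entrypoint, check the UID and then replace this shell with
# the real command passed as arguments (e.g. ENTRYPOINT ["/guard.sh"] CMD ["nginx"]).
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  require_non_root && exec "$@"
fi
```

A quicker manual check is docker run --rm <image> id -u. With the Dockerfile above, this should print 1000 rather than 0.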

Running the Docker daemon as a non-root user (rootless mode)

Rootless mode lets you run both the Docker daemon and your containers as a non-root user. It runs both the daemon and the containers inside a user namespace. Rootless mode does not use binaries with SETUID bits or file capabilities, except newuidmap and newgidmap. These two are needed to allow multiple UIDs/GIDs to be used in the user namespace. To know more, read this.

Docker Namespace

As in other contexts, a namespace provides isolation. Linux namespaces give each container its own view of the system (process IDs, network interfaces, mount points, users, and so on). As a result, processes in one container cannot see or interfere with processes in another. If one container is breached, a properly configured setup keeps the attacker from reaching the others. For users specifically, Docker supports user namespace remapping. With remapping, the Docker daemon still runs as root, but container processes run as unprivileged users on the host. A process inside the container believes it is running as root because its user IDs are remapped within the user namespace.
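The kernel exposes the active user namespace mapping in /proc/self/uid_map. Each line holds the first UID inside the namespace, the first UID outside it, and the length of the range. The sketch below prints that mapping in readable form. The helper name describe_uid_map is hypothetical.

```shell
#!/usr/bin/env bash
# Explain how UIDs in the current user namespace map to host UIDs.
# Each line of /proc/self/uid_map is: <first UID inside> <first UID outside> <count>.
describe_uid_map() {
  awk '{
    if ($1 == 0 && $2 == 0)
      print "identity mapping: UIDs are not remapped"
    else
      printf "UIDs %.0f-%.0f here are host UIDs %.0f-%.0f\n", $1, $1 + $3 - 1, $2, $2 + $3 - 1
  }' "${1:-/proc/self/uid_map}"
}

echo "container sees UID $(id -u)"
describe_uid_map
```

Suppose remapping is enabled, for example with "userns-remap": "default" in /etc/docker/daemon.json. A root process in a container would then see UID 0, while the map shows that UID 0 is really a high, unprivileged UID on the host.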

The remapping itself is handled by two files: /etc/subuid and /etc/subgid. Each file works the same, but one is concerned with the user ID range, and the other with the group ID range. Consider the following entry in /etc/subuid:

testuser:231072:65536

This means that testuser is assigned a subordinate range of 65536 user IDs starting at 231072, that is, UIDs 231072 through 296607. UID 231072 is mapped within the namespace (within the container, in this case) as UID 0 (root), UID 231073 is mapped as UID 1, and so forth. Suppose a process attempts to escalate privilege outside of the namespace. On the host, it is running as an unprivileged high-number UID, which does not even map to a real user. This means the process has no privileges on the host system at all. To know more, read this.
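The mapping in the testuser example is simple addition: host UID = start of range + container UID, valid only while the container UID is below the range length. Here is a minimal sketch of that arithmetic. The function name map_uid is hypothetical.

```shell
#!/usr/bin/env bash
# Translate a container UID to its host UID, given an /etc/subuid-style
# entry "name:start:count".
map_uid() {
  local entry="$1" container_uid="$2"
  local name start count
  IFS=: read -r name start count <<< "$entry"
  if [ "$container_uid" -lt 0 ] || [ "$container_uid" -ge "$count" ]; then
    echo "UID $container_uid is outside the $count-ID range for $name" >&2
    return 1
  fi
  echo $(( start + container_uid ))
}

map_uid "testuser:231072:65536" 0   # root in the container; prints 231072
map_uid "testuser:231072:65536" 1   # prints 231073
```

The last valid container UID, 65535, maps to host UID 296607. Anything above it has no mapping at all.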

Kernel Capabilities

By default, Docker starts containers with a restricted set of kernel capabilities. Capabilities split root's privileges into separate units. For example, a process such as a web server that only needs to bind to a port below 1024 does not need to run as root. It can be granted the CAP_NET_BIND_SERVICE capability instead. There are capabilities for almost every specific area where root privileges are usually needed. We can drop the ones a container does not need. Then, even if an intruder manages to escalate to root within the container, it is much harder to do serious damage or to escalate to the host.
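To see which capabilities a process actually holds, you can decode the CapEff bitmask in /proc/self/status. Each bit corresponds to one capability number from linux/capability.h, for example 10 for CAP_NET_BIND_SERVICE, 13 for CAP_NET_RAW, and 21 for CAP_SYS_ADMIN. The script below is a minimal sketch that checks those three bits. The helper name has_cap is hypothetical.

```shell
#!/usr/bin/env bash
# Decode the effective capability mask (CapEff) of the current process.
# has_cap succeeds if bit number $2 is set in the hex mask $1.
has_cap() {
  local mask_hex="$1" bit="$2"
  (( (16#$mask_hex >> bit) & 1 ))
}

cap_eff=$(awk '/^CapEff:/ { print $2 }' /proc/self/status 2>/dev/null)
cap_eff=${cap_eff:-0}
echo "CapEff: $cap_eff"
# Capability numbers taken from linux/capability.h.
for pair in NET_BIND_SERVICE:10 NET_RAW:13 SYS_ADMIN:21; do
  name=${pair%%:*}
  bit=${pair##*:}
  if has_cap "$cap_eff" "$bit"; then
    echo "CAP_$name: granted"
  else
    echo "CAP_$name: not granted"
  fi
done
```

For example, suppose a container is started as root with docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE. Running this script inside it should show only CAP_NET_BIND_SERVICE as granted (CapEff 0000000000000400). A non-root process normally shows no capabilities at all.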

AppArmor

AppArmor is a Mandatory Access Control (MAC) system that confines programs to a limited set of resources. You create a security profile and load it into the kernel. AppArmor can then either enforce the profile or, in complain mode, only log violations of its rules.

Difference between DAC (Discretionary Access Control, the standard Linux/UNIX permission model) and MAC (Mandatory Access Control, e.g. AppArmor, SELinux)
With DAC, files and processes have owners. A file has separate permissions for its owning user, its owning group, and others (everyone else). Users can change the permissions on their own files. The root user has full access control in a DAC system: with root access, you can access any other user's files or do whatever you want on the system.
On MAC systems like SELinux, access is governed by an administratively set policy. Even if the DAC permissions on your home directory are changed, an SELinux policy that prevents other users or processes from accessing the directory keeps it protected.

AppArmor profiles are simple text files. An enforced profile is default-deny: anything the profile does not explicitly allow cannot be done inside the container. I have attached a sample profile for Nginx. Run the Nginx container with this profile loaded and then try commands such as ping, touch, sh, or dash. They will fail because the profile denies them. To know more, read this.
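To verify which AppArmor profile a container process is actually confined by, you can read its security label from /proc. This is a minimal sketch. It assumes a kernel that exposes /proc/self/attr/apparmor/current, or falls back to the older shared file /proc/self/attr/current, which holds an SELinux context instead on SELinux hosts. The helper name apparmor_label is hypothetical.

```shell
#!/usr/bin/env bash
# Show the AppArmor profile (and mode) confining the current process, if any.
apparmor_label() {
  local f label
  # Prefer the AppArmor-specific file; fall back to the shared LSM file.
  for f in /proc/self/attr/apparmor/current /proc/self/attr/current; do
    [ -r "$f" ] || continue
    label=$(tr -d '\0' 2>/dev/null < "$f")
    if [ -n "$label" ]; then
      echo "$label"
      return 0
    fi
  done
  echo "unknown (AppArmor not enabled or label not readable)"
}

echo "AppArmor profile: $(apparmor_label)"
```

A profile is typically loaded with apparmor_parser -r -W /path/to/profile and applied with docker run --security-opt apparmor=<profile-name>. Inside that container, the script should print the profile name followed by "(enforce)". Under Docker's own default profile, it usually prints docker-default (enforce).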

Seccomp profiles

Secure computing mode (seccomp) is a Linux kernel feature. Like AppArmor, it restricts the actions available within a container, but it works at the level of system calls: a seccomp profile specifies which syscalls the container may make and blocks the rest. We create seccomp profiles and pass them to the container with the --security-opt option of docker run. This feature is available only if Docker has been built with seccomp and the kernel is configured with CONFIG_SECCOMP enabled. To check whether your kernel supports seccomp, run the following command and look for CONFIG_SECCOMP=y in the output:

grep CONFIG_SECCOMP= /boot/config-$(uname -r)

Like an AppArmor profile, Docker's default seccomp profile is an allowlist: it denies access to all system calls by default and then allows specific ones. To know more, read this.
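At runtime, the kernel reports each process's seccomp mode in the Seccomp field of /proc/self/status. The values are 0 for disabled, 1 for strict, and 2 for filter. The field is present only when the kernel was built with CONFIG_SECCOMP. This minimal sketch decodes it. The helper name seccomp_mode is hypothetical.

```shell
#!/usr/bin/env bash
# Report the seccomp mode of the current process (or of a given status file).
seccomp_mode() {
  local mode
  mode=$(awk '/^Seccomp:/ { print $2 }' "${1:-/proc/self/status}" 2>/dev/null)
  case "$mode" in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "filter (a seccomp profile is applied)" ;;
    "") echo "unsupported (no Seccomp field; kernel lacks CONFIG_SECCOMP)" ;;
    *) echo "unknown mode $mode" ;;
  esac
}

echo "seccomp: $(seccomp_mode)"
```

Inside a container running under Docker's default profile, or a custom one passed with --security-opt seccomp=/path/to/profile.json, this should report filter. With --security-opt seccomp=unconfined, it should report disabled.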

A few extra key points …

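Do not expose the /var/run/docker.sock file to the container. This socket is the Docker daemon's API endpoint, used for container management such as collecting logs, creating and stopping containers, and running commands in them. Any process that can reach it can control the daemon and, through it, the host.
Mount volumes as read-only (for example, with the :ro suffix) unless the container needs to write to them.
Do not share the host's network stack with containers (avoid --network host).
Do not mount system directories such as /etc or /dev into containers.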
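Some of these points can be spot-checked from inside a running container. The sketch below warns if the Docker socket is present and reports whether a mount point is read-only, based on /proc/self/mounts. The helper names and default paths are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Small in-container audit for the points above (sketch).

# Warn if the Docker daemon socket has been mounted into the container.
check_docker_socket() {
  local sock="${1:-/var/run/docker.sock}"
  if [ -S "$sock" ]; then
    echo "WARNING: $sock is mounted; this grants control of the Docker daemon"
  else
    echo "ok: $sock not present"
  fi
}

# Print "ro" or "rw" for an exact mount point, using the last matching entry
# in a mounts file (later mounts shadow earlier ones at the same path).
mount_mode() {
  awk -v t="$1" '$2 == t { opts = $4 }
    END { if (opts == "") print "not a mount point"; else { split(opts, o, ","); print o[1] } }' \
    "${2:-/proc/self/mounts}"
}

check_docker_socket
echo "/ is mounted $(mount_mode /)"
```

For example, with docker run -v "$PWD/config:/config:ro" ..., mount_mode /config should print ro. Adding --read-only to docker run makes the container's root filesystem read-only as well.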

Thank you for reading, and keep following for the next blog in the Container series.
You can always reach out to me with any questions.
If you liked it, follow me on Twitter: @d3afh3av3n.