Defining Privileges and Access Control Settings for Pods and Containers in Kubernetes
--
In a recent tutorial, we discussed the Secrets API, which is designed to encode sensitive data and expose it to pods in a controlled way, enabling secrets encapsulation and sharing between containers.
However, Secrets are only one component of pod- and container-level security in Kubernetes. Another important dimension is the security context, which facilitates management of access rights, privileges, and permissions for processes and filesystems in Kubernetes.
In this tutorial, we’ll discuss how to set up access rights and privileges for container processes within a pod using discretionary access control (DAC) and how to ensure proper isolation of container processes from the host using Linux capabilities. By the end of this tutorial, you’ll know how to limit the ability of containers to negatively impact your infrastructure and other containers, and how to limit user access to sensitive data and mission-critical programs in your Kubernetes environment. Let’s get started!
Defining Security Context
A security context can be defined as a set of constraints applied to a container in order to achieve the following goals:
- Enable a distinct isolation between a container and the host/node it runs on. Many container users underestimate this task and assume that containers are isolated from the host as strictly as virtual machines (VMs). The reality is different. Containers share the host kernel, and by default a process running as root (UID 0) in a container is the same root user as on the host. Therefore, running an application in a container does not by itself fully isolate it from the host. Running containers as root can cause serious problems if Docker images from untrusted sources are used.
- Prevent containers from negatively impacting the infrastructure or other containers.
These basic goals necessitate the following best practices for using security contexts in Kubernetes:
- Drop process privileges in containers as quickly as possible or be aware of them.
- Run services as non-root whenever possible.
- Don’t use random Docker images in your system.
Security contexts in Kubernetes facilitate implementation of these practices and help protect your system against various security risks. Below, we’ll discuss how to achieve the goals outlined above by using PodSecurityContext and SecurityContext in your pods and containers.
Tutorial
To complete examples in this tutorial, you’ll need:
- A running Kubernetes cluster. See Supergiant documentation for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
- The kubectl command-line tool, installed and configured to communicate with the cluster. See the official Kubernetes documentation for installation instructions.
Using Security Contexts in Pods and Containers
Security context settings implement the basic philosophy of discretionary access control (DAC). This is a type of access control in which a given user has complete control over all files and programs it owns and executes. This user can also determine the permissions of other users for accessing and modifying these files or programs. DAC contrasts with mandatory access control (MAC), in which the operating system (OS) constrains the ability of a subject or initiator (e.g., a process) to access or perform operations on computing objects (e.g., files).
In Kubernetes, using DAC implies that you, as a user or administrator, can set access and permission constraints on files and processes running in your pods and containers. Security contexts can be specified for an entire pod and/or for individual containers.
Let’s start with the pod-level security context. To specify security settings for a pod, you need to include the securityContext field in the pod manifest. This field is a PodSecurityContext object that stores the pod’s security settings in the Kubernetes API. Let’s create a pod with a security context using the example below. This pod runs a simple Node.js application that we wrote and saved in a public Docker Hub repository.
apiVersion: v1
kind: Pod
metadata:
  name: security-context-pod
spec:
  securityContext:
    runAsUser: 2500
    fsGroup: 2000
  volumes:
  - name: security-context-vol
    emptyDir: {}
  containers:
  - name: security-context-cont
    image: supergiantkir/k8s-liveliness
    volumeMounts:
    - name: security-context-vol
      mountPath: /data/test
    securityContext:
      allowPrivilegeEscalation: false
As you can see, we have two security contexts in this pod. The first one is a pod-level security context defined by the PodSecurityContext object, and the second one is a SecurityContext defined for the individual container. The pod-level security context applies to all containers in the pod, but field values of container.securityContext take precedence over field values of PodSecurityContext. In other words, if a field is set in both the container-level and the pod-level security context, the container-level value wins.
You now have a basic understanding of how security contexts work, so let’s discuss key settings available for the PodSecurityContext:
.spec.securityContext.runAsUser: This field specifies the user ID (UID) with which to run the entrypoint (the default executable of the image) of the container process. If the field is not specified, it defaults to the UID defined in the image metadata. The field can also be used in spec.containers[].securityContext, in which case it takes precedence over the same field in the PodSecurityContext. In our example, the field specifies that the container process of every container in the pod runs with user ID 2500.
.spec.securityContext.fsGroup: This field defines a special supplemental group whose group ID (GID) applies to all containers in the pod. In our example, this GID is also associated with the emptyDir volume mounted at /data/test and with any files created in that volume. Remember that only certain volume types allow the kubelet to change the ownership of a volume so that it is owned by the pod. If the volume type allows this (as the emptyDir volume type does), the owning GID will be the fsGroup.
.spec.securityContext.runAsGroup: This field specifies the primary group ID (GID) with which to run the entrypoint of the container process. If the field is not set, the runtime default is used. If the field is set in both SecurityContext and PodSecurityContext, the value specified in the container’s SecurityContext takes precedence over the one specified in the PodSecurityContext.
.spec.securityContext.runAsNonRoot: This field determines whether the pod’s containers must run as a non-root user. If set to true, the kubelet will validate the image at runtime to make sure that it does not run as UID 0 (root) and won’t start the container if it does. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. This field is very important for preventing privileged processes in containers from accessing the system and the host.
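The pod-level fields described above can be combined in a single manifest. The following is a minimal sketch rather than one of the tutorial’s tested examples: the pod name, the container name, and the runAsGroup value are hypothetical, and it assumes that the image can run as a non-root user and that your cluster version supports runAsGroup.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-level-fields-demo          # hypothetical name
spec:
  securityContext:
    runAsUser: 2500      # UID for the entrypoint of every container in the pod
    runAsGroup: 3000     # primary GID for the entrypoint (hypothetical value)
    fsGroup: 2000        # supplemental GID, also applied to volumes that support it
    runAsNonRoot: true   # kubelet refuses to start a container that would run as UID 0
  containers:
  - name: pod-level-fields-cont        # hypothetical name
    image: supergiantkir/k8s-liveliness
```

If the pod starts, running id inside the container should report UID 2500 and primary GID 3000, with 2000 among the supplementary groups.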
Now that you understand the key options for PodSecurityContext, save the spec above in security-context-demo.yaml and create the Pod:
kubectl create -f security-context-demo.yaml
pod "security-context-pod" created
Now, verify that the pod is running:
kubectl get pod security-context-pod
NAME                   READY     STATUS    RESTARTS   AGE
security-context-pod   1/1       Running   0          16s
Next, we will check the ownership of processes run within the Node.js container. First, get a shell to the running container:
kubectl exec -it security-context-pod -- /bin/bash
Inside the container, list all running processes:
ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
2500         1  0.7  2.0 983564 41352 ?        Ssl  11:24   0:00 npm
2500        16  0.0  0.0   4340   736 ?        S    11:24   0:00 sh -c node serv
2500        17  0.4  1.7 882368 35848 ?        Sl   11:24   0:00 node server.js
2500        23  0.0  0.1  20252  3252 pts/0    Ss   11:24   0:00 /bin/bash
2500        28  0.0  0.1  17500  2056 pts/0    R+   11:24   0:00 ps aux
Awesome! The output above shows that all processes in the container are run by the UID 2500, as we expected.
Remember that we set the GID for all containers and volumes in our Pod? Let’s check how it worked out. Go to the /data directory in the container’s filesystem and list the permissions of the test directory inside it:
cd /data
ls -l
You should see something like this:
drwxrwsrwx 2 root 2000 4096 Jul 19 11:23 test
The output shows that the /data/test directory has group ID 2000, which is the value of fsGroup.
In theory, all new files and directories created in this volume should also receive the GID defined by the fsGroup. Let’s check if this is true:
cd test
echo This file has the same GID as the parent directory > demofile
Now, check the file’s ownership:
ls -l
-rw-r--r-- 1 2500 2000 51 Jul 19 11:30 demofile
As you see, the demofile has group ID 2000, which is the value of fsGroup. As simple as that!
Overriding Pod Security Context in the Container
As we’ve already mentioned, a container’s SecurityContext takes precedence over the PodSecurityContext. Therefore, you can set a pod-level security context for all containers in the pod and override it if needed by modifying the SecurityContext of individual containers. Let’s create a new pod to see how this works:
apiVersion: v1
kind: Pod
metadata:
  name: override-security-demo
spec:
  securityContext:
    runAsUser: 3000
  containers:
  - name: override-security-cont
    image: supergiantkir/k8s-liveliness
    securityContext:
      runAsUser: 2000
      allowPrivilegeEscalation: false
This pod runs a container with the same Docker image as in the example above, but this time the UID to run the process with is specified both for the pod and for the container inside it.
Before creating this Pod, let’s discuss key options available in the container’s SecurityContext:
.spec.containers[].securityContext.runAsUser: The same as in the PodSecurityContext.
.spec.containers[].securityContext.runAsGroup: The same as in the PodSecurityContext.
.spec.containers[].securityContext.runAsNonRoot: The same as in the PodSecurityContext.
.spec.containers[].securityContext.allowPrivilegeEscalation: This field controls whether a process can gain more privileges than its parent process. More specifically, it controls whether the no_new_privs flag will be set on the container process. AllowPrivilegeEscalation is always true when the container (1) runs as privileged or (2) has the CAP_SYS_ADMIN Linux capability enabled.
.spec.containers[].securityContext.privileged: This field tells the kubelet to run the container in privileged mode. Processes in privileged containers are essentially identical to root processes on the host. The default value is false.
.spec.containers[].securityContext.readOnlyRootFilesystem: This field defines whether a container has a read-only root filesystem. The default value is false.
.spec.containers[].securityContext.seLinuxOptions: The SELinux context to be applied to the container. If the value is unspecified, the container runtime (e.g., Docker) will assign a random SELinux context to each container in a pod. If the value is set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence.
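To see how several of these container-level options fit together, consider the following sketch of a more restrictive container. It is an illustration under assumptions, not a tested configuration: the pod and container names are hypothetical, and whether this particular image works with a read-only root filesystem has not been verified here.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-cont-demo             # hypothetical name
spec:
  containers:
  - name: hardened-cont                # hypothetical name
    image: supergiantkir/k8s-liveliness
    securityContext:
      runAsNonRoot: true               # refuse to start if the process would run as UID 0
      runAsUser: 2000
      privileged: false                # the default, stated explicitly
      allowPrivilegeEscalation: false  # sets no_new_privs on the container process
      readOnlyRootFilesystem: true     # the root filesystem is mounted read-only
```

An application that writes to its root filesystem would need a writable volume (for example, an emptyDir) mounted at the paths it writes to.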
Now, save this spec in override-security-demo.yaml and create the pod by running the following command:
kubectl create -f override-security-demo.yaml
pod "override-security-demo" created
Next, verify that the pod is running:
kubectl get pod override-security-demo
NAME                     READY     STATUS    RESTARTS   AGE
override-security-demo   1/1       Running   0          45s
Then, as in the first example, get a shell to the running container to check the ownership of container processes:
kubectl exec -it override-security-demo -- /bin/bash
Inside the container, show the list of the running processes:
ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
2000         1  0.2  2.0 983532 40992 ?        Ssl  10:14   0:00 npm
2000        16  0.0  0.0   4340   800 ?        S    10:14   0:00 sh -c node serv
2000        17  0.1  1.7 883392 36084 ?        Sl   10:14   0:00 node server.js
2000        23  0.0  0.1  20252  3232 pts/0    Ss   10:16   0:00 /bin/bash
2000        28  0.0  0.1  17500  2060 pts/0    R+   10:16   0:00 ps aux
As you see, all the processes run with the UID 2000, which is the value of runAsUser specified for the container. It overrides the UID value 3000 specified for the pod.
Using Linux Capabilities
If you want fine-grained control over process privileges, you can use Linux capabilities. To understand how they work, we need a basic introduction to Unix/Linux processes. In a nutshell, traditional Unix implementations have two classes of processes: (1) privileged processes (whose user ID is 0, referred to as root or superuser) and (2) unprivileged processes (that have a non-zero UID).
In contrast to privileged processes, which bypass all kernel permission checks, unprivileged processes are subject to full permission checking based on the process’s credentials, such as effective UID, GID, and supplementary group list. Starting with kernel 2.2, Linux divides the privileges traditionally associated with the superuser into distinct units, known as capabilities. These units can be independently enabled and disabled, so a process can receive selected root privileges without receiving all of them. Kubernetes users can use Linux capabilities to grant certain privileges to a process without giving it all privileges of the root user. This helps improve container isolation from the host, since containers no longer need to run with the full set of root privileges: you can grant them only the privileges they need.
To add or remove Linux capabilities for a container, you can include the capabilities field in the securityContext section of the container manifest. Let’s see an example:
apiVersion: v1
kind: Pod
metadata:
  name: linux-cpb-demo
spec:
  securityContext:
    runAsUser: 3000
  containers:
  - name: linux-cpb-cont
    image: supergiantkir/k8s-liveliness
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]
In this example, we assigned the CAP_NET_ADMIN capability to the container. This Linux capability allows a process to perform various network-related operations such as interface configuration, administration of the IP firewall, modifying routing tables, enabling multicasting, etc. For the full list of available capabilities, see the official Linux documentation.
Note: Linux capabilities have the form CAP_XXX. However, when you list capabilities in your container manifest, you must omit the CAP_ part of the constant. For example, to add the CAP_SYS_TIME capability, include SYS_TIME in your list of capabilities.
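The capabilities field accepts a drop list as well as an add list, so capabilities can also be removed. A common pattern is to drop everything and add back only what the workload needs. The manifest below is a hedged sketch of that pattern: the pod and container names are hypothetical, and the choice of SYS_TIME is purely illustrative, since it has not been checked which capabilities this image actually requires.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: linux-cpb-drop-demo            # hypothetical name
spec:
  containers:
  - name: linux-cpb-drop-cont          # hypothetical name
    image: supergiantkir/k8s-liveliness
    securityContext:
      capabilities:
        drop: ["ALL"]        # remove every capability the runtime grants by default
        add: ["SYS_TIME"]    # add back CAP_SYS_TIME only (CAP_ prefix omitted)
```

To inspect the resulting capability sets, you can read the lines starting with Cap in /proc/1/status inside the container.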
Cleaning Up
Now that this tutorial is over, let’s clean up after ourselves.
Don’t forget to delete all pods:
kubectl delete pod security-context-pod
pod "security-context-pod" deleted
kubectl delete pod override-security-demo
pod "override-security-demo" deleted
kubectl delete pod linux-cpb-demo
pod "linux-cpb-demo" deleted
Also, you may wish to delete all files with the pod manifests if you don’t need them anymore.
Conclusion
In this article, we have discussed how to use Kubernetes security contexts in your pods and containers. Security contexts are a powerful tool for controlling access rights and privileges of processes running in the pod’s containers. Kubernetes allows setting a pod-level security context for all containers and overriding it for individual containers using the container’s SecurityContext.
Kubernetes security contexts are also helpful if you want to isolate container processes from the host. In particular, you learned how to use Linux capabilities to grant processes the specific root privileges they need to work without giving them all privileges of the root user. All these features make the Kubernetes security context a powerful complement to Kubernetes Secrets, helping you improve the security of your Kubernetes applications and properly isolate the container environment from other users and the underlying nodes.
Originally published at supergiant.io.