Using Capabilities

Kev Jackson
Published in THG Tech Blog
5 min read · Sep 14, 2023

The Linux kernel started introducing ‘capabilities’ all the way back in kernel 2.x. FreeBSD has a related, though distinct, concept in Capsicum.

libcapsicum — https://www.cl.cam.ac.uk/research/security/capsicum/

What are capabilities?

Most processes on a Unix-like system run with the permissions of either a regular user account or root. This is a relatively unsophisticated view of the world that divides permissions into two sets: a user is limited in what they can do and a superuser is unlimited in what they can do.

For users, this model alone is not good enough, and indeed the sudoers config file can grant a user account permission to perform only certain actions. This delegation lets a root user devolve particular routine administration tasks to a set of users without handing over full superuser privileges.

So far, so standard Unix permission model.

However, capabilities take the idea of reducing privileges for a process to the next level: superuser privileges are broken down into smaller categories, and a process should hold only the few it actually needs. This hardens a process so that even if it is compromised, an attacker cannot do anything beyond what the capabilities held by the process allow.

For example, a classic attack was to break into a web server process, which had to run as root (to bind to port 80), giving the attacker free rein over the system. A web server process using capabilities can mitigate the worst effects of this attack, because binding to a port below 1024 needs only CAP_NET_BIND_SERVICE rather than full root privileges.

Kernel capabilities

Taking advantage of capabilities involves reading the list of capabilities included in the kernel and deciding which of these your application needs:

https://github.com/torvalds/linux/blob/master/include/uapi/linux/capability.h

At the moment there are 41 capabilities listed (numbered 0 to 40); some of these are quite fine-grained, others much broader in scope.

Capability Sets

To implement capabilities, the kernel specifies different ‘sets’:

typedef struct __user_cap_data_struct {
    __u32 effective;
    __u32 permitted;
    __u32 inheritable;
} *cap_user_data_t;

These sets are effective, permitted and inheritable. Capabilities from these sets can be applied to the binary (at the file level) or to the thread/process. The effective set holds the capabilities the kernel actually checks for a thread at any given moment. In addition to these three sets, there is the bounding set, which limits the capabilities a process can ever gain, and the ambient set, which holds capabilities that are preserved when the process executes a program that has no file capabilities of its own.

The capabilities man page has more detailed explanations of the roles these capability sets play with respect to processes and files.
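
The rules in the capabilities man page for how these sets combine when a process executes a new program can be written down compactly. The sketch below is an illustration only: the struct and function names are hypothetical, each set is flattened into a 64-bit mask, and the special handling of root (UID 0) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types for illustration; the kernel's own structures differ. */
struct proc_caps {
    uint64_t permitted, inheritable, effective, bounding, ambient;
};

struct file_caps {
    uint64_t permitted, inheritable;
    bool effective;   /* the file "effective set" is really a single bit */
    bool privileged;  /* file has capabilities, or is set-user-ID/set-group-ID */
};

/* Apply the execve() transformation from capabilities(7), non-root case. */
static struct proc_caps caps_after_exec(struct proc_caps p, struct file_caps f)
{
    struct proc_caps n;

    /* Ambient capabilities are cleared when a privileged file is executed. */
    n.ambient     = f.privileged ? 0 : p.ambient;
    /* The bounding set limits what the file's permitted set can grant. */
    n.permitted   = (p.inheritable & f.inheritable)
                  | (f.permitted & p.bounding)
                  | n.ambient;
    /* The file effective bit decides whether permitted becomes effective. */
    n.effective   = f.effective ? n.permitted : n.ambient;
    n.inheritable = p.inheritable;
    n.bounding    = p.bounding;
    return n;
}
```

For example, a process with empty sets and a full bounding set that executes a binary carrying cap_net_admin,cap_net_raw=ep ends up with exactly those two capabilities in its permitted and effective sets.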

Capability API

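Interacting with Linux capabilities from C requires the <linux/capability.h> header and the capset system call, which is invoked here through syscall rather than relying on a C library wrapper:

#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Version 3 capability sets are split across two 32-bit words. */
struct __user_cap_data_struct caps_to_set[2];

/* pid should be 0 (the calling thread) or the caller's own thread ID. */
int set_caps(pid_t pid)
{
    struct __user_cap_header_struct hs = {0};
    hs.version = _LINUX_CAPABILITY_VERSION_3;
    hs.pid = pid;

    caps_to_set[CAP_TO_INDEX(CAP_BPF)].inheritable |= CAP_TO_MASK(CAP_BPF);
    caps_to_set[CAP_TO_INDEX(CAP_BPF)].effective |= CAP_TO_MASK(CAP_BPF);
    caps_to_set[CAP_TO_INDEX(CAP_BPF)].permitted |= CAP_TO_MASK(CAP_BPF);

    /* Returns 0 on success, -1 with errno set on failure. */
    return (int)syscall(SYS_capset, &hs, caps_to_set);
}

This example uses capset to leave the calling thread with CAP_BPF, the capability needed for privileged BPF operations, and nothing else. Note that capset replaces all three sets with the values passed in, and it cannot add a capability to the permitted set that the thread does not already hold there; it can only keep or drop capabilities.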
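
The two-element array in the capset example exists because version 3 of the interface spreads each set over two 32-bit words. The sketch below reproduces the arithmetic of CAP_TO_INDEX and CAP_TO_MASK with stand-in names (cap_words, cap_word_index, cap_word_mask and raise_cap are hypothetical, not part of the kernel header), so it can be compiled without the Linux headers.

```c
#include <stdint.h>

/* Hypothetical stand-in for one 32-bit word of each capability set. */
struct cap_words {
    uint32_t effective, permitted, inheritable;
};

/* Same arithmetic as CAP_TO_INDEX(x): which 32-bit word holds the bit. */
static unsigned cap_word_index(unsigned cap)
{
    return cap >> 5;
}

/* Same arithmetic as CAP_TO_MASK(x): the bit position inside that word. */
static uint32_t cap_word_mask(unsigned cap)
{
    return UINT32_C(1) << (cap & 31);
}

/* Raise one capability in the effective and permitted words. */
static void raise_cap(struct cap_words words[2], unsigned cap)
{
    words[cap_word_index(cap)].effective |= cap_word_mask(cap);
    words[cap_word_index(cap)].permitted |= cap_word_mask(cap);
}
```

CAP_BPF is capability number 39, so it lands in word 1 with mask 0x80, which is why a single 32-bit word per set would not be enough.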

Inspecting Capabilities

Most programs will not be “capability-aware”. Those that do use capabilities to restrict their privileges can be inspected on the command line.

The libcap package provides the getpcaps command, which allows you to check the capabilities of a running process. The process inspected in this example has been granted six different capabilities, including CAP_BPF:

[Figure: Getting the capabilities of a running process with getpcaps]

Alternatively, we can check /proc/<process id>/status for similar capability information. Here, for the same process, we can see the hex-encoded bitmasks of the six capabilities mentioned above:

$ cat /proc/23815/status | grep Cap
CapInh: 000000c000803020
CapPrm: 000000c000803020
CapEff: 000000c000803020
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000

We can use the capsh utility (capsh --decode=000000c000803020) to convert these values to the list of human-readable capabilities:

[Figure: Decoding capabilities]
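
The decoding itself is just a walk over the bits of the mask. As a rough sketch of what such a decoder does (this is not how capsh is implemented, and parse_cap_line and decode_caps are hypothetical helper names), the following parses one Cap line from /proc/<process id>/status and lists the capabilities it contains, using the numbering from capability.h:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Capability names indexed by bit number, following capability.h. */
static const char *const cap_names[] = {
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog", "cap_wake_alarm",
    "cap_block_suspend", "cap_audit_read", "cap_perfmon", "cap_bpf",
    "cap_checkpoint_restore",
};
#define CAP_COUNT (sizeof cap_names / sizeof cap_names[0])

/* Parse a status line such as "CapEff:\t000000c000803020".
 * Returns 0 on success, -1 if the line is not a usable Cap* line. */
static int parse_cap_line(const char *line, uint64_t *mask)
{
    const char *colon = strchr(line, ':');
    char *end;

    if (strncmp(line, "Cap", 3) != 0 || colon == NULL)
        return -1;
    *mask = strtoull(colon + 1, &end, 16);  /* skips the leading tab */
    return end == colon + 1 ? -1 : 0;
}

/* Write the names of the set bits into out, comma-separated, and return
 * how many bits were set. The text is truncated if out is too small. */
static int decode_caps(uint64_t mask, char *out, size_t outlen)
{
    size_t used = 0;
    int count = 0;

    if (outlen > 0)
        out[0] = '\0';
    for (unsigned bit = 0; bit < 64; bit++) {
        char unknown[16];
        const char *name;

        if (!(mask & (UINT64_C(1) << bit)))
            continue;
        if (bit < CAP_COUNT) {
            name = cap_names[bit];
        } else {
            /* Bits beyond the table are printed as plain numbers. */
            snprintf(unknown, sizeof unknown, "%u", bit);
            name = unknown;
        }
        if (used < outlen) {
            int n = snprintf(out + used, outlen - used, "%s%s",
                             count ? "," : "", name);
            if (n > 0)
                used += (size_t)n;  /* reaches outlen once truncated */
        }
        count++;
    }
    return count;
}
```

Feeding it the CapEff line shown earlier yields the six capabilities cap_kill, cap_net_admin, cap_net_raw, cap_sys_nice, cap_perfmon and cap_bpf.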

Setting Capabilities

Instead of rewriting programs to make calls to the capability API, it’s possible to use the setcap utility to attach capabilities to the binary of a non-capability-aware program, so that it can do its job without running with full root privileges:

# The following command gives tcpdump the capabilities it needs to capture traffic
$ setcap cap_net_raw,cap_net_admin=eip /usr/sbin/tcpdump
$ getpcaps 9562
Capabilities for `9562': = cap_net_admin,cap_net_raw+ep

Systemd

Capabilities can also be manipulated within the context of systemd. A useful tool that comes as part of systemd is systemd-analyze, which is often used to verify that unit files are correct. It also has a security option:

[Figure: A selection of systemd services]

systemd-analyze security reports how exposed each registered service is. These services can then be modified (via their unit/service files) to reduce the capabilities that the service requires:

...
[Service]
ExecStart=...
CapabilityBoundingSet=CAP_CHOWN CAP_KILL
...

Modifying the CapabilityBoundingSet to specify the exact capabilities the service requires can be a useful security/hardening tool.

Capabilities, containers & K8s

Much of our modern software is distributed as container images to be executed either directly by containerd or to be scheduled via Kubernetes.

As you would expect, there is a mechanism for working with Linux capabilities in containers. For example, the flags --cap-add and --cap-drop allow you to specify explicit capabilities to add or drop when running a container using docker:

docker run -it --rm --cap-add=CAP_NET_ADMIN ubuntu:22.04

It’s more common to schedule containers via Kubernetes, which supports setting the capabilities a container can use through the securityContext field of the pod manifest:

apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo-4
spec:
  containers:
  - name: sec-ctx-4
    image: gcr.io/google-samples/node-hello:1.0
    securityContext:
      capabilities:
        add: ["NET_ADMIN", "SYS_TIME"]

This allows you to deploy containers into Kubernetes with explicit, fine-grained capabilities, which is preferable to running the pod as root or marking the container as privileged (which grants wide-ranging access).
