Using Capabilities
The Linux kernel introduced ‘capabilities’ all the way back in the 2.2 series. FreeBSD has a similar (though different) concept in its Capsicum framework.
What are capabilities?
Most processes on a Unix-like system run with the permissions of either an ordinary user account or root. This is a relatively unsophisticated view of the world that divides permissions into two sets: a user is limited in what they can do and a superuser is unlimited in what they can do.
For users, this model alone is not good enough, and indeed the sudoers config file can grant a user account permission to perform only certain actions. This delegation lets a root user devolve particular routine administration tasks to a set of users without handing over full superuser privileges.
So far, so standard Unix permission model.
However, capabilities take the idea of reducing privileges for a process to the next level: root's privileges are broken down into smaller categories, and a process should hold only the small number it actually needs. This hardens a process so that even if it is compromised, an attacker cannot do anything beyond what the process itself is permitted to do.
For example, a classic attack was to break into a web server process, which had to run as root (to bind to port 80), giving the attacker free rein over the system. A web server process using capabilities, for instance one holding only CAP_NET_BIND_SERVICE, can mitigate the worst effects of this attack.
Kernel capabilities
Taking advantage of capabilities involves reading the list of capabilities defined by the kernel (documented in the capabilities man page) and deciding which of these your application needs.
At the moment there are 41 capabilities listed (numbered 0 to 40); some of these are quite fine-grained, others much broader in scope.
Capability Sets
To implement capabilities, the kernel specifies different ‘sets’:
typedef struct __user_cap_data_struct {
    __u32 effective;
    __u32 permitted;
    __u32 inheritable;
} *cap_user_data_t;
These sets are effective, permitted and inheritable. The effective set is the one the kernel actually checks when a thread attempts a privileged operation. Capabilities from these sets can be applied to the binary (at the file level) or to the thread/process. In addition to these three sets, there is the bounding set, which limits the capabilities a process can ever acquire, and the ambient set, which holds the capabilities that are preserved when the process executes a program that has no file capabilities of its own.
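To make the three per-thread sets concrete, the following is a minimal sketch that reads them for the calling thread with the capget system call. The helper names read_own_caps and print_own_caps are hypothetical, chosen for illustration; the sketch assumes a Linux system with the kernel headers installed and calls the raw system call through syscall().

```c
#define _GNU_SOURCE
#include <linux/capability.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: read the calling thread's three capability sets
 * as 64-bit masks. Returns 0 on success, -1 on failure (errno is set). */
int read_own_caps(unsigned long long *effective,
                  unsigned long long *permitted,
                  unsigned long long *inheritable)
{
    struct __user_cap_header_struct hdr = {0};
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3] = {{0}};

    hdr.version = _LINUX_CAPABILITY_VERSION_3;
    hdr.pid = 0; /* 0 selects the calling thread */

    if (syscall(SYS_capget, &hdr, data) != 0)
        return -1;

    /* data[0] holds capabilities 0-31, data[1] holds capabilities 32-63. */
    *effective = ((unsigned long long)data[1].effective << 32) | data[0].effective;
    *permitted = ((unsigned long long)data[1].permitted << 32) | data[0].permitted;
    *inheritable = ((unsigned long long)data[1].inheritable << 32) | data[0].inheritable;
    return 0;
}

/* Hypothetical helper: print the sets in the hexadecimal form used by
 * /proc/<pid>/status. */
void print_own_caps(void)
{
    unsigned long long eff, prm, inh;

    if (read_own_caps(&eff, &prm, &inh) != 0) {
        perror("capget");
        return;
    }
    printf("CapInh: %016llx\nCapPrm: %016llx\nCapEff: %016llx\n", inh, prm, eff);
}
```

Calling print_own_caps() from main() in an unprivileged shell would typically show all three masks as zero, while a root process would show most bits set; the exact values depend on how the process was started.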
The capabilities man page has more detailed explanations of the roles these capability sets play with respect to processes and files.
Capability API
Interacting with Linux capabilities programmatically requires the <linux/capability.h> header and the capset system call:
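#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Version 3 uses two 32-bit words per set (64 capability bits in total). */
struct __user_cap_data_struct caps_to_set[2];

int set_caps(pid_t pid)
{
    struct __user_cap_header_struct hs = {0};
    hs.version = _LINUX_CAPABILITY_VERSION_3;
    hs.pid = pid; /* 0 means the calling thread */

    /* Set the CAP_BPF bit in each of the three sets. */
    caps_to_set[CAP_TO_INDEX(CAP_BPF)].inheritable |= CAP_TO_MASK(CAP_BPF);
    caps_to_set[CAP_TO_INDEX(CAP_BPF)].effective |= CAP_TO_MASK(CAP_BPF);
    caps_to_set[CAP_TO_INDEX(CAP_BPF)].permitted |= CAP_TO_MASK(CAP_BPF);

    /* Invoke the raw system call; returns 0 on success, -1 on error. */
    return syscall(SYS_capset, &hs, caps_to_set);
}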
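This example uses capset to raise CAP_BPF in the process's capability sets so that it can use privileged BPF operations. The call only succeeds if the process is already permitted to hold the capability: capset cannot grant a capability that is missing from the permitted set.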
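The same system call works in the other direction, which is closer to the hardening goal described earlier: a process can give up a capability it no longer needs. The following is a minimal sketch under the same assumptions (Linux, kernel headers, raw system calls); get_caps, has_effective_cap and drop_cap are hypothetical helper names, not part of any library.

```c
#define _GNU_SOURCE
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: fetch the calling thread's capability sets.
 * Returns 0 on success, -1 on error. */
static int get_caps(struct __user_cap_data_struct *data)
{
    struct __user_cap_header_struct hdr = {0};

    hdr.version = _LINUX_CAPABILITY_VERSION_3;
    hdr.pid = 0; /* the calling thread */
    return (int)syscall(SYS_capget, &hdr, data);
}

/* Returns 1 if cap is in the effective set, 0 if not, -1 on error. */
int has_effective_cap(int cap)
{
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3] = {{0}};

    if (cap < 0 || cap > CAP_LAST_CAP || get_caps(data) != 0)
        return -1;
    return (data[CAP_TO_INDEX(cap)].effective & CAP_TO_MASK(cap)) ? 1 : 0;
}

/* Remove cap from the effective and permitted sets of the calling thread.
 * Clearing the permitted bit means the thread cannot raise it again.
 * Returns 0 on success, -1 on error. */
int drop_cap(int cap)
{
    struct __user_cap_header_struct hdr = {0};
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3] = {{0}};

    if (cap < 0 || cap > CAP_LAST_CAP || get_caps(data) != 0)
        return -1;

    data[CAP_TO_INDEX(cap)].effective &= ~CAP_TO_MASK(cap);
    data[CAP_TO_INDEX(cap)].permitted &= ~CAP_TO_MASK(cap);

    hdr.version = _LINUX_CAPABILITY_VERSION_3;
    hdr.pid = 0;
    return (int)syscall(SYS_capset, &hdr, data);
}
```

Lowering a set never requires privilege, so drop_cap(CAP_NET_RAW) should succeed even for an unprivileged process that never held the capability. Note that the raw system call affects only the calling thread, so a multi-threaded program would need to apply the change in every thread.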
Inspecting Capabilities
Most programs will not be “capability-aware”; however, those that do use capabilities to restrict their privileges can be inspected on the command line.
The libcap tools shipped by most Linux distributions include the getpcaps command, which allows you to check the capabilities of a running process. For the process inspected here, it reports six different capabilities, including CAP_BPF.
Alternatively, we can check /proc/<process id>/status for similar capability information. Here, for the same process, we can see the encoded versions of the six capabilities:
$ cat /proc/23815/status | grep Cap
CapInh: 000000c000803020
CapPrm: 000000c000803020
CapEff: 000000c000803020
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
We can use the capsh utility (for example, capsh --decode=000000c000803020) to convert these values to a list of human-readable capabilities.
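The decoding itself is a bit-by-bit lookup: bit n of the mask corresponds to capability number n in <linux/capability.h>. The following is a minimal sketch of that lookup; decode_caps and the cap_names table are hypothetical names written for this illustration, and the output format is not meant to match that of capsh.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Capability names indexed by bit number, following <linux/capability.h>. */
static const char *const cap_names[] = {
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog", "cap_wake_alarm",
    "cap_block_suspend", "cap_audit_read", "cap_perfmon", "cap_bpf",
    "cap_checkpoint_restore",
};
#define CAP_COUNT (sizeof cap_names / sizeof cap_names[0])

/* Hypothetical helper: write a comma-separated list of the capabilities
 * set in mask into buf. Bits without a known name are written as numbers.
 * Returns the number of bits set, or -1 if buf is too small. */
int decode_caps(uint64_t mask, char *buf, size_t len)
{
    int count = 0;
    size_t used = 0;

    if (len > 0)
        buf[0] = '\0';

    for (unsigned bit = 0; bit < 64; bit++) {
        int n;

        if (!(mask & (UINT64_C(1) << bit)))
            continue;

        if (bit < CAP_COUNT)
            n = snprintf(buf + used, len - used, "%s%s",
                         count ? "," : "", cap_names[bit]);
        else
            n = snprintf(buf + used, len - used, "%s%u",
                         count ? "," : "", bit);

        if (n < 0 || (size_t)n >= len - used)
            return -1; /* output was truncated */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Applied to the value 0x000000c000803020 from the status output, decode_caps returns 6 and fills the buffer with cap_kill,cap_net_admin,cap_net_raw,cap_sys_nice,cap_perfmon,cap_bpf, matching the six capabilities mentioned for this process.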
Setting Capabilities
Instead of rewriting programs to make calls to the capability API, it’s possible to use the setcap utility to attach capabilities to the executable file of a non-capability-aware program, so that it can do its job without running as root:
# The following command gives tcpdump the capabilities it needs to sniff traffic
$ setcap cap_net_raw,cap_net_admin=eip /usr/sbin/tcpdump
$ getpcaps 9562
Capabilities for `9562': = cap_net_admin,cap_net_raw+ep
Systemd
Capabilities can also be manipulated within the context of systemd. A useful tool that comes as part of systemd is systemd-analyze, which is often used to verify that unit files are correct. However, if you pass it the security option:
systemd-analyze security
it checks the registered services and reports how safe or exposed each one is. These services can then be modified (via their unit/service files) to reduce the capabilities that the service requires:
...
[Service]
ExecStart=...
CapabilityBoundingSet=CAP_CHOWN CAP_KILL
...
Modifying the CapabilityBoundingSet to specify the exact capabilities the service requires can be a useful security/hardening tool.
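A service can confirm from the inside that the restriction took effect by querying its bounding set with prctl. The following is a minimal sketch; in_bounding_set and count_bounding_caps are hypothetical helper names, and the sketch assumes Linux with the kernel headers available.

```c
#include <linux/capability.h>
#include <sys/prctl.h>

/* Hypothetical helper: returns 1 if cap is in the calling thread's
 * bounding set, 0 if it has been removed, and -1 if the kernel does not
 * recognise the capability number. */
int in_bounding_set(int cap)
{
    return prctl(PR_CAPBSET_READ, (unsigned long)cap, 0UL, 0UL, 0UL);
}

/* Hypothetical helper: count how many of the capabilities known to the
 * installed headers remain in the bounding set. */
int count_bounding_caps(void)
{
    int n = 0;

    for (int cap = 0; cap <= CAP_LAST_CAP; cap++) {
        if (in_bounding_set(cap) == 1)
            n++;
    }
    return n;
}
```

Under a unit file like the one above, a service would be expected to see in_bounding_set(CAP_CHOWN) and in_bounding_set(CAP_KILL) return 1 and every other capability return 0, although this has not been checked against a running service here.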
Capabilities, containers & K8s
Much of our modern software is distributed as container images, to be executed either directly by a runtime such as containerd or scheduled via Kubernetes.
As you would expect, there is a mechanism to interact with Linux capabilities in a useful fashion for containers. For example, the flags --cap-add and --cap-drop allow you to specify explicit capabilities to add or drop when running a container using Docker:
docker run -it --rm --cap-add=CAP_NET_ADMIN ubuntu:22.04
It’s more common to schedule containers via Kubernetes, and as such there is support for setting the capabilities that the container can use via the pod.yaml deployment descriptor:
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo-4
spec:
  containers:
  - name: sec-ctx-4
    image: gcr.io/google-samples/node-hello:1.0
    securityContext:
      capabilities:
        add: ["NET_ADMIN", "SYS_TIME"]
This allows you to deploy containers into Kubernetes with explicit fine-grained capabilities. This is preferable to allowing a pod to run as root or marking the container as privileged (which grants wide-ranging access).