Spy on your Kubernetes cluster with BPF

If you’ve been using Kubernetes for some time, you’ll know that the best thing you can do is to invite hundreds of thousands of your best acquaintances to run arbitrary commands on your cluster without adult supervision. At some point in the future, you might wonder what those people that you once knew are still doing on your cluster.

BPF (Berkeley Packet Filter) is a virtual machine inside the Linux kernel that classifies events and triggers actions when it receives one of those events. It allows you to inject code into the kernel at runtime to handle those events; no kernel compilation required. There are two flavors of BPF: classic and extended. This article refers to the extended version, eBPF, but I just call it BPF.

So, getting back to those pesky friends hogging all the resources in your cluster. One of your options is to check all their pod definitions to determine the entry point for each of their containers. This will only give you the first program that the container started. You’ll never know if their entry point spawned a million other processes that are actually mining the cryptocurrency of the week.
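As a rough illustration of that first option, the sketch below pulls the entry point of every container out of a list of pod definitions. It assumes the JSON layout that `kubectl get pods --all-namespaces -o json` produces (an `items` list of pods with `spec.containers[].command` and `args`). The sample pod contents, the image names, and the `entry_points` helper are made up for the example.

```python
import json

# Hypothetical sample shaped like the output of
# `kubectl get pods --all-namespaces -o json`.
SAMPLE = """
{
  "items": [
    {
      "metadata": {"namespace": "default", "name": "happy-borg"},
      "spec": {
        "containers": [
          {"name": "worker", "image": "example/worker",
           "command": ["/bin/sh", "-c"], "args": ["./run.sh"]},
          {"name": "helper", "image": "example/helper"}
        ]
      }
    }
  ]
}
"""


def entry_points(pod_list):
    """Yield (namespace, pod, container, argv) for every container.

    argv is None when the pod spec sets neither command nor args, which
    means the image's own ENTRYPOINT/CMD is used instead.
    """
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        for container in pod["spec"].get("containers", []):
            argv = container.get("command", []) + container.get("args", [])
            yield (meta.get("namespace", "default"), meta["name"],
                   container["name"], argv or None)


if __name__ == "__main__":
    for ns, pod, name, argv in entry_points(json.loads(SAMPLE)):
        shown = " ".join(argv) if argv else "(image default)"
        print(f"{ns}/{pod}/{name}: {shown}")
```

A container that sets neither `command` nor `args` falls back to whatever the image defines, so the pod definition alone may not even tell you the first program.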

Another, more interesting option is to use kubectl trace to check what’s running in a container. This kubectl plugin schedules one-time-only jobs that run BPF programs inside a container. So you can list the processes started in any container with a one-line command:

kubectl trace run container -e \
"tracepoint:syscalls:sys_enter_execve { @[comm] = count() }"

This command waits for containers to execute new processes and counts those executions, grouped by command name (comm). The problem is that you have to run it yourself whenever you want to know what’s happening in there; it can’t supervise the cluster for you unless you keep watching it all the time.
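To make the aggregation concrete, here is a small Python sketch of what `@[comm] = count()` does: every execve event increments a counter keyed by the comm of the process that made the call. The event list is invented for the example, and the real map lives in the kernel, not in Python.

```python
from collections import Counter

# Invented stream of execve events: each entry is the comm of the
# process that made the call, as the tracepoint would report it.
events = ["sh", "sh", "miner", "sh", "curl", "miner", "miner"]

# Stand-in for the bpftrace map `@`: one counter per distinct comm.
counts = Counter()
for comm in events:
    counts[comm] += 1  # what `@[comm] = count()` does on each event

# Print one line per key, sorted by count for readability.
for comm, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print(f"@[{comm}]: {n}")
```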

My favorite option is to deploy execsnoop, one of the BPF tools from BCC, as a sidecar container in each pod and let it log in real time every process started in the pod. Kubernetes 1.13 has a pod spec field called shareProcessNamespace that puts all the processes spawned in the pod into the same process namespace, so you can spy on all the containers within the pod from your sidecar. This is the beginning of your pod definition to do that:

apiVersion: v1
kind: Pod
metadata:
  name: happy-borg
spec:
  shareProcessNamespace: true
  containers:
  - name: execsnoop
    image: calavera/execsnoop
    securityContext:
      privileged: true
    volumeMounts:
    - name: sys # mount the debug filesystem
      mountPath: /sys
      readOnly: true
    - name: headers # mount the kernel headers required by bcc
      mountPath: /usr/src
      readOnly: true
    - name: modules # mount the kernel modules required by bcc
      mountPath: /lib/modules
      readOnly: true
  - name: container doing random work # placeholder for your own containers
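Once the sidecar is running, its log can be read with `kubectl logs happy-borg -c execsnoop` and post-processed. The sketch below assumes the image prints the default execsnoop columns from BCC (PCOMM, PID, PPID, RET, ARGS); that is an assumption about `calavera/execsnoop`, and the sample lines and the `parse_execsnoop` helper are invented for illustration.

```python
from typing import Iterator, NamedTuple


class Exec(NamedTuple):
    comm: str
    pid: int
    ppid: int
    ret: int
    args: str


# Invented sample in the assumed default execsnoop layout.
SAMPLE_LOG = """\
PCOMM            PID    PPID   RET ARGS
sh               4021   3990     0 /bin/sh -c ./run.sh
miner            4022   4021     0 ./miner --threads 64
"""


def parse_execsnoop(log: str) -> Iterator[Exec]:
    """Parse execsnoop lines, skipping the header and malformed rows."""
    for line in log.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 4 or fields[0] == "PCOMM":
            continue
        comm, pid, ppid, ret = fields[:4]
        args = fields[4] if len(fields) == 5 else ""
        try:
            yield Exec(comm, int(pid), int(ppid), int(ret), args)
        except ValueError:
            continue  # not a data row (e.g. a warning printed by the tool)


if __name__ == "__main__":
    for event in parse_execsnoop(SAMPLE_LOG):
        print(f"{event.comm} (pid {event.pid}, parent {event.ppid}): {event.args}")
```

The parent PID column is what lets you walk back from a suspicious process to the entry point that spawned it.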

BPF opens the doors to better observability for Kubernetes. If you’re at KubeCon, don’t miss Lorenzo’s talk on Kubernetes Performance Analysis.

Jessie and I are writing a book about BPF to teach people how to use it without having to become a kernel hacker. Follow us on Twitter to get updates. We just started, and we’re very excited to get it into your hands as soon as possible.