PyEBPF — eBPF proxy routines generation and Python callbacks (iovisor/bcc wrapper)

Hey folks!

A couple of weeks ago, I stumbled upon Brendan Gregg’s BPF tracing tools diagram (shown below) while doing a small read-up on performance engineering.

[Image: BPF tracing tools, taken from the iovisor/bcc repository]

Looking at the diagram above, and shamefully realizing I’m not familiar with the vast majority of the tools within, I’ve decided to explore the available utilities alongside the infrastructure that enabled their creation.

And so, today I’m going to talk a bit about eBPF (extended Berkeley Packet Filter), BCC (the BPF Compiler Collection) and how both can be used more easily from Python.

A quick intro to eBPF

You might be familiar with BPF filters as a tool for filtering packets. A common example is passing a BPF filter to tcpdump in order to filter incoming or outgoing network traffic.

eBPF (extended BPF) is an enhancement to BPF which allows one to trace and filter much more than just packets. For example, eBPF can be used to trace every entry into a specific syscall, say, open(2).

Code needs to be compiled to the BPF instruction set, and is then loaded and run by the kernel’s BPF virtual machine.

Using eBPF usually involves the following steps:

  1. You compile your source code to BPF byte-code (BPF runs in a dedicated virtual machine and has its own instruction set) using a suitable compiler (e.g. LLVM)
  2. Your user-mode program asks the kernel to store the compiled instructions, via the bpf(2) syscall with the BPF_PROG_LOAD command and a program type, which determines the type of our BPF module (e.g. BPF_PROG_TYPE_SOCKET_FILTER for packet filtering, BPF_PROG_TYPE_KPROBE for kernel probes)
  3. The kernel now runs a static analyzer (bpf_check, source available here) to verify that the code’s control-flow graph is safe to run (the BPF VM is quite strict: most I/O operations are forbidden, unbounded loops are rejected, a program may hold at most BPF_MAXINSNS instructions [2¹² under v4.20], etc.)
  4. If the static checks above pass, the bpf(2) syscall returns a file descriptor, which the user-mode program may operate against
  5. In addition to the above, one may create and use a predefined set of BPF data structures (maps) in order to communicate between BPF routines, and between a routine and the user-mode program
  6. Lastly, we may operate against our BPF module. For example, we may attach a kernel probe to it, so that our routine is called before or after a specific syscall runs

A simplified version of the process above could be depicted as follows:

[Image: Loading and operating with an eBPF kernel probe]
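
To make "BPF byte-code" from steps 1 and 2 a bit more concrete, here is a minimal sketch that hand-assembles the smallest useful program, "r0 = 0; exit", using the 8-byte struct bpf_insn layout on a little-endian machine. The opcode constants are taken from the kernel's uapi headers; the encode_insn helper is my own name for illustration, and in practice LLVM emits these bytes for you.

```python
import binascii
import struct

# Opcode building blocks, as defined in the kernel's uapi BPF headers
BPF_ALU64 = 0x07  # 64-bit arithmetic instruction class
BPF_JMP = 0x05    # jump instruction class
BPF_MOV = 0xb0    # move operation
BPF_EXIT = 0x90   # return from the program
BPF_K = 0x00      # operand is the 32-bit immediate


def encode_insn(opcode, dst_reg=0, src_reg=0, off=0, imm=0):
    """Pack one instruction: opcode (u8), dst/src registers (4 bits each),
    offset (s16) and immediate (s32), little-endian."""
    regs = (src_reg << 4) | dst_reg
    return struct.pack('<BBhi', opcode, regs, off, imm)


# r0 = 0; exit  (r0 holds the program's return value)
program = (
    encode_insn(BPF_ALU64 | BPF_MOV | BPF_K, dst_reg=0, imm=0) +
    encode_insn(BPF_JMP | BPF_EXIT)
)

program_hex = binascii.hexlify(program).decode('ascii')
print(program_hex)
```

A buffer like this, together with a program type and a license string, is what ends up in the attributes passed to bpf(2) with BPF_PROG_LOAD.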

A few words about BCC

When searching for eBPF related resources, a toolchain that kept popping up was BCC (The BPF-Compiler-Collection).

BCC is a very neat wrapper (with Python & Lua native bindings) for BPF code generation (using LLVM) and kernel metric collection.

When you work with BCC, you pass your source (as a file or buffer) to the library, which in turn enhances it, compiles it and wraps the relevant file-descriptors for use.

Looking at Python’s bcc library, there are a few things to note:

  1. The bcc library exposes a BPF Python object that operates against a native library — libbcc
  2. In turn, libbcc uses libbpf and bpf_module, where bpf_module deals with code generation via LLVM and libbpf handles program loading, program-attachment and BPF data-structure management
  3. Code is generated via LLVM on the fly, and your eBPF routines are usually written as a string in Python, and passed to BCC (and internally to LLVM) for compilation

BCC has quite an extensive code base with various convenient wrappers and lots of useful C macros for generating and using BPF data structures.

Let’s look at an example that traces all bind(2) syscalls, and prints the pid, the process name and the socket fd of the process which called bind():

from threading import Thread
from socket import socket
import ctypes as ct
import time

from bcc import BPF

prog = '''
#include <linux/sched.h>

/* A BCC macro that creates a BPF data structure we can read from user mode */
BPF_PERF_OUTPUT(events);

// The following is the data structure we'll pass to our user-land program
struct data_t {
    char process_name[TASK_COMM_LEN]; // Process name
    u32 pid;                          // Process ID
    int socket_fd;                    // Bound socket FD
};

/* This is our BPF routine, it takes two arguments:
   - A pt_regs* struct, which holds the CPU registers saved at the probe point
   - A socket fd - bcc actually transforms this into a local variable that is
     set from the registers, see the note below
*/
int on_bind(struct pt_regs* ctx, int sockfd) {
    struct data_t data = {};

    // A BPF helper that gets the name of the process that invoked bind
    bpf_get_current_comm(&data.process_name, sizeof(data.process_name));

    // The upper 32 bits hold the tgid, which is what user mode calls the pid
    data.pid = (u32) (bpf_get_current_pid_tgid() >> 32);
    data.socket_fd = sockfd;

    // Copies the data to the BPF structure, it is now available to user mode
    events.perf_submit(ctx, &data, sizeof(data));
    return 0;
}
'''

# Compiles the BPF program via LLVM
b = BPF(text=prog)

# Mirrors the native data structure above
class Data(ct.Structure):
    _fields_ = [
        ('process_name', ct.c_char * 16),
        ('pid', ct.c_uint32),
        ('socket_fd', ct.c_int32)
    ]

# Prints header
print('COMM PID SOCKETFD')

# A callback to be called for every record in the 'events' BPF data structure
def print_event(cpu, data, size):
    data = ct.cast(data, ct.POINTER(Data)).contents
    print('{process_name} {pid} {socket_fd}'.format(
        process_name=data.process_name, pid=data.pid, socket_fd=data.socket_fd))

# This calls libbpf, which in turn calls the bpf(2) syscall, and does a few
# more tricks to attach the kernel probe
b.attach_kprobe(event=b.get_syscall_fnname('bind'), fn_name='on_bind')

# An async function that binds to localhost:31337 (to get an output for the above)
def call_bind_async():
    time.sleep(2)
    print('Calling bind...')
    s = socket()
    s.bind(('localhost', 31337))

t = Thread(target=call_bind_async)
t.start()

# This will open the BPF data structure for polling
b['events'].open_perf_buffer(print_event)

while True:
    try:
        # Poll the data structure till Ctrl+C
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        print('Bye !')
        break

Running the example above yields the following output:

[Image: Output from trace_bind.py]
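
If the ctypes part of the example feels opaque, the following sketch shows the same decoding without BCC or root access. The record bytes are fabricated with struct.pack purely for illustration (a real record arrives as a raw pointer from the perf buffer), and split_pid_tgid is a helper name of my own that mirrors the shift done inside on_bind.

```python
import ctypes as ct
import struct

TASK_COMM_LEN = 16


class Data(ct.Structure):
    """User-mode mirror of struct data_t from the BPF program."""
    _fields_ = [
        ('process_name', ct.c_char * TASK_COMM_LEN),
        ('pid', ct.c_uint32),
        ('socket_fd', ct.c_int32),
    ]


def split_pid_tgid(pid_tgid):
    """Split the 64-bit value of bpf_get_current_pid_tgid() into
    (tgid, tid): the upper half is the process id, the lower half the thread id."""
    return pid_tgid >> 32, pid_tgid & 0xFFFFFFFF


# A made-up record laid out like struct data_t: 16-byte name, u32 pid, int fd
raw = struct.pack('=16sIi', b'python', 4242, 3)
record = Data.from_buffer_copy(raw)

print(record.process_name, record.pid, record.socket_fd)
print(split_pid_tgid((4242 << 32) | 4250))
```

The two layouts have to agree field by field. If they drift apart, the cast still succeeds and simply yields garbage values.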

To reveal some of the pre-processing BCC brings to the table, we can pass the debug=DEBUG_PREPROCESSOR flag to BPF’s constructor.

Looking at the pre-processed output, there are a few interesting things to note:

  1. BCC loads an in-memory filesystem mapped to the /virtual directory
  2. We’ve got additional #ifdefs at the top, preventing us from setting the BPF_LICENSE macro and conditionally defining the CONFIG_CC_STACKPROTECTOR macro
  3. The BPF_PERF_OUTPUT macro was not expanded, but can be observed here
  4. Our routine has been annotated to be put in its own section (This is not mandatory for loading BPF routines, but is used later by bcc itself)
  5. Our extra parameter was stripped from our method, and it now resides on stack and copied from the di register
  6. Our use of events was replaced with a call to bpf_perf_event_output that uses bpf_pseudo_fd (Note that fd 3 is the first fd to be created after stdin, stdout and stderr)
  7. There is an additional footer included, which defines the BPF_LICENSE macro as “GPL” (relevant code) [Note that BCC’s license is actually Apache v2.0, this is only a parameter that is later passed on to the bpf(2) syscall]
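
Item 5 is easier to picture with a toy version of the rewrite. The sketch below is NOT how BCC does it (BCC rewrites the source through a Clang frontend, and the exact text it emits may differ). It is a plain string manipulation that assumes the x86-64 calling convention, where the first six arguments live in the di, si, dx, cx, r8 and r9 fields of struct pt_regs. The function name lower_probe_signature is hypothetical.

```python
import re

# x86-64 argument registers, in order, as named in the kernel's struct pt_regs
X86_64_ARG_REGS = ['di', 'si', 'dx', 'cx', 'r8', 'r9']


def lower_probe_signature(signature):
    """Turn 'int f(struct pt_regs* ctx, T arg, ...)' into a ctx-only
    signature plus one register load per stripped argument."""
    match = re.match(r'\s*(\w+)\s+(\w+)\s*\((.*)\)\s*$', signature)
    if match is None:
        raise ValueError('unsupported signature: %r' % signature)
    ret_type, name, params = match.groups()
    params = [param.strip() for param in params.split(',')]
    ctx_param, extra_params = params[0], params[1:]
    if len(extra_params) > len(X86_64_ARG_REGS):
        raise ValueError('too many arguments to pass in registers')
    # The context variable name is the last token of the first parameter
    ctx_name = re.split(r'[\s*]+', ctx_param)[-1]
    loads = [
        '{decl} = {ctx}->{reg};'.format(decl=decl, ctx=ctx_name, reg=reg)
        for decl, reg in zip(extra_params, X86_64_ARG_REGS)
    ]
    new_signature = '{ret} {name}({ctx})'.format(
        ret=ret_type, name=name, ctx=ctx_param)
    return new_signature, loads


new_signature, loads = lower_probe_signature(
    'int on_bind(struct pt_regs* ctx, int sockfd)')
print(new_signature)
print(loads)
```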

Now that we’ve dipped our toe in BCC’s waters, let’s move on to what PyEBPF is and why it was created.

PyEBPF — Yet Another Wrapper

One aspect that felt a bit daunting when working with BCC’s Python library was the fact that even for supposedly trivial examples (like trace_bind above), some boilerplate had to be written to share data between our routine and our user-mode application.

I thought it would be fun, for educational purposes, to try and automate the process a bit, so one could have the ability to write simple tracing routines, without writing extra native code.

PyEBPF provides a simple wrapper that helps you attach kernel probes to any syscall without writing a single line of native code.

It does so by a few key steps:

  1. For a given attach_kprobe request, it inspects the syscall’s arguments by trying to parse the /sys/kernel/debug/tracing/events/syscalls/sys_enter_<syscall>/format file
  2. Then, it generates an appropriate BPF routine and injects the syscall arguments into it
  3. A native data structure is generated to hold the syscall arguments, along with the pid, tid, current time in ns, uid, gid and invoking process name
  4. A ctypes data structure that represents the native one is generated as well
  5. A BPF map will be injected as well, to copy the information from our routine back to user mode
  6. Finally, the wrapper will spawn a daemon thread that polls on our BPF map, casts the data object to our ctypes structure, and invokes a user-passed python function with it
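
Steps 1 and 4 can be sketched in a few lines. Everything below is an assumption made for illustration rather than PyEBPF's actual code: the sample text imitates a sys_enter_bind format file (the ID, offsets and whitespace on a real kernel may differ), and parse_syscall_args, build_ctypes_struct and the size-based type mapping are hypothetical names and choices.

```python
import collections
import ctypes as ct
import re

# Illustrative imitation of a sys_enter_bind format file
SAMPLE_FORMAT = """\
name: sys_enter_bind
ID: 1234
format:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:unsigned char common_flags; offset:2; size:1; signed:0;
    field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
    field:int common_pid; offset:4; size:4; signed:1;

    field:int __syscall_nr; offset:8; size:4; signed:1;
    field:int fd; offset:16; size:8; signed:0;
    field:struct sockaddr * umyaddr; offset:24; size:8; signed:0;
    field:int addrlen; offset:32; size:8; signed:0;
"""

Field = collections.namedtuple('Field', ['c_type', 'name', 'size', 'signed'])

FIELD_RE = re.compile(
    r'field:(?P<decl>[^;]+);\s*offset:\d+;\s*'
    r'size:(?P<size>\d+);\s*signed:(?P<signed>[01]);')

# Hypothetical mapping: pick an integer ctypes type by size and signedness
CTYPES_BY_LAYOUT = {
    (1, False): ct.c_uint8, (1, True): ct.c_int8,
    (2, False): ct.c_uint16, (2, True): ct.c_int16,
    (4, False): ct.c_uint32, (4, True): ct.c_int32,
    (8, False): ct.c_uint64, (8, True): ct.c_int64,
}


def parse_syscall_args(format_text):
    """Return the syscall's own arguments, skipping the common_* header
    fields and the syscall number."""
    args = []
    for match in FIELD_RE.finditer(format_text):
        c_type, name = match.group('decl').strip().rsplit(None, 1)
        if name.startswith('common_') or name == '__syscall_nr':
            continue
        args.append(Field(c_type, name, int(match.group('size')),
                          match.group('signed') == '1'))
    return args


def build_ctypes_struct(struct_name, args):
    """Create a ctypes.Structure subclass at runtime from the parsed fields."""
    fields = [(arg.name, CTYPES_BY_LAYOUT[(arg.size, arg.signed)])
              for arg in args]
    return type(struct_name, (ct.Structure,), {'_fields_': fields})


bind_args = parse_syscall_args(SAMPLE_FORMAT)
BindArgs = build_ctypes_struct('BindArgs', bind_args)
print([(arg.c_type, arg.name) for arg in bind_args])
print(ct.sizeof(BindArgs))
```

The same parsed field list could also drive the generation of the native struct and the BPF routine text, which is what keeps the two sides of the data structure in sync.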

Thus, our trace_bind example above transforms into the following:

from threading import Thread
from socket import socket
import time

from pyebpf.ebpf_wrapper import EBPFWrapper

# Note that there is no native code passing here
b = EBPFWrapper()

# Prints header
print('COMM PID SOCKETFD')

# A callback to be called by pyebpf
def on_bind(cpu, data, size):
    print('{process_name} {pid} {socket_fd}'.format(
        process_name=data.process_name, pid=data.process_id, socket_fd=data.fd))

# Note that we pass a function object
b.attach_kprobe(event=b.get_syscall_fnname('bind'), fn=on_bind)

# An async function that binds to localhost:31337 (to get an output for the above)
def call_bind_async():
    time.sleep(2)
    print('Calling bind...')
    s = socket()
    s.bind(('localhost', 31337))

t = Thread(target=call_bind_async)
t.start()

while True:
    try:
        time.sleep(1)
    except KeyboardInterrupt:
        print('Bye !')
        break

Note how in the example above, we’ve basically halved the amount of code we needed to write.
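
The reason the main loop can simply sleep is the daemon thread from step 6. Here is a rough sketch of that pattern, assuming nothing about PyEBPF's internals: a queue.Queue stands in for the BPF perf buffer, and CallbackPoller and poll_once are hypothetical names.

```python
import queue
import threading
import time


class CallbackPoller(object):
    """Polls an event source on a daemon thread and forwards every
    record to a user-supplied callback."""

    def __init__(self, poll_once, callback):
        self._poll_once = poll_once
        self._callback = callback
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True  # never keeps the process alive

    def start(self):
        self._thread.start()

    def stop(self):
        self._stopped.set()
        self._thread.join()

    def _run(self):
        while not self._stopped.is_set():
            record = self._poll_once()
            if record is not None:
                self._callback(record)


# Stand-in for the perf buffer: the "kernel side" puts records here
events = queue.Queue()


def poll_once(timeout=0.05):
    """Return the next record, or None if nothing arrived in time."""
    try:
        return events.get(timeout=timeout)
    except queue.Empty:
        return None


seen = []
poller = CallbackPoller(poll_once, seen.append)
poller.start()

for fd in (3, 4, 5):
    events.put({'fd': fd})

# Wait (up to two seconds) for the background thread to drain the queue
deadline = time.time() + 2.0
while len(seen) < 3 and time.time() < deadline:
    time.sleep(0.01)
poller.stop()
print(seen)
```

The short poll timeout is a design choice: it lets stop() take effect quickly instead of blocking forever on an idle source.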

Conclusion

PyEBPF is a tool written for educational purposes to lessen the burden of writing simple BPF routines.

More complex use cases will, more often than not, need helpers and routines that are only available in kernel mode, and PyEBPF would most definitely not fit them.

However, if you’re just starting out with tracing syscalls, and you don’t mind having all of the control flow in user mode, feel free to use and improve the library further (for example, it does not cover USDT probes, kretprobes, tracepoints, and most of the other very cool functionality eBPF offers).

PyEBPF is available under an MIT license. You can find its source code here, and you may install it via pip:

$> pip install pyebpf

Note: You need to install BCC separately, please refer to this guide if you haven’t done so already.

Hope you enjoyed the read!

