When Mac M1/M2 Met eBPF: A Tale of Compatibility
Find me on LinkedIn: https://www.linkedin.com/in/harry-touloupas/
Yes it’s a pain I know!
So you’re intrigued by eBPF, enthralled by its capabilities, and eager to master it. You even have Liz Rice’s fascinating book, “Learning eBPF: Programming the Linux Kernel for Enhanced Observability, Networking, and Security,” on your reading list. But there’s one hitch: you can’t run the exercises on your M1/M2 MacBook Pro, because Darwin, the system underlying macOS, doesn’t support eBPF.
That’s where this guide comes in. By the end of this journey, you’ll have a working setup to play with eBPF on your Mac M1 or M2, without tearing your hair out. Let’s dive in.
The eBPF Primer
eBPF (extended Berkeley Packet Filter) is a technology for running sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules. It is part of the mainline Linux kernel, so it works out of the box on most modern Linux distributions. The kernel is a complex system, and changing any part of its codebase requires familiarity with the existing code. Even dynamically loadable kernel modules (DLKMs), which can be added to a running system without rebooting it or rebuilding the kernel, pose difficulties for developers, since a bug in a module can crash the whole system. eBPF offers an abstraction that lets developers add functionality to the kernel in order to:
- Trace the performance of any aspect of the system
- Offer observability on multiple system components or user space programs
- Detect and (maybe) prevent malicious activity
eBPF programs run in their own virtual machine (VM) within the Linux kernel, for safety and efficiency reasons. Here’s how it works:
- Loading an eBPF program: A user-space application loads an eBPF program into the kernel using the bpf() system call. The program is represented as bytecode, a platform-independent, low-level format that the eBPF VM can interpret.
- Verification: The kernel includes an eBPF verifier that checks the bytecode for potentially unsafe operations. It ensures that the program always terminates (older kernels rejected loops outright, while Linux 5.3 and later accept loops that the verifier can prove are bounded), doesn’t access invalid memory, and adheres to other safety constraints. If the program doesn’t pass these checks, the kernel refuses to load it.
- Just-In-Time (JIT) Compilation: After verification, the bytecode can be translated (JIT compiled) into the machine code of the host system. This makes eBPF program execution nearly as efficient as natively compiled code.
- Running the program: Once loaded and compiled, the eBPF program is executed in response to various events, like system calls, network packets, tracepoints, etc. The program can read or write to eBPF maps (key-value data structures in kernel space), call a set of predefined helper functions, and make decisions that affect system behavior (like whether to allow a certain system call or network packet).
- Interaction with user-space: The user-space application that loaded the eBPF program can interact with it by reading or writing to its maps. This is how programs get input data into eBPF and get output data back out.
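To make the map round trip from the last two steps a bit more concrete, here is a minimal sketch using BCC's Python bindings (the same library we use for the hello-world program later on). The kernel side counts execve calls per user ID in a hash map, and the user-space side reads that map a few times. The names exec_counts, count_exec and format_counts are my own choices for illustration, and the script assumes it runs as root inside the Linux VM we set up further down; anywhere else it just prints a hint and exits.

```python
#!/usr/bin/python3
"""Count execve() calls per user ID in an eBPF map (sketch, needs BCC and root)."""
import os
import time

try:
    from bcc import BPF  # available inside the Linux VM, not on macOS
except ImportError:
    BPF = None

# Kernel side: a hash map keyed by user ID, updated on every execve().
PROGRAM = r"""
BPF_HASH(exec_counts);

int count_exec(void *ctx) {
    u64 uid = bpf_get_current_uid_gid() & 0xFFFFFFFF;
    u64 counter = 0;
    u64 *p = exec_counts.lookup(&uid);
    if (p != 0) {
        counter = *p;
    }
    counter++;
    exec_counts.update(&uid, &counter);
    return 0;
}
"""


def format_counts(pairs):
    """Render (uid, count) pairs as one readable line, sorted by uid."""
    return "  ".join(f"uid {uid}: {count}" for uid, count in sorted(pairs))


def main(polls=5, interval=2.0):
    if BPF is None or os.geteuid() != 0:
        print("This sketch needs BCC and root; run it inside the Linux VM.")
        return
    b = BPF(text=PROGRAM)  # compile the C code
    b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="count_exec")
    for _ in range(polls):
        time.sleep(interval)
        # User space reads the map that the kernel side writes.
        table = b["exec_counts"]
        print(format_counts((k.value, v.value) for k, v in table.items()))


if __name__ == "__main__":
    main()
```

The point of the design is that the kernel side never prints anything: it only updates the map, and user space decides when and how to read it.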
Why doesn’t it run on Darwin?
As previously mentioned, macOS runs on a different kernel, XNU, which does not support eBPF. So you’d think that setting up a VM using, for example, VirtualBox would be the simplest way to go. After a lot of trial and error, I discovered that on the ARM-based M1 and M2 Mac processors, setting up VirtualBox as a hypervisor is difficult. Vagrant, which can use VirtualBox as the underlying hypervisor, did not work easily either, so I could not get started building eBPF apps right away.
Without any further ado, let’s dive into what you can do to experiment with eBPF on your Mac M1 or M2.
Requirements: Gear Up for the eBPF Journey
For the purposes of this awesome tutorial the following are required:
- Basic knowledge of Python and C
- PyCharm (by JetBrains) as your IDE (or VS Code or Vim for the hardcore sysadmins)
- Lima for launching Linux VMs on your Mac
Let’s start by installing Lima. I use brew (Homebrew) as my package manager; if you have not used it yet, install it first by following the instructions on the Homebrew website.
- Install Lima using brew:
brew install lima
- Check that Lima is installed:
lima --version
limactl version 0.16.0
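If you would rather check the prerequisites from a script, here is a small sketch that reports what the host looks like. It only reads information and changes nothing. The function names host_report and is_apple_silicon are made up for this example, and note that a Python running under Rosetta may report x86_64 instead of arm64.

```python
#!/usr/bin/env python3
"""Report whether this host looks ready for the Lima setup (sketch)."""
import platform
import shutil


def host_report():
    """Collect basic facts about the host; nothing is modified."""
    return {
        "system": platform.system(),         # "Darwin" on macOS
        "machine": platform.machine(),       # "arm64" on Apple Silicon
        "limactl": shutil.which("limactl"),  # None if Lima is not on PATH
        "brew": shutil.which("brew"),        # None if Homebrew is not on PATH
    }


def is_apple_silicon(report):
    """True when the report describes macOS on an ARM processor."""
    return report["system"] == "Darwin" and report["machine"] == "arm64"


if __name__ == "__main__":
    report = host_report()
    for key, value in report.items():
        print(f"{key}: {value}")
    print("Apple Silicon Mac:", is_apple_silicon(report))
```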
I won’t dive deep into installing PyCharm because Google is always your friend.
Setting up the Lima VM
- Let’s start by creating our working directory. After that we will set up the VM with Ubuntu 22.04 and the packages required for eBPF to run.
mkdir -p ~/Desktop/ebpf-mac-arm-tutorial
cd ~/Desktop/ebpf-mac-arm-tutorial
- Lima requires a YAML file with all the necessary configuration options. For the purposes of this tutorial we are going to use a slightly modified version of https://github.com/lima-vm/lima/blob/master/examples/ubuntu-lts.yaml. You can also check out all the available options in https://github.com/lima-vm/lima/blob/master/examples/default.yaml.
cat <<'EOF' > ubuntu-lts-ebpf.yaml
images:
  # Try to use release-yyyyMMdd image if available. Note that release-yyyyMMdd will be removed after several months.
  - location: "https://cloud-images.ubuntu.com/releases/22.04/release-20230518/ubuntu-22.04-server-cloudimg-amd64.img"
    arch: "x86_64"
    digest: "sha256:afb820a9260217fd4c5c5aacfbca74aa7cd2418e830dc64ca2e0642b94aab161"
  - location: "https://cloud-images.ubuntu.com/releases/22.04/release-20230518/ubuntu-22.04-server-cloudimg-arm64.img"
    arch: "aarch64"
    digest: "sha256:b47f8be40b5f91c37874817c3324a72cea1982a5fdad031d9b648c9623c3b4e2"
  # Fallback to the latest release image.
  - location: "https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64.img"
    arch: "x86_64"
  - location: "https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-arm64.img"
    arch: "aarch64"
memory: "2GiB"
cpus: 2
disk: "30GiB"
ssh:
  # You can choose any port or omit this. Specifying a value ensures same port bindings after restarts
  # Forwarded to port 22 of the guest.
  localPort: 2222
# We are going to install all the necessary packages for our development environment.
# These include Python 3 and the bpfcc tools package.
# The heredoc delimiter is quoted ('EOF') so that $(uname -r) and $(whoami)
# are expanded inside the VM when the scripts run, not on the Mac.
provision:
  - mode: system
    script: |
      #!/bin/bash
      set -eux -o pipefail
      export DEBIAN_FRONTEND=noninteractive
      apt-get update && apt-get install -y vim python3 bpfcc-tools linux-headers-$(uname -r)
  - mode: user
    script: |
      #!/bin/bash
      set -eux -o pipefail
      sudo cp /home/$(whoami).linux/.ssh/authorized_keys /root/.ssh/authorized_keys
EOF
- With the configuration file in place, let’s finally spin up our amazing Lima VM. Run the following command and select “Proceed with the current configuration” in the dialog that appears in your terminal.
limactl start --name=ebpf-lima-vm ./ubuntu-lts-ebpf.yaml
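Once the command returns, you can double-check that the VM is actually running. The sketch below shells out to limactl list --json and looks up our instance by name. I am assuming here that each output line is one JSON object with "name" and "status" fields, which is what I see with my Lima version; treat those field names as an assumption and adjust if yours differ. The helper names parse_instances and vm_status are made up for this example.

```python
#!/usr/bin/env python3
"""Look up the status of the Lima VM from `limactl list --json` (sketch)."""
import json
import shutil
import subprocess


def parse_instances(text):
    """Parse one JSON object per line into a dict keyed by instance name."""
    instances = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        item = json.loads(line)
        instances[item["name"]] = item  # "name" field is an assumption
    return instances


def vm_status(name="ebpf-lima-vm"):
    """Return the instance status string, or None if it cannot be found."""
    if shutil.which("limactl") is None:
        return None
    result = subprocess.run(
        ["limactl", "list", "--json"],
        capture_output=True, text=True, check=False,
    )
    instance = parse_instances(result.stdout).get(name)
    return instance.get("status") if instance else None


if __name__ == "__main__":
    print("ebpf-lima-vm status:", vm_status())
```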
Congratulations, you now have a fully working Linux VM on your Mac 🎉 Let’s now set up the rest of our development environment in PyCharm.
Disclaimer: Make sure that you run the following command. It appends Lima’s SSH configuration options to your default SSH configuration, which makes it easier for PyCharm to connect to the VM.
cat ~/.lima/ebpf-lima-vm/ssh.config >> ~/.ssh/config
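Before moving on to PyCharm, it can save some head-scratching to confirm that an SSH server really answers on the forwarded port. SSH servers send an identification line starting with "SSH-" as soon as you connect, so the sketch below just opens a TCP connection and reads that first line. The function name read_ssh_banner is mine, and the defaults assume the localPort: 2222 setting from our YAML file.

```python
#!/usr/bin/env python3
"""Check that something SSH-like answers on the forwarded port (sketch)."""
import socket


def read_ssh_banner(host="localhost", port=2222, timeout=2.0):
    """Return the server's identification line, or None if unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = sock.recv(256)
    except OSError:
        # Connection refused, timed out, host not found, and so on.
        return None
    lines = data.decode("ascii", errors="replace").splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    banner = read_ssh_banner()
    if banner and banner.startswith("SSH-"):
        print(f"SSH server is answering on localhost:2222 ({banner})")
    else:
        print("Nothing SSH-like on localhost:2222; is the VM running?")
```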
Setting up PyCharm to SSH into our Lima VM
Since working remotely became a necessity, PyCharm has offered Remote Development functionality to help you code, run, debug, and deploy your projects remotely. You can learn more about it in the PyCharm documentation. Here are the detailed steps to connect to localhost:2222 with PyCharm and use the remote development feature:
- Open PyCharm and create a new project by clicking on “Create New Project” on the welcome screen or by going to “File” > “New Project” on the top menu bar.
- Go to Preferences or press the corresponding shortcut ⌘ + , (comma).
- Click on “SSH Configuration” under “Tools”.
- Click on the “+” button to add a new SSH configuration.
- Enter “localhost” for the Host name and “2222” for the Port number.
- Enter your Lima VM’s username in the corresponding field. By default, Lima uses the same username as your local macOS user.
- In the “Authentication Type” field choose “OpenSSH config and authentication agent”.
- Click on “Test Connection” to ensure that PyCharm has successfully connected to your Lima VM. If the connection is successful, you should see a message saying “Connection successful” in a green banner.
Disclaimer: To run code directly from the IDE, use root as the username in the SSH configuration. Loading eBPF programs requires superuser privileges.
Now that we have set up the SSH configuration it’s time to set up a remote Python interpreter so we can run our code from the IDE but inside the Lima VM. Pretty cool right?!
- Once again, go to Preferences or press the corresponding shortcut ⌘ + , (comma).
- Click on “Project: <Your project name>” on the left sidebar and then click on “Python Interpreter”.
- Click on “Add Interpreter” in the top right of the window.
- Choose “On SSH…” from the list of available options and in the pop-up choose “Existing” and select the SSH configuration we have set up before.
- Click on “Next” on the current and next screen.
- Choose “System Interpreter” and change the sync path from /tmp/X to /home/<username>/ebpf-mac-arm-tutorial.
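A quick way to confirm that the interpreter is really the one inside the VM is to run a tiny check script from PyCharm. The sketch below reports the operating system, whether we are root, and whether the bcc module can be found. The function names environment_report and problems are made up for this example.

```python
#!/usr/bin/python3
"""Check that this interpreter runs inside the Lima VM as root (sketch)."""
import importlib.util
import os
import platform


def environment_report():
    """Collect the facts that matter for running BCC scripts."""
    return {
        "system": platform.system(),    # expected: "Linux"
        "machine": platform.machine(),  # expected: "aarch64" on M1/M2
        "kernel": platform.release(),
        "is_root": hasattr(os, "geteuid") and os.geteuid() == 0,
        "bcc_available": importlib.util.find_spec("bcc") is not None,
    }


def problems(report):
    """Return a list of human-readable issues; empty means good to go."""
    issues = []
    if report["system"] != "Linux":
        issues.append("not on Linux: the interpreter is probably local, not the VM")
    if not report["is_root"]:
        issues.append("not root: use root as the SSH username")
    if not report["bcc_available"]:
        issues.append("bcc module not found: check the provisioning step")
    return issues


if __name__ == "__main__":
    report = environment_report()
    print(report)
    for issue in problems(report):
        print("WARNING:", issue)
```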
Hello Mac, from eBPF 👋
The Python script below is a simple eBPF program that hooks into the execution of any process on the system and prints a custom message (“Hello Mac. I am an eBPF program!”) to the kernel trace pipe every time this happens. Let’s start by creating a .py file in our workspace from PyCharm and pasting in the following code.
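#!/usr/bin/python3
from bcc import BPF

# 1: the eBPF program itself, written in C
program = r"""
int hello(void *ctx) {
    bpf_trace_printk("Hello Mac. I am an eBPF program!");
    return 0;
}
"""

# 2: compile the C code with BCC
b = BPF(text=program)

# 3: look up the kernel function name for execve on this system
syscall = b.get_syscall_fnname("execve")

# 4: run hello() every time execve is entered
b.attach_kprobe(event=syscall, fn_name="hello")

# 5: print the trace pipe output until interrupted
b.trace_print()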
We are going to look at each numbered step in turn to understand what’s going on.
- (1) This block of code is the actual eBPF program, defined as a multi-line raw string in Python. It defines a function called hello that takes a context pointer ctx and returns an integer. Inside this function, bpf_trace_printk is called to print "Hello Mac. I am an eBPF program!" to the kernel trace pipe every time the function runs.
- (2) This line creates an instance of the BPF class from the eBPF program defined above. BCC compiles the C code at this point, and the instance, b, is our handle to the compiled program.
- (3) This line gets the kernel function name of the execve system call on the running system and stores it in syscall. The name differs between architectures, which is why we look it up instead of hard-coding it. The execve system call is used to execute a program, which is typically how processes are started on Unix-like operating systems.
- (4) This line attaches the eBPF program (the hello function defined in our eBPF code) to a kprobe, a type of dynamic tracing technology that can instrument kernel function entries. The specific kernel function it instruments is the execve system call. This means that every time a new process is started (which involves calling execve), the hello function will be invoked.
- (5) This line prints the output from the bpf_trace_printk calls in our eBPF program. It reads from the trace pipe where these messages were written and keeps running until interrupted. The output will include our "Hello Mac. I am an eBPF program!" message every time a new process is started.
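If you want a bit more control over the output than trace_print gives you, BCC also offers trace_fields, which returns each trace line as a tuple. Here is a hedged variant of the script above that prints only the command name, process ID and message, and stops after a fixed number of events. The helper format_event and the max_events parameter are my own additions for illustration, and like the original it has to run as root inside the VM.

```python
#!/usr/bin/python3
"""Variant of the hello script that formats each trace event (sketch)."""
import os

try:
    from bcc import BPF  # available inside the Linux VM, not on macOS
except ImportError:
    BPF = None

PROGRAM = r"""
int hello(void *ctx) {
    bpf_trace_printk("Hello Mac. I am an eBPF program!");
    return 0;
}
"""


def format_event(task, pid, msg):
    """BCC hands us task and msg as bytes; decode them for display."""
    name = task.decode(errors="replace")
    text = msg.decode(errors="replace")
    return f"{name} (pid {pid}): {text}"


def main(max_events=10):
    if BPF is None or os.geteuid() != 0:
        print("This sketch needs BCC and root; run it inside the Linux VM.")
        return
    b = BPF(text=PROGRAM)
    b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="hello")
    seen = 0
    try:
        while seen < max_events:
            try:
                task, pid, cpu, flags, ts, msg = b.trace_fields()
            except ValueError:
                continue  # skip lines that do not parse
            print(format_event(task, pid, msg))
            seen += 1
    except KeyboardInterrupt:
        pass  # Ctrl-C ends the loop quietly


if __name__ == "__main__":
    main()
```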
We are now ready to run our eBPF program! Click Run at the top right of the PyCharm window and you should see a terminal pop up.
To trigger our message, open a new terminal window, run limactl shell ebpf-lima-vm, and then run a few commands such as ls, cat, etc. Each command starts a new process, so our message should appear in the PyCharm run window every time.
Conclusion
Armed with this newfound knowledge and skills, you’re now equipped to delve deeper into the exciting world of eBPF. So, get out there, start experimenting, and let’s reshape the boundaries of what’s possible together. Remember, every hurdle is but a stepping stone to innovation. And in the realm of technology, innovation is the name of the game. Happy coding!
P.S. Given that this is the first article I am writing on Medium, let me know if it helped and whether you like my article writing skills 😛