The Power of eBPF

Published in Globant · 15 min read · Aug 30, 2023

In layman’s terms, what JavaScript is to a browser, eBPF is to Linux kernels. eBPF, which stands for extended Berkeley Packet Filter, is a powerful and flexible technology that allows for programmable packet processing and analysis within the Linux kernel. It originated as an enhancement to the traditional BPF, which was a packet-filtering mechanism used in network devices and later introduced into the Linux kernel for filtering packets in networking and security applications.

In this article, we are going to dive deep into eBPF technology and try to understand its key concepts and use cases. Here are some of the features of eBPF:

eBPF extends the capabilities of traditional BPF by allowing user-defined programs to be loaded and executed within the kernel, enabling custom packet processing and analysis at various points in the networking stack, including packet filtering, tracing, monitoring, and more.
eBPF programs are compiled to bytecode for a restricted, sandboxed virtual machine (VM), and the kernel verifies that bytecode when the program is loaded, before it is allowed to run. This makes eBPF a powerful tool for implementing complex networking and security logic safely and efficiently.

Why do we need eBPF?

To understand the impact of eBPF on the Linux kernel, we should have a high-level understanding of the architecture of the Linux kernel and how it interacts with applications and hardware:

Architecture of the Standard Linux Kernel (Image Credit: researchgate.net)

The Linux kernel is the fundamental component of the Linux operating system and functions as an intermediary layer between hardware and applications.

Applications are executed in a non-privileged layer known as user space that lacks direct access to hardware resources.
Rather than directly accessing the kernel, an application uses the system call (syscall) interface to ask the kernel to perform actions on its behalf.
The kernel handles hardware access, which can involve tasks such as reading and writing files, sending or receiving network traffic, or accessing memory.

Because the kernel sits between applications and hardware and manages the processes that run concurrently, observing an application’s interactions with the kernel gives valuable insight into its behavior. Even a simple echo command involves close to 40 system calls. The strace utility displays every system call a command makes, so we can use it to observe commands such as echo:

**System Calls made by echo command (Observed using strace)**
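The observation above can be reproduced with a short script. This is a minimal sketch, assuming strace is installed and that its plain output starts each line with the syscall name; the function names count_syscalls and trace_command are made up for this example.

```python
import re
import shutil
import subprocess
from collections import Counter

# Matches the syscall name at the start of a strace line,
# e.g. 'write(1, "hi\n", 3) = 3'.
SYSCALL_LINE = re.compile(r"^(\w+)\(")


def count_syscalls(strace_output):
    """Count syscalls by name in plain strace output."""
    counts = Counter()
    for line in strace_output.splitlines():
        match = SYSCALL_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def trace_command(argv):
    """Run argv under strace and return the trace text.

    strace writes its trace to stderr, so that is the stream we return.
    """
    if shutil.which("strace") is None:
        raise RuntimeError("strace is not installed")
    result = subprocess.run(["strace", *argv], capture_output=True, text=True)
    return result.stderr
```

Calling count_syscalls(trace_command(["echo", "hello"])) returns a Counter keyed by syscall name; the exact numbers depend on the system and the libc in use.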

The kernel is the natural place to implement networking, observability, and security features, but developers have historically struggled with complex infrastructure that is hard to debug. eBPF helps to address these issues.

eBPF is a kernel technology (fully available since Linux 4.4). It allows programs to run in the kernel without modifying the kernel source code or loading additional kernel modules. It can be thought of as a lightweight virtual machine in which programmers run BPF bytecode that makes use of kernel resources.

Because software can hook into existing kernel layers through eBPF instead of modifying kernel source code, the way services such as security, networking, and observability are delivered changes fundamentally.

Benefits of eBPF Programs

eBPF allows developers access to a privileged layer in the Linux Kernel system, which opens up a host of possibilities. These are some of the benefits offered by eBPF:

Dynamic Loading: eBPF programs can be loaded into and removed from the kernel dynamically. Once they are attached to an event, they’ll be triggered by that event, regardless of what caused that event to occur. This is a huge advantage compared to upgrading the kernel and then having to reboot the machine to use its new functionality.
Performance: eBPF programs are a very efficient way to add instrumentation. Once loaded and JIT-compiled, the program runs as native machine instructions on the CPU. They are executed directly in the Linux kernel, which allows them to bypass the overhead of user-space to kernel-space context switching, resulting in lower latency and higher throughput. eBPF programs are also typically small in size, which helps reduce memory footprint and cache misses, leading to improved performance.
Safety: eBPF programs are designed with safety in mind. They are verified before they run and execute in a restricted environment with limited access to system resources, which prevents them from crashing the kernel or compromising system stability. This makes eBPF programs suitable for critical production environments. Note that loading an eBPF program is itself a privileged operation that typically requires root or capabilities such as CAP_BPF or CAP_SYS_ADMIN.

eBPF Architecture and Basic Concepts

Let’s now review the different components that comprise the architecture of eBPF.

Predefined Hooks

eBPF programs operate based on triggered events, with applications (or the kernel) passing through a threshold referred to as a “hook point.” The hooks cover a range of events, including function entry and exit, kernel tracepoints, system calls, and network events. If a specific need lacks a predefined hook, a user or kernel probe (uprobe or kprobe) can be created.

Uprobes: Uprobes, short for User probes, allow eBPF programs to attach to user-space processes or applications and trace their execution at specific user-level instructions. Uprobes enable fine-grained tracing of user-space programs, allowing users to gather information about function calls, arguments, return values, and other events within the user-space code. Uprobes are typically used for profiling, debugging, and performance monitoring of user-space applications.
Kprobes: Kprobes, short for Kernel probes, allow eBPF programs to attach to kernel functions and trace their execution. Kprobes enable dynamic tracing of kernel-level code, allowing users to intercept function calls, inspect function arguments and return values, and gather other information about kernel-level events. Kprobes are commonly used for diagnosing and profiling kernel-level behavior, identifying performance bottlenecks, and monitoring system behavior.

eBPF Virtual Machine

The eBPF virtual machine, similar to other virtual machines, is a software-based implementation of a computer. The eBPF bytecode instructions, which comprise the program, must be translated into native machine instructions that can run on the CPU.

With eBPF, when the program is loaded into the kernel, the JIT (just-in-time) compiler converts the bytecode to machine instructions only once.

This is a significant change from earlier implementations, where the kernel interpreted the bytecode: every time an eBPF program ran, the interpreter had to decode each bytecode instruction and carry it out. Naturally, JIT-compiled code performs much better than interpreted code.

The eBPF bytecode is composed of instructions that manipulate virtual eBPF registers. The eBPF instruction set and register model were intentionally designed to align with standard CPU architectures, making the process of compiling or interpreting from bytecode to machine code relatively simple.

eBPF Registers

The eBPF virtual machines are equipped with ten general-purpose registers, numbered 0 through 9, with Register 10 functioning as a read-only stack frame pointer. During the execution of a BPF program, these registers are utilized to store values and maintain the program’s state.

It’s crucial to recognize that the eBPF registers are implemented in software. They are enumerated from BPF_REG_0 to BPF_REG_10 in the include/uapi/linux/bpf.h header file of the Linux kernel source code.

Before an eBPF program begins its execution, the context argument is loaded into Register 1, while the return value from the function is stored in Register 0. If calling a function from the eBPF code, the arguments for that function are placed in Registers 1 through 5 (if there are fewer than five arguments, not all registers will be used).
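The register conventions described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the ToyVM class, its four operations, and the sum2 helper are invented for illustration and are not the real eBPF instruction set, and the context is a plain integer rather than a pointer.

```python
R0, R1, R2, R3, R4, R5, R6, R7, R8, R9, R10 = range(11)


class ToyVM:
    """Toy model of the eBPF register conventions (not the real instruction set)."""

    def __init__(self, helpers):
        self.helpers = helpers            # helper name -> Python callable
        self.regs = [0] * 11

    def _write(self, reg, value):
        if reg == R10:
            raise ValueError("R10 is the read-only frame pointer")
        self.regs[reg] = value

    def run(self, program, ctx):
        self.regs = [0] * 11
        self.regs[R1] = ctx               # the context argument arrives in R1
        for op, *args in program:
            if op == "mov":               # mov dst, src
                dst, src = args
                self._write(dst, self.regs[src])
            elif op == "add":             # add dst, src
                dst, src = args
                self._write(dst, self.regs[dst] + self.regs[src])
            elif op == "call":            # call a helper with nargs arguments
                name, nargs = args
                result = self.helpers[name](*self.regs[R1:R1 + nargs])
                for reg in (R1, R2, R3, R4, R5):
                    self.regs[reg] = None  # argument registers are clobbered by a call
                self.regs[R0] = result     # the return value lands in R0
            elif op == "exit":
                return self.regs[R0]       # R0 holds the program's return value
            else:
                raise ValueError(f"unknown op {op!r}")
        raise ValueError("program ended without exit")


vm = ToyVM(helpers={"sum2": lambda a, b: a + b})   # "sum2" is a made-up helper
program = [
    ("mov", R6, R1),        # keep ctx in R6, which survives the call
    ("mov", R2, R1),        # second argument
    ("call", "sum2", 2),    # R0 = sum2(R1, R2)
    ("add", R0, R6),        # R0 += R6
    ("exit",),
]
result = vm.run(program, ctx=7)
```

The program copies the context into R6 before the call because, in this model as in eBPF, Registers 1 through 5 cannot be relied on after a call returns.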

Program Verification

When an eBPF program is loaded into the kernel, the verification process ensures that the program is safe. To verify the safety of the eBPF program, the verifier scrutinizes all possible execution paths and confirms the safety of each instruction. The verifier also makes necessary modifications to the bytecode.

It is worth noting that the verifier examines the eBPF bytecode and not the source code directly. The bytecode produced by the compiler is used as input for the verifier:

The Verification Process: The role of the verifier is to examine the program and evaluate all potential execution paths. It goes through the instructions in order and assesses them rather than executing them. While doing so, it monitors the state of each register and maintains this information in a data structure named bpf_reg_state. This structure includes a field of type bpf_reg_type, which describes what kind of value is held in that register. Here are some of the possible values:
NOT_INIT means that the register has not been assigned a value yet.
SCALAR_VALUE means that the register has been set to a value that does not represent a pointer.
Several PTR_TO_* types mean that the register holds a pointer to something. That something could be, for example:
PTR_TO_CTX: The register contains a pointer to the context that was passed as an argument to the BPF program.
PTR_TO_PACKET: The register points to the network packet.

At each branch, where execution could either continue with the next instruction or jump to a different one, the verifier saves a copy of the current register state onto a stack and proceeds to explore one of the available paths. It keeps evaluating instructions until it reaches the exit instruction at the end of the program or hits the limit of one million processed instructions. When a path ends, it pops a saved branch off the stack and evaluates that one next. If it encounters an instruction that could lead to an invalid operation, verification fails.
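The path exploration described above can be sketched in a few lines. This is a heavily simplified model, not the kernel's verifier: the three instruction kinds, the verify function, and its messages are assumptions made for illustration, and the real verifier also tracks value ranges, stack slots, and much more.

```python
NOT_INIT = "NOT_INIT"
SCALAR_VALUE = "SCALAR_VALUE"
PTR_TO_CTX = "PTR_TO_CTX"
PTR_TO_STACK = "PTR_TO_STACK"


def verify(program, insn_limit=1_000_000):
    """Walk every path of a toy program, tracking only register types."""
    regs = [NOT_INIT] * 11
    regs[1] = PTR_TO_CTX                  # R1 holds the context on entry
    regs[10] = PTR_TO_STACK               # R10 is the frame pointer
    pending = [(0, regs)]                 # stack of (pc, register state) to explore
    processed = 0
    while pending:
        pc, regs = pending.pop()
        while True:
            if pc >= len(program):
                return False, f"insn {pc}: fell off the end of the program"
            processed += 1
            if processed > insn_limit:
                return False, "instruction limit exceeded"
            op, *args = program[pc]
            if op == "mov_imm":           # dst = constant
                regs[args[0]] = SCALAR_VALUE
            elif op == "jeq_imm":         # if reg == constant: goto target
                reg, target = args
                if regs[reg] == NOT_INIT:
                    return False, f"insn {pc}: R{reg} is read before being written"
                pending.append((target, list(regs)))   # save the other branch
            elif op == "exit":
                if regs[0] == NOT_INIT:
                    return False, f"insn {pc}: R0 is read before being written"
                break                     # this path is done; pop the next one
            else:
                raise ValueError(f"unknown op {op!r}")
            pc += 1
    return True, f"processed {processed} insns"


# Both branches set R0 before exiting, so every path is safe.
accepted = verify([("jeq_imm", 1, 3), ("mov_imm", 0), ("exit",),
                   ("mov_imm", 0), ("exit",)])
# The jump target exits without ever writing R0.
rejected = verify([("jeq_imm", 1, 3), ("mov_imm", 0), ("exit",),
                   ("exit",)])
```

The second program is rejected even though its fall-through path is fine, because one reachable path returns an uninitialized R0. Note also that the processed count covers all explored paths, which is why it can exceed the length of the program.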

The Verifier Log: If the verification of a program fails, the verifier produces a log that shows how it determines the program is invalid. The log generated by the verifier includes a report of the amount of work the verifier performed. An example of the work summary found in the log file is as follows:

processed 61 insns (limit 1000000) max_states_per_insn 0 total_states 4 peak_states 4 mark_read 3

Let’s break it down:

processed 61 insns: The verifier analyzed 61 instructions (insns) while checking the program. This counts the verifier’s work across all the paths it explored, not instructions executed at runtime, so it can exceed the number of instructions in the program.
(limit 1000000): The verifier gives up and rejects the program if it has to analyze more than 1,000,000 instructions. This complexity limit keeps verification time bounded and rules out programs that are too complex to check.
max_states_per_insn 0: The largest number of verifier states stored for any single instruction. A state is a snapshot of the registers and stack that the verifier saves at certain points, such as branches, so that it can recognize paths it has already checked.
total_states 4 peak_states 4 mark_read 3: The verifier created 4 states in total, and at most 4 were held in memory at the same time. mark_read is an internal statistic of the verifier’s liveness tracking (the longest chain of states it walked while propagating register read marks); here it is 3.
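When comparing programs, it can help to pull these numbers out of the log automatically. The following is a small sketch; the parse_summary name is made up for this example, and the pattern accepts the total-states counter written either with an underscore or with a space.

```python
import re

SUMMARY = re.compile(
    r"processed (?P<processed>\d+) insns \(limit (?P<limit>\d+)\) "
    r"max_states_per_insn (?P<max_states_per_insn>\d+) "
    r"total[_ ]states (?P<total_states>\d+) "
    r"peak_states (?P<peak_states>\d+) "
    r"mark_read (?P<mark_read>\d+)"
)


def parse_summary(log_text):
    """Return the verifier's work summary as a dict of ints, or None if absent."""
    match = SUMMARY.search(log_text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


summary = parse_summary(
    "processed 61 insns (limit 1000000) max_states_per_insn 0 "
    "total_states 4 peak_states 4 mark_read 3"
)
# Fraction of the verifier's instruction budget that this program used.
budget_used = summary["processed"] / summary["limit"]
```

A program whose processed count creeps toward the limit is at risk of being rejected for complexity, so this ratio is a useful number to watch as a program grows.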

eBPF Maps

eBPF maps are key-value data structures that let eBPF programs store and retrieve data and share it with user-space programs and with other eBPF programs. They serve as a communication channel for passing data and state information between the kernel side and the user-space side of an application in a dynamic and efficient manner.

eBPF maps are defined in eBPF programs using a special map declaration syntax and are created from user space through the bpf() system call, typically by the program’s loader; both eBPF programs and user-space programs can then read and update them. eBPF maps come in various types, including arrays, hashes, and per-CPU maps, each with different characteristics and use cases. They can be used to store and manipulate different types of data, such as counters, histograms, timestamps, and more.
eBPF maps provide a way for eBPF programs to share data with user-space programs and other eBPF programs, enabling powerful use cases such as monitoring and profiling of system performance, network packet processing, security monitoring, and more. eBPF maps are a fundamental feature of eBPF and are widely used in eBPF programs to store and exchange data in a flexible, efficient, and dynamic manner.
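As an illustration of sharing data through a map, here is a sketch that uses the BCC framework to count execve calls per process in a hash map and read the counts from user space. The names exec_counts, count_execve, and main are made up for this example, and running it assumes BCC is installed and the script is started as root.

```python
import time

PROGRAM = r"""
BPF_HASH(exec_counts, u32, u64);          // key: process ID, value: execve count

int count_execve(void *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;   // upper 32 bits: process ID
    exec_counts.increment(pid);                   // missing entries start at zero
    return 0;
}
"""


def main(interval_seconds=2):
    """Load the program and print the map contents periodically (needs root)."""
    from bcc import BPF   # imported here so the module loads without bcc installed

    b = BPF(text=PROGRAM)
    b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="count_execve")
    while True:
        time.sleep(interval_seconds)
        # User space reads the same map that the kernel-side program updates.
        for pid, count in b["exec_counts"].items():
            print(f"pid {pid.value}: {count.value} execve calls")
```

Calling main() starts the loop. The eBPF side only increments counters, while all formatting and printing happens in user space, which keeps the kernel-side program small.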

Helper Calls

Helper calls in eBPF programs refer to special functions provided by the Linux kernel that eBPF programs can invoke to perform operations or access information that is not directly accessible from eBPF instructions alone.

Helpers are implemented as kernel functions. Each one is identified by a number (the BPF_FUNC_* constants in the kernel headers, for example BPF_FUNC_map_lookup_elem), and an eBPF program invokes a helper with a call instruction (BPF_CALL) that carries that number, with any arguments placed in Registers 1 through 5. The helper then executes in the kernel on behalf of the eBPF program and returns its result in Register 0.
Helper calls provide a way to perform tasks that would otherwise be impossible or impractical using only eBPF instructions. Common examples include reading system information (e.g., bpf_ktime_get_ns for the current time or bpf_get_current_pid_tgid for the calling process), looking up and updating map entries (e.g., bpf_map_lookup_elem and bpf_map_update_elem), and interacting with user-space programs (e.g., sending events with bpf_perf_event_output).
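To make the call mechanism concrete, the sketch below encodes a helper call as raw bytecode. It assumes the little-endian layout of the kernel's 8-byte instruction format (opcode, register byte, 16-bit offset, 32-bit immediate); the helper numbers are taken from the bpf_func_id enumeration in include/uapi/linux/bpf.h, and the Python function names are made up for this example.

```python
import struct

BPF_JMP = 0x05     # instruction class
BPF_CALL = 0x80    # operation within the jump class
BPF_EXIT = 0x90

# A few helper numbers from enum bpf_func_id in include/uapi/linux/bpf.h.
HELPER_IDS = {
    "bpf_map_lookup_elem": 1,
    "bpf_ktime_get_ns": 5,
    "bpf_trace_printk": 6,
    "bpf_get_current_pid_tgid": 14,
}


def encode_insn(opcode, dst=0, src=0, off=0, imm=0):
    """Pack one 8-byte eBPF instruction (little-endian layout)."""
    return struct.pack("<BBhi", opcode, (src << 4) | dst, off, imm)


def call_helper(name):
    """Encode 'call <helper>': the helper number travels in the immediate field."""
    return encode_insn(BPF_JMP | BPF_CALL, imm=HELPER_IDS[name])


def exit_insn():
    """Encode 'exit', which returns whatever is in Register 0."""
    return encode_insn(BPF_JMP | BPF_EXIT)


# A two-instruction program: call bpf_ktime_get_ns, then return its result.
bytecode = call_helper("bpf_ktime_get_ns") + exit_insn()
```

The helper leaves its result in Register 0 and exit returns Register 0, so no further instructions are needed between the two.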

Function and Tail Calls

Both function and tail calls in eBPF programs provide flexibility and control over the execution flow of eBPF programs, allowing for more complex and modular program design. However, they should be used judiciously, as improper use can result in performance issues or other unexpected behavior. Take care to validate data and control flow, and follow security best practices when using these advanced features.

Function Calls: Function calls in eBPF programs refer to the ability to call one function from another within the same eBPF program (often called BPF-to-BPF calls), similar to a subroutine or function call in traditional programming languages. This enables eBPF programs to be modular and organized into smaller, reusable components, making it easier to develop and maintain complex eBPF programs.

A function call is compiled to a BPF_CALL instruction whose operand is the relative offset of the target function within the program. The called function gets its own stack frame, receives its arguments in Registers 1 through 5, and returns a value to the caller in Register 0. Function calls can be used for various purposes, such as encapsulating common functionality, implementing conditional logic, or modularizing program logic.

Tail Calls: Tail calls in eBPF programs allow for the efficient chaining of multiple eBPF programs together, reducing the need for nested function calls. Tail calls are used when an eBPF program needs to transfer control to another eBPF program without returning to the original program.

To make a tail call, a program calls the bpf_tail_call() helper, passing its context, a program array map (BPF_MAP_TYPE_PROG_ARRAY) that holds references to the candidate programs, and the index of the target program in that map. If the call succeeds, the target program takes over the execution flow and the original program is not resumed; if the slot is empty, execution simply continues after the call. Tail calls can be useful for cases where a sequence of eBPF programs needs to be executed in a specific order or when certain conditions are met.
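A tail call might look like the following with the BCC framework. This is a sketch rather than a definitive implementation: the names prog_array, entry, tail_target, and main are made up, slot 1 is an arbitrary choice, and running it assumes BCC is installed and the script is started as root.

```python
TARGET_SLOT = 1    # arbitrary slot in the program array, chosen for this example

PROGRAM = r"""
BPF_PROG_ARRAY(prog_array, 4);            // program array map with four slots

int tail_target(void *ctx) {
    bpf_trace_printk("in tail_target\n");
    return 0;
}

int entry(void *ctx) {
    bpf_trace_printk("in entry\n");
    prog_array.call(ctx, 1);              // BCC wrapper around bpf_tail_call()
    // Reached only if slot 1 is empty or the tail call fails.
    bpf_trace_printk("tail call did not happen\n");
    return 0;
}
"""


def main():
    """Load both programs, fill the program array, and print the trace (needs root)."""
    import ctypes
    from bcc import BPF   # imported here so the module loads without bcc installed

    b = BPF(text=PROGRAM)
    # Load the target and store its file descriptor in the program array.
    target = b.load_func("tail_target", BPF.KPROBE)
    b["prog_array"][ctypes.c_int(TARGET_SLOT)] = ctypes.c_int(target.fd)
    # Only the entry program is attached to an event; it reaches the target by tail call.
    b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="entry")
    b.trace_print()
```

Because user space fills the program array, the target of the tail call can be swapped at runtime by writing a different program's file descriptor into the same slot, without reloading the entry program.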

The Process of Writing eBPF Programs

There are several different libraries and frameworks to write eBPF programs. Often, we might use eBPF indirectly through a project like bpftrace or Cilium. These projects offer abstractions on top of eBPF, so we don’t have to write the program directly. We use a declarative approach to pass instructions that are implemented by eBPF.

If there isn’t a higher level of abstraction, we need to write the programs directly. The Linux kernel requires that we load eBPF programs in bytecode form. Although writing bytecode by hand is technically feasible, it is not a popular option. Developers generally write the program in a restricted form of C and compile it to eBPF bytecode with Clang/LLVM, either directly or through a framework such as BCC, which invokes LLVM behind the scenes.

Due to the relative simplicity of the BCC Python framework, we will use it to write an eBPF program and attach it to a system call. We will go through each step in the program to understand the execution flow:

#!/usr/bin/python3
from bcc import BPF                                 #1

program = r"""                                      #2
int hello(void *ctx) {
    bpf_trace_printk("Hello, World!\n");
    return 0;
}"""

b = BPF(text=program)                               #3
syscall = b.get_syscall_fnname("execve")            #4
b.attach_kprobe(event=syscall, fn_name="hello")     #5

b.trace_print()                                     #6

1. Importing Required Libraries: The program starts by importing the BPF class from the bcc library, which is a Python library for writing eBPF programs.
2. Defining the eBPF Program: The eBPF program is defined as a multi-line string assigned to the “program” variable. The program defines a single eBPF function named “hello” that takes a void pointer ctx as an argument. Inside the function, the bpf_trace_printk function is called with the string Hello, World!\n as an argument, which writes “Hello, World!” followed by a new line to the kernel trace buffer.
3. Creating BPF Object: The “BPF” class is initialized with the text argument set to the program variable, which compiles the eBPF program and creates a BPF object b representing the compiled eBPF program.
4. Finding Syscall Function: The get_syscall_fnname method is called on the BPF object b with the argument execve. It returns the name of the kernel function that implements the execve system call on the current kernel and architecture. This is stored in the syscall variable.
5. Attaching Kprobe: The attach_kprobe method is called on the BPF object b with the arguments event set to syscall (the name of that kernel function) and fn_name set to hello (the name of the eBPF function to be attached as the kprobe). From then on, the hello function runs on entry to that kernel function, each time a process calls execve.
6. Starting Trace: The trace_print method is called on the BPF object b. It loops indefinitely, reading the kernel trace buffer and printing each line it finds. Whenever a process calls execve, the hello eBPF function is triggered, its “Hello, World!” message is written to the kernel trace buffer, and trace_print displays it.

Let’s execute this program now. I used a shell script loop.sh which runs an infinite loop and runs the ‘ls’ command to trigger system events. Here is how it went:

b' loop.sh-3970 [000] …. 5681.288059: 0: Hello World!'

The output is one line from the kernel trace buffer, produced by a process named loop.sh with process ID 3970. The line was written by our eBPF program, which is attached as a kprobe to the execve system call and is triggered each time loop.sh calls execve to run ls.

Let’s break down the output:

loop.sh-3970: This indicates the name of the process that triggered the execve system call. In this case, the process name is “loop.sh” and its process ID is 3970.
[000]: This indicates the CPU core on which the trace event occurred. In this case, it is CPU core 0.
….: These are flags that describe the context in which the event was recorded (for example, whether interrupts were disabled). Dots mean that no flag is set.
5681.288059: This indicates the timestamp of the trace event in seconds (by default, roughly the time since the system booted).
0: This is a field in which the trace output format identifies the code that wrote the line. For messages written by bpf_trace_printk it carries no useful information; it is not the return value of the hello eBPF function.
Hello World!: This is the message written by the bpf_trace_printk call in the hello eBPF function.

The trace output confirms that the hello eBPF function was successfully triggered when the loop.sh process executed the execve system call, and that its message was written to the kernel trace buffer.
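The fields broken down above can also be extracted programmatically. The following is a minimal sketch; the parse_trace_line function and the field name origin are made up for this example, and the sample line writes the flags field as four plain dots.

```python
import re

TRACE_LINE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+"     # process name and ID
    r"\[(?P<cpu>\d+)\]\s+"                  # CPU core
    r"(?P<flags>\S+)\s+"                    # context flags
    r"(?P<timestamp>\d+\.\d+):\s+"          # timestamp in seconds
    r"(?P<origin>[^:]*):\s"                 # origin field, not meaningful here
    r"(?P<message>.*)$"                     # text passed to bpf_trace_printk
)


def parse_trace_line(raw):
    """Split one trace line (bytes or str) into its fields, or return None."""
    text = raw.decode("utf-8", errors="replace") if isinstance(raw, bytes) else raw
    match = TRACE_LINE.match(text)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["cpu"] = int(fields["cpu"])
    fields["timestamp"] = float(fields["timestamp"])
    return fields


fields = parse_trace_line(b" loop.sh-3970 [000] .... 5681.288059: 0: Hello World!")
```

BCC also offers a trace_fields method on the BPF object that returns the task, PID, CPU, flags, timestamp, and message of each line, which avoids hand-written parsing when the program is loaded through BCC.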

eBPF’s Use Cases

This is a list of common use cases for eBPF:

Network Visibility and Analysis: eBPF is commonly used for network visibility and analysis tasks, such as packet filtering, monitoring, tracing, and profiling.
Performance Monitoring and Profiling: eBPF can be used to collect performance metrics and profile system performance, including CPU usage, memory usage, disk I/O, and other system-level statistics.
Security and Intrusion Detection: eBPF can be used for security-related tasks, such as implementing firewalls, intrusion detection systems (IDS), and other security mechanisms.
Tracing and Debugging: eBPF provides powerful tracing capabilities that allow for dynamic tracing and debugging of system events, kernel functions, and user-space programs.
Custom kernel Extensions: eBPF can be used to implement custom kernel-level extensions or modifications without modifying the kernel source code directly.

Scenarios where eBPF is not preferred

This is a list of some scenarios where eBPF should be avoided:

Implementing application-layer policy: Implementing application-layer (layer 7) policy and deep protocol inspection with eBPF comes with a significant performance cost. A connection-level policy can be enforced cheaply: using the Linux kernel’s connection tracker, the policy is evaluated once per flow, the flow is marked as allowed or denied in the conntrack tables, and the remaining packets in the flow do not need to be checked again. Application-layer policy does not work that way. A protocol such as HTTP allows several transactions on a single TCP connection, so to apply layer 7 controls with eBPF we would need to inspect every packet to detect those transactions. Spending CPU cycles on every packet quickly becomes expensive.
User-Space Processing: For tasks that can be efficiently performed in user-space without requiring deep visibility into the kernel or network stack, using user-space libraries or tools may be more practical and easier to implement than eBPF.
Kernel Development: If someone is a kernel developer and has to implement complex kernel-level functionality or modifications, eBPF may not be the most appropriate choice. In such cases, directly modifying the kernel source code or using other kernel-level mechanisms may be more appropriate.

Conclusion

eBPF is a powerful technology that can be used for a wide range of tasks related to network visibility, performance monitoring, security, debugging, and custom kernel extensions. However, it is not always the best fit for every use case, and consideration should be given to the complexity of the task, the level of visibility and control required, and the availability of other tools or mechanisms that may be more suitable for the task at hand.