BPF and async Rust

FUJITA Tomonori
nttlabs
Nov 8, 2021

With libbpf-rs, Rust bindings to libbpf (the canonical C BPF library), building BPF applications in Rust is easy. However, libbpf-rs doesn't work with async/await. Say you want to send information from the kernel with BPF over a protocol like gRPC: you have to implement a lot yourself instead of using the existing async libraries.

To address this problem, I implemented libbpf-async, a library complementary to libbpf-rs that currently provides APIs for the BPF ring buffer. The combination makes async Rust work with BPF as follows. I'll also explain how to handle memory barriers in Rust when accessing the BPF ring buffer, a topic I found little information about during development.

#[tokio::main]
async fn main() {
    let mut builder = YourSkelBuilder::default();
    let mut skel = builder.open().unwrap().load().unwrap();

    let mut rb = libbpf_async::RingBuffer::new(skel.obj.map_mut("ringbuf").unwrap());
    loop {
        let mut buf = [0; 128];
        let n = rb.read(&mut buf).await.unwrap();
        // do something useful like sending data to a channel
    }
}

BPF ring buffer

The BPF ring buffer enables an application to get information from the kernel efficiently, without system calls or extra memory copies. The kernel and the application share memory, and increment the producer and consumer indexes, respectively.
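To make the index scheme concrete, here is a minimal sketch of the consumer side, assuming a plain in-memory byte buffer instead of the real mmap'd BPF ring buffer. Record framing (length headers, BUSY/DISCARD bits) is omitted, and the names (RingBuffer, consume) are illustrative, not the libbpf-async API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct RingBuffer {
    data: Vec<u8>,              // shared data area (power-of-two size)
    producer_pos: AtomicUsize,  // advanced by the producer (the kernel)
    consumer_pos: AtomicUsize,  // advanced by the consumer (the application)
}

impl RingBuffer {
    /// Copy any unread bytes into `buf`; returns the number of bytes read.
    fn consume(&self, buf: &mut [u8]) -> usize {
        // Acquire load: see the producer's writes to `data` before reading them.
        let prod = self.producer_pos.load(Ordering::Acquire);
        let cons = self.consumer_pos.load(Ordering::Acquire);
        let avail = (prod - cons).min(buf.len());
        let mask = self.data.len() - 1; // wrap-around via power-of-two mask
        for i in 0..avail {
            buf[i] = self.data[(cons + i) & mask];
        }
        // Release store: publish the new consumer position so the
        // producer can reuse the space.
        self.consumer_pos.store(cons + avail, Ordering::Release);
        avail
    }
}
```

The producer only ever advances producer_pos and the consumer only ever advances consumer_pos, which is why the scheme works without a lock.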

(Figure: how the application and the kernel access the BPF ring buffer)

libbpf's API for waiting for a new record internally invokes epoll_wait() and can block, so an async Rust API for waiting can't be built by simply wrapping it. Instead, libbpf-async accesses the BPF ring buffer data structure directly, and lets the async runtime (Tokio) take care of waiting for a new record.
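"The runtime takes care of waiting" boils down to the standard Future protocol: poll the ring buffer, and if there is no new record, register the task's waker and return Pending. The following self-contained sketch shows that idea with a hand-rolled one-thread executor standing in for Tokio; every name here (NewData, block_on, ThreadWaker) is illustrative, not libbpf-async's actual implementation.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

// Future that resolves once the shared producer index moves past `seen`.
struct NewData {
    producer_pos: Arc<AtomicUsize>,
    seen: usize,
    waker_slot: Arc<Mutex<Option<Waker>>>,
}

impl Future for NewData {
    type Output = usize;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<usize> {
        // Register the waker first, then re-check the index, so a wakeup
        // arriving between the check and the registration can't be lost.
        *self.waker_slot.lock().unwrap() = Some(cx.waker().clone());
        let pos = self.producer_pos.load(Ordering::Acquire);
        if pos > self.seen {
            Poll::Ready(pos)
        } else {
            Poll::Pending
        }
    }
}

// Tiny executor so the sketch runs on its own; Tokio plays this role
// (far more efficiently) in libbpf-async.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(mut fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    // Safe: `fut` stays in this stack frame and is never moved again.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(v) => return v,
            Poll::Pending => thread::park(),
        }
    }
}
```

A producer (in the real setup, the kernel signaling via epoll) stores the new index and calls the registered waker, which reschedules the task.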

Memory ordering in Rust

The application and the kernel access the ring buffer without any lock, so proper memory ordering is mandatory (both the compiler and the hardware are free to reorder memory accesses). How to ensure memory ordering is architecture-specific; I use arm64 in this article.

The following shows how C (libbpf) reads the producer index.

unsigned long index = *(volatile unsigned long *)index_addr;
asm volatile("dmb ish" ::: "memory");

The first line makes sure the compiler doesn’t optimize away the load from index_addr. The inline assembly on the second line guarantees that all the LOAD and STORE operations before the instruction appear to happen before all the LOAD and STORE operations after it, with respect to other CPUs.

Rust provides std::ptr::read_volatile, which loads from memory without compiler optimization. Inline assembly in Rust is still under development (available only in nightly builds). However, std::sync::atomic::fence issues memory ordering instructions. The following Rust code is equivalent to the above libbpf code.

let index = std::ptr::read_volatile(index_addr as *const c_ulong);
std::sync::atomic::fence(std::sync::atomic::Ordering::SeqCst);

Conclusion

libbpf-async, for BPF applications in async Rust, is available here (with example code).

Common cases of memory ordering can be handled in Rust. The upcoming inline assembly support should help further (the kernel uses more efficient assembly instructions than libbpf does).
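For example, instead of a volatile load followed by a full SeqCst fence, the producer index can be read with a single acquire load, which on arm64 compiles to one ldar instruction rather than a plain load plus dmb ish. A hypothetical sketch (load_producer_pos is an illustrative name, not something libbpf-async defines):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical alternative to `read_volatile` + `fence(SeqCst)`: treat the
// shared index word as an atomic and use an acquire load. On arm64 this
// compiles to a single `ldar` instead of `ldr` followed by `dmb ish`.
//
// Safety: the caller must pass a valid, properly aligned pointer to a
// u64 that is only accessed atomically by other parties.
unsafe fn load_producer_pos(index_addr: *const u64) -> u64 {
    (*(index_addr as *const AtomicU64)).load(Ordering::Acquire)
}
```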

Corrections, comments, and suggestions would be greatly appreciated.


Janitor at the 34th floor of NTT Tamachi office, had worked on Linux kernel, founded GoBGP, TGT, Ryu, RustyBGP, etc. https://twitter.com/brewaddict