Part 1: eBPF Map Metrics Prometheus Exporter
Observability of eBPF Maps and Prometheus
I’ve been working with eBPF for over two years, mostly experimenting with different applications and building proof-of-concepts to explore its capabilities. Recently I shifted my attention to evaluating eBPF’s performance, with a particular emphasis on observability and monitoring. To my surprise, I found no straightforward way to determine the number of elements stored in an eBPF map. This raises a crucial question: how can I be sure a map won’t fill up and start dropping elements, potentially affecting application performance? We need that visibility both to size eBPF maps correctly and to keep monitoring them over time. In this post, I describe the challenges I ran into while developing a solution to this problem.
Brainstorming
Before diving into the details, let’s outline the goals for our exporter. Ideally, our exporter should:
- Provide up-to-date metric values
- Include all eBPF maps on the host
- Operate independently of eBPF map reloads or the exporter’s own restarts
This may seem straightforward, but there are complexities involved. Here are some ideas I explored:
Idea #1
Develop and hook an eBPF program into the kernel that triggers on every map creation and update. This would allow us to track changes, such as updates and deletions of key-value pairs, and export metrics accordingly.
Problem: This approach would only track correctly the maps created after the exporter is already running. For maps that already exist, the tracking program has no baseline to start from: to my knowledge, there is no in-kernel function or user-space API that returns the keys updated before the program was attached.
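The missing baseline can be illustrated with a small user-space simulation. This is only a sketch, not the kernel-side program: the `DeltaTracker` class and its `on_insert`/`on_delete` callbacks are hypothetical stand-ins for whatever an attached eBPF program would report, and a plain Python set plays the role of the map.

```python
from collections import defaultdict


class DeltaTracker:
    """Hypothetical tracker: keeps a per-map count from observed events only."""

    def __init__(self):
        self.counts = defaultdict(int)

    def on_insert(self, map_id):
        # Called when a new key is created (not when an existing key is overwritten).
        self.counts[map_id] += 1

    def on_delete(self, map_id):
        # Called when an existing key is removed.
        self.counts[map_id] -= 1


# The map already holds two entries before the tracker is attached.
real_map = {"a", "b"}

tracker = DeltaTracker()  # attached late: it never saw "a" and "b" being inserted

real_map.add("c")
tracker.on_insert(map_id=1)

real_map.discard("a")
tracker.on_delete(map_id=1)

true_count = len(real_map)         # what the map actually contains
tracked_count = tracker.counts[1]  # what the tracker believes
print(true_count, tracked_count)
```

In this run the map really holds two entries while the tracker reports zero, because the entries created before it attached were never counted.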
Idea #2
Track only pinned eBPF maps, which is a common scenario in production. Walk through the eBPF file system, load all pinned maps, and count the elements in each.
Problem: This method does not support non-pinned maps, but it is still a reasonable approach for many users.
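A rough sketch of the two steps, walking the file system and counting entries, might look like the following. It makes several assumptions: the BPF file system is mounted at `/sys/fs/bpf` (the usual location, but configurable), and the `next_key` callable is a hypothetical wrapper around the kernel's `BPF_MAP_GET_NEXT_KEY` command that returns `None` once the last key has been passed. Opening the pinned object and issuing the actual syscall are left out, and `fake_next_key` exists only so the counting loop can be exercised without a real map.

```python
import os
from typing import Callable, Iterator, Optional

# Hypothetical shape of a key iterator: pass None to get the first key,
# pass the previous key to get the next one, receive None at the end.
NextKey = Callable[[Optional[bytes]], Optional[bytes]]


def find_pinned_paths(root: str = "/sys/fs/bpf") -> Iterator[str]:
    """Yield every file under the BPF file system mount point.

    Pinned programs and links live there too, so a real exporter would
    still have to skip the paths that do not refer to maps.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def count_entries(next_key: NextKey) -> int:
    """Count entries by walking the keys one at a time."""
    count = 0
    key = next_key(None)  # None asks for the first key
    while key is not None:
        count += 1
        key = next_key(key)
    return count


def fake_next_key(keys: list) -> NextKey:
    """In-memory stand-in for a real map, used only to exercise count_entries."""
    def next_key(prev):
        if prev is None:
            return keys[0] if keys else None
        i = keys.index(prev) + 1
        return keys[i] if i < len(keys) else None
    return next_key


if __name__ == "__main__":
    print(count_entries(fake_next_key([b"k1", b"k2", b"k3"])))
```

Counting this way costs one call per entry, and the result is only approximate if the map is modified while it is being walked, so the scrape interval and map sizes matter when choosing this approach.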
Idea #3
Integrate monitoring directly into your application that loads the eBPF maps.
Problem: This tracks both pinned and non-pinned maps, but only those loaded by the application itself. Nonetheless, this covers a significant number of use cases.
For this proof-of-concept (POC), I decided to pursue the second and third ideas, as they cover most scenarios. While these methods generally meet our goals, they fall short on coverage, that is, the number of maps the tools can monitor. Specifically, I could not build a general eBPF map metrics exporter that exports metrics for all maps on the host.
The two biggest challenges are:
- Referencing non-pinned maps loaded by other programs
- Retrieving the number of elements stored in a map directly from its kernel structure
Though this may be slightly disappointing, sharing both successful and unsuccessful proof-of-concepts is valuable. Both provide great learning experiences. Here are the links to the repositories with the code for the described ideas:
- Monitor only pinned eBPF Maps
- Integrate it into your application
UPDATE: I found a solution. See Part 2.
Conclusion
Developing a Prometheus exporter for monitoring eBPF maps revealed the complexities and challenges inherent in this task. Despite exploring multiple strategies, including tracking pinned maps and integrating monitoring within applications, a comprehensive solution for all eBPF maps remains elusive. The inability to reference non-pinned maps and to retrieve element counts directly from kernel structs presents significant hurdles. However, these proof-of-concepts provide insights and tools that can serve many practical scenarios. Sharing these experiences, both successes and setbacks, contributes to the broader knowledge base and aids the ongoing development of robust observability solutions for eBPF.