eBPF Maps State Synchronization across Multi-Node Kubernetes Cluster
eBPF Maps State Synchronization using Go and gRPC
When eBPF started gaining popularity, its initial adoption focused primarily on observability, offering developers new ways to monitor and understand their systems. As the technology evolved, eBPF’s capabilities expanded significantly. Today it is widely used, among other applications, for stateful networking solutions such as load balancing, connection tracking, firewalls, and Carrier-Grade NAT (CGNAT).
Deploying these stateful eBPF applications in clusters is essential to avoid single points of failure and ensure high availability. Unlike stateless applications, which do not require synchronization, stateful applications need to maintain consistent state information across all nodes in a cluster such as Kubernetes. Normally, state is kept in the application itself or in some centralized database, but in an eBPF application the state, or rather the information, lives in the eBPF maps, and the state of each node needs to be synchronized across the cluster.
Yet there is no known synchronization tool or daemon available for eBPF maps.
State Synchronization
State synchronization, also known as data coherence, is not a new concept, but it is a complex one. It involves not only timing problems (time synchronization and event ordering) but also the prevention and resolution of race conditions, among other issues. While there are many ways to tackle these problems, we will focus on a proof-of-concept solution: how we detect eBPF map changes and exchange them with other nodes.
Our approach leverages asynchronous eBPF map update notifications. Whenever a specific eBPF map changes, the change is sent to all peers using gRPC, allowing them to synchronize accordingly. Let’s talk about this in a bit more detail.
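For orientation, here is a minimal sketch of the gRPC contract assumed throughout this post. In a real project these types are generated by protoc from a .proto file; the shapes below are an assumption that simply mirrors the SetValue call and the ValueRequest fields used in the code later on:
package node

import "context"

// ValueRequest carries one map change. The field names mirror the accessors
// used later in the post; the real type would be generated by protoc.
type ValueRequest struct {
    Key   int32
    Value int32
    Type  int32
    Mapid int32
}

// Empty is the response type for SetValue.
type Empty struct{}

// SyncServiceServer is the interface each node implements to receive sync
// messages from its peers.
type SyncServiceServer interface {
    SetValue(ctx context.Context, in *ValueRequest) (*Empty, error)
}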
Detecting eBPF Map Updates
To keep things simple, we focus on detecting and triggering events specifically for updates to eBPF hash maps (BPF_MAP_TYPE_HASH), while the concept remains similar for other types such as BPF_MAP_TYPE_ARRAY. We accomplish this by attaching our programs to eBPF fentry (“function entry”) kernel hook points. Specifically:
fentry/htab_map_update_elem: Triggered on every element update in the hash eBPF map.
SEC("fentry/htab_map_update_elem")
int BPF_PROG(bpf_prog_kern_hmapupdate, struct bpf_map *map, void *key, void *value, u64 map_flags) {
log_map_update(map, key, value, MAP_UPDATE);
return 0;
}
fentry/htab_map_delete_elem: Triggered on every element deletion in the hash eBPF map.
SEC("fentry/htab_map_delete_elem")
int BPF_PROG(bpf_prog_kern_hmapdelete, struct bpf_map *map, void *key) {
log_map_update(map, key, 0, MAP_DELETE);
return 0;
}
⚠️ Note: These hooks are triggered both from user space, via syscalls that update the eBPF Hash Map, and from kernel space, through the invocation of eBPF helper functions in your kernel programs.
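On the Go side, these fentry programs can be loaded and attached with the cilium/ebpf library (the same library used for the map operations later in this post). A minimal sketch, assuming the compiled collection exposes programs named as in the BPF_PROG definitions above:
package node

import (
    "fmt"

    "github.com/cilium/ebpf"
    "github.com/cilium/ebpf/link"
)

// attachFentryProgs attaches the two fentry programs shown above. How the
// collection is loaded (ebpf.NewCollection, bpf2go, ...) is up to your loader.
func attachFentryProgs(coll *ebpf.Collection) ([]link.Link, error) {
    var links []link.Link
    for _, name := range []string{"bpf_prog_kern_hmapupdate", "bpf_prog_kern_hmapdelete"} {
        prog, ok := coll.Programs[name]
        if !ok {
            return nil, fmt.Errorf("program %s not found in collection", name)
        }
        // For fentry programs the attach target is encoded in the ELF section
        // name, so AttachTracing only needs the program itself.
        l, err := link.AttachTracing(link.TracingOptions{Program: prog})
        if err != nil {
            return nil, fmt.Errorf("attaching %s: %w", name, err)
        }
        links = append(links, l)
    }
    return links, nil
}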
To funnel these events to our user-space application, we use an eBPF ring buffer. We transfer event data from kernel space to user space using the following function. This function reads the parameters provided by the function entry hook, combines them into a struct, and then sends the struct to user space:
static void __always_inline
log_map_update(struct bpf_map* updated_map, unsigned int *pKey, unsigned int *pValue, enum map_updater update_type)
{
    // This prevents the proxy from proxying itself
    __u32 key = 0;
    struct Config *conf = bpf_map_lookup_elem(&map_config, &key);
    if (!conf) return;
    if ((bpf_get_current_pid_tgid() >> 32) == conf->host_pid) return;

    // Get basic info about the map
    uint32_t map_id = MEM_READ(updated_map->id);
    uint32_t key_size = MEM_READ(updated_map->key_size);
    uint32_t value_size = MEM_READ(updated_map->value_size);

    // memset the whole struct to ensure verifier is happy
    struct MapData out_data;
    __builtin_memset(&out_data, 0, sizeof(out_data));

    bpf_probe_read_str(out_data.name, BPF_NAME_LEN, updated_map->name);
    bpf_probe_read(&out_data.key, sizeof(*pKey), pKey);
    out_data.key_size = key_size;
    if (pValue != 0) {
        bpf_probe_read(&out_data.value, sizeof(*pValue), pValue);
        out_data.value_size = value_size;
    }
    out_data.map_id = map_id;
    out_data.pid = (unsigned int)(bpf_get_current_pid_tgid() >> 32);
    out_data.update_type = update_type;

    // Write data to be processed in userspace
    bpf_ringbuf_output(&map_events, &out_data, sizeof(out_data), 0);
}
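On the user-space (Go) side we will later reinterpret the raw ring buffer sample as this struct, so a Go mirror with the exact same memory layout is needed. Here is a sketch of what that might look like; the field order, the name length, and the enum values are assumptions and must match your kernel-side struct MapData and enum map_updater:
package node

// bpfNameLen mirrors BPF_NAME_LEN on the kernel side (an assumed value).
const bpfNameLen = 16

// MapUpdater mirrors the kernel-side enum map_updater (MAP_UPDATE, MAP_DELETE).
type MapUpdater uint32

const (
    MapUpdate MapUpdater = iota
    MapDelete
)

func (m MapUpdater) String() string {
    switch m {
    case MapUpdate:
        return "UPDATE"
    case MapDelete:
        return "DELETE"
    default:
        return "UNKNOWN"
    }
}

// MapData must have the same layout as struct MapData in the eBPF program,
// since the raw ring buffer sample is cast to it directly.
type MapData struct {
    Name       [bpfNameLen]byte
    Key        uint32
    KeySize    uint32
    Value      uint32
    ValueSize  uint32
    MapID      uint32
    PID        uint32
    UpdateType MapUpdater
}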
State Change Broadcast
Once we receive the eBPF map update struct in user space, we can optionally filter for the specific eBPF maps or events we want to synchronize across the cluster. For simplicity, we’ll leave that discussion for another time.
We then broadcast these events to the rest of the cluster using a reliable messaging framework. For our proof of concept, we’ve chosen gRPC (Remote Procedure Call) due to its performance benefits, although other frameworks could also be used effectively. In our setup, each node in the cluster functions as both a server and a client. Specifically, it acts as a server to receive updates from other nodes through the SetValue gRPC call:
func (n *Node) SetValue(ctx context.Context, in *ValueRequest) (*Empty, error) {
    value := in.GetValue()
    key := in.GetKey()
    _type := in.GetType()

    // These operations are atomic by nature (ref: https://man7.org/linux/man-pages/man2/bpf.2.html)
    if MapUpdater(_type).String() == "UPDATE" {
        if err := n.syncObjs.HashMap.Update(key, value, ebpf.UpdateAny); err != nil {
            return nil, err
        }
        log.Printf("Client updated key %d to value %d", key, value)
    } else if MapUpdater(_type).String() == "DELETE" {
        if err := n.syncObjs.HashMap.Delete(key); err != nil {
            return nil, err
        }
        log.Printf("Client deleted key %d", key)
    }
    return &Empty{}, nil
}
⚠️ Note: Sync messages are passed as parameters to the SetValue RPC call, which is triggered by the client on the remote node.
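Before moving on to the client side, here is a minimal sketch of how the server role could be wired up, assuming the protoc-generated RegisterSyncServiceServer helper and the Node type implementing SetValue above:
package node

import (
    "log"
    "net"

    "google.golang.org/grpc"
)

// serveSync starts the gRPC server through which this node receives sync
// messages from its peers.
func serveSync(n *Node, listenAddr string) error {
    lis, err := net.Listen("tcp", listenAddr)
    if err != nil {
        return err
    }
    s := grpc.NewServer()
    // RegisterSyncServiceServer is the registration helper protoc generates
    // for the SyncService used in this post.
    RegisterSyncServiceServer(s, n)
    log.Printf("sync server listening on %s", listenAddr)
    return s.Serve(lis)
}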
And as a client to send sync messages (received through the eBPF ring buffer) to the rest of the cluster:
for {
    record, err := rd.Read()
    if err != nil {
        panic(err)
    }

    // Reinterpret the raw sample as the MapData struct and print it
    // for the sake of observing the messages
    Event := (*MapData)(unsafe.Pointer(&record.RawSample[0]))
    log.Printf("Map ID: %d", Event.MapID)
    log.Printf("Name: %s", string(Event.Name[:]))
    log.Printf("PID: %d", Event.PID)
    log.Printf("Update Type: %s", Event.UpdateType.String())
    log.Printf("Key: %d", Event.Key)
    log.Printf("Key Size: %d", Event.KeySize)
    log.Printf("Value: %d", Event.Value)
    log.Printf("Value Size: %d", Event.ValueSize)

    conn, err := grpc.NewClient(address, grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        log.Printf("Failed to connect to peer: %v", err)
        continue
    }

    client := NewSyncServiceClient(conn)
    ctx, cancel := context.WithTimeout(context.Background(), time.Second)
    _, err = client.SetValue(ctx, &ValueRequest{Key: int32(Event.Key), Value: int32(Event.Value), Type: int32(Event.UpdateType), Mapid: int32(Event.MapID)})
    cancel()
    if err != nil {
        log.Printf("Could not set value on peer: %v", err)
    } else {
        log.Printf("Successfully sent sync message")
    }
    conn.Close()

    log.Println("=====================================")
}
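The rd reader used at the top of this loop comes from the ringbuf package of cilium/ebpf. A minimal sketch of how it could be opened against the map_events ring buffer (the exact map handle depends on how your loader exposes it):
package node

import (
    "fmt"

    "github.com/cilium/ebpf"
    "github.com/cilium/ebpf/ringbuf"
)

// openMapEvents opens a reader over the kernel-side map_events ring buffer.
// The caller should defer rd.Close() and then run the read loop shown above.
func openMapEvents(mapEvents *ebpf.Map) (*ringbuf.Reader, error) {
    rd, err := ringbuf.NewReader(mapEvents)
    if err != nil {
        return nil, fmt.Errorf("opening ringbuf reader: %w", err)
    }
    return rd, nil
}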
This dual role ensures that all nodes stay synchronized with the latest changes to the eBPF maps.
⚠️ Note: This is not production code. For this demo, the program was simplified to work with only two nodes, but it could be further improved.
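For example, one straightforward improvement would be to fan the same SetValue call out to a configurable list of peers rather than a single hard-coded address. A sketch, assuming a peers slice of host:port strings and the ValueRequest type from the generated gRPC code:
package node

import (
    "context"
    "log"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

// broadcast sends one sync message to every peer in the cluster. Connections
// are created per call for simplicity; a real implementation would keep
// long-lived connections per peer.
func broadcast(peers []string, req *ValueRequest) {
    for _, addr := range peers {
        conn, err := grpc.NewClient(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
        if err != nil {
            log.Printf("Could not connect to peer %s: %v", addr, err)
            continue
        }
        ctx, cancel := context.WithTimeout(context.Background(), time.Second)
        if _, err := NewSyncServiceClient(conn).SetValue(ctx, req); err != nil {
            log.Printf("Could not sync to peer %s: %v", addr, err)
        }
        cancel()
        conn.Close()
    }
}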
The complete source code can be found in my GitHub repository:
Conclusion
In conclusion, synchronizing eBPF map states across a multi-node Kubernetes cluster is a complex but essential task for maintaining high availability and avoiding single points of failure in stateful applications. By leveraging eBPF fentry hook points and gRPC, we can effectively broadcast map updates to all nodes in the cluster, ensuring consistent state across the system. This proof of concept demonstrates a viable approach to state synchronization, though further refinements and optimizations are possible.
To stay up-to-date with the latest cloud tech updates, subscribe to my monthly newsletter, Cloud Chirp. 🚀