Monitoring Inter-Pod Traffic at the AZ Level with Retina (an eBPF-Based Tool)
Introduction
Recently, I was tasked with analyzing cross-AZ traffic in Kubernetes to identify opportunities for reduction, as it is a significant contributor to our AWS bill. The first step was to understand how traffic flows between services and what portion consistently crosses Availability Zones (AZs). To optimize cross-AZ traffic, I considered using topology-aware routing for services. However, before implementing this solution, I needed a method to effectively analyze inter-pod traffic at the AZ level.
To achieve this, monitoring network traffic at the pod level is necessary. I decided to use eBPF (Extended Berkeley Packet Filter) technology, as it allows us to observe network interactions with minimal performance overhead.
In this article, I will explain what eBPF is, explore the tools available for using it, and provide a step-by-step guide on implementing monitoring for inter-pod traffic using Retina, Kube State Metrics, Prometheus, and Grafana.
What is eBPF?
Extended Berkeley Packet Filter (eBPF) is a powerful and flexible technology embedded in the Linux kernel that allows developers to run custom, user-defined programs safely and efficiently within the kernel space. Originally designed for packet filtering, eBPF has evolved to offer a wide range of capabilities for monitoring, tracing, profiling, and securing applications.
In the context of Kubernetes, eBPF is particularly useful for monitoring inter-pod traffic because it provides detailed insights into network behavior without relying on traditional, potentially more intrusive methods like packet sniffing or application-level logging. eBPF can observe every network packet entering and leaving a pod, giving you a complete view of communication patterns across your cluster.
What is Microsoft Retina?
Retina is an advanced network observability tool designed specifically for Kubernetes environments. Built to leverage the capabilities of eBPF, Retina provides deep insights into network traffic and interactions at the pod level, with a particular focus on cross-AZ communication within Kubernetes clusters.
I chose Retina over other network observability tools because of its ease of configuration: monitoring can be scoped to individual namespaces or pods, which is especially useful when you want to avoid generating metrics for an entire cluster.
Implementation
At a high level, the following steps are required to achieve our goal:
- Deploy two services to monitor:
  - Server: a simple Flask API with a single endpoint that returns the pod’s Availability Zone (AZ) as a response.
  - Client: a service that continuously sends requests to the server’s endpoint and logs its own AZ along with the server’s response. Its only purpose is to produce logs and generate meaningful traffic for the initial cross-AZ testing.
- Install and configure Retina via its Helm chart.
- Configure Prometheus to scrape metrics from Retina.
- Create a Grafana dashboard for analyzing the metrics.
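The server side can be sketched as a minimal Flask app. The /zone route and the POD_ZONE environment variable are illustrative assumptions; the actual endpoint name and the mechanism that injects the AZ (for example, an init step that reads the node’s topology label) are in the linked repository.

```python
import os

from flask import Flask, jsonify

app = Flask(__name__)

# POD_ZONE is an illustrative assumption: in practice the AZ would be
# injected into the pod at deploy time, e.g. by an init step that reads
# the node's topology.kubernetes.io/zone label.
@app.route("/zone")
def zone():
    return jsonify(zone=os.environ.get("POD_ZONE", "unknown"))

# In the container this app would be served with `flask run` or gunicorn.
```

The client then only needs to poll this endpoint and log both zones, which is enough to produce steady, attributable traffic between the two namespaces.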
Deploying Services
Deploy two simple services into different namespaces. The source code and Kubernetes manifests can be found here. As a result, we will have:
- Six pods of the server app deployed to the app-server namespace and spread across different AZs. These pods are exposed using a Kubernetes service of the ClusterIP type.
- Ten pods of the client app deployed to the app-client namespace and distributed across different AZs.
Both namespaces are annotated with retina.sh=observe to enable Retina to monitor network traffic.
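For reference, the annotation can be applied either imperatively with kubectl annotate or declaratively in the namespace manifest; a minimal manifest for one of the two namespaces might look like:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: app-server
  annotations:
    retina.sh: observe
```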
Install and Configure Retina
We will use the official Helm chart to install Retina. For this setup, the two plugins we rely on are packetparser and packetforward; the command below also enables dropreason, linuxutil, and dns.
VERSION=$( curl -sL https://api.github.com/repos/microsoft/retina/releases/latest | jq -r .name)
helm upgrade --install retina oci://ghcr.io/microsoft/retina/charts/retina \
  --version $VERSION \
  --namespace kube-system \
  --set image.tag=$VERSION \
  --set operator.tag=$VERSION \
  --set image.pullPolicy=Always \
  --set logLevel=info \
  --set os.windows=true \
  --set operator.enabled=true \
  --set operator.enableRetinaEndpoint=true \
  --skip-crds \
  --set enabledPlugin_linux="\[dropreason\,packetforward\,linuxutil\,dns\,packetparser\]" \
  --set enablePodLevel=true \
  --set enableAnnotations=true \
  --set remoteContext=true
Next, we need to create a MetricsConfiguration custom resource, which tells Retina which metrics and labels to emit and restricts them to our two namespaces:
apiVersion: retina.sh/v1alpha1
kind: MetricsConfiguration
metadata:
  name: metricsconfigcrd
spec:
  contextOptions:
    - metricName: drop_count
      sourceLabels:
        - ip
        - podname
        - port
      additionalLabels:
        - direction
    - metricName: forward_count
      sourceLabels:
        - ip
        - podname
        - port
      destinationLabels:
        - ip
        - podname
        - port
      additionalLabels:
        - direction
  namespaces:
    include:
      - app-server
      - app-client
Configuring Prometheus
As the final step in this setup, we need to configure Prometheus to scrape these metrics from Retina. According to the Retina documentation, the following scrape configuration must be added to Prometheus:
- job_name: "retina-pods"
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_container_name]
      action: keep
      regex: retina(.*)
    - source_labels:
        [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      separator: ":"
      regex: ([^:]+)(?::\d+)?
      target_label: __address__
      replacement: ${1}:${2}
      action: replace
    - source_labels: [__meta_kubernetes_pod_node_name]
      action: replace
      target_label: instance
  metric_relabel_configs:
    - source_labels: [__name__]
      action: keep
      regex: (.*)
Results and Analysis
Once the configuration is complete, the metrics become available in Prometheus.
Retina emits a metric named networkobservability_adv_forward_bytes (along with other metrics, of course) with podname, namespace, and workload_kind labels for both the source and the destination. Additionally, there is a direction label that indicates the direction of the traffic from the destination pod’s perspective.
To determine the availability zones of these pods, you first need to identify the node each pod runs on and then look up that node’s availability zone. kube-state-metrics provides this information through kube_pod_info and kube_node_labels, since every node carries the topology.kubernetes.io/zone label.
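The join performed by the query below can be illustrated in plain Python: given pod-to-node pairs (as in kube_pod_info) and node-to-zone labels (as in kube_node_labels), resolve each pod’s zone. The data here is made up for illustration.

```python
# Illustrative stand-ins for kube_pod_info (pod -> node) and
# kube_node_labels (node -> topology.kubernetes.io/zone).
pod_info = {
    "client-abc": "node-1",
    "server-xyz": "node-2",
}
node_zone = {
    "node-1": "eu-west-1a",
    "node-2": "eu-west-1b",
}

def pod_zone(pod: str) -> str:
    """Resolve a pod's availability zone via the node it runs on."""
    return node_zone[pod_info[pod]]
```

The label_replace() calls in the PromQL query do exactly this two-step lookup, only at the metric-label level.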
With all this information, we can construct the following PromQL query:
sum(
  ceil(
    (
      sum(
        rate(networkobservability_adv_forward_bytes{
          source_namespace="app-client",
          source_workload_name=~"client-(.*)",
          destination_namespace="app-server",
          destination_workload_name=~"server-(.*)"
        }[$__rate_interval])
        * on (source_podname) group_left(source_node)
          label_replace(
            label_replace(kube_pod_info{
              namespace="app-client",
              created_by_name=~"client-(.*)"
            }, "source_podname", "$1", "pod", "(.*)"),
            "source_node", "$1", "node", "(.*)"
          )
        * on (source_node) group_left(zone)
          label_replace(
            label_replace(
              sum by (node, label_topology_kubernetes_io_zone) (kube_node_labels),
              "source_node", "$1", "node", "(.*)"
            ),
            "zone", "$1", "label_topology_kubernetes_io_zone", "(.*)"
          )
      ) by (destination_podname, source_podname, zone)
      + on (destination_podname, source_podname, zone) group_left()
      (
        sum(
          rate(networkobservability_adv_forward_bytes{
            source_namespace="app-client",
            source_workload_name=~"client-(.*)",
            destination_namespace="app-server",
            destination_workload_name=~"server-(.*)"
          }[$__rate_interval])
          * on (destination_podname) group_left(destination_node)
            label_replace(
              label_replace(kube_pod_info{
                namespace="app-server",
                created_by_name=~"server-(.*)"
              }, "destination_podname", "$1", "pod", "(.*)"),
              "destination_node", "$1", "node", "(.*)"
            )
          * on (destination_node) group_left(zone)
            label_replace(
              label_replace(
                sum by (node, label_topology_kubernetes_io_zone) (kube_node_labels),
                "destination_node", "$1", "node", "(.*)"
              ),
              "zone", "$1", "label_topology_kubernetes_io_zone", "(.*)"
            )
        ) by (destination_podname, source_podname, zone)
      )
    ) / 2
  )
)
This query computes the volume of traffic exchanged between the two workloads by aggregating the traffic rates, mapping pods to their nodes and zones, and then averaging the source-side and destination-side values (hence the division by 2). Because the on() matching keeps only series where the source and destination zones coincide, the result covers exactly the traffic that stays within a single AZ.
The remaining bytes, which do not match this query, constitute cross-AZ traffic.
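The split the query implements, same-AZ when source and destination zones match and cross-AZ otherwise, can be sketched in a few lines of Python (the flow records here are hypothetical):

```python
# Each flow: (source_zone, destination_zone, bytes_per_second)
flows = [
    ("eu-west-1a", "eu-west-1a", 1200.0),  # stays within one AZ
    ("eu-west-1a", "eu-west-1b", 800.0),   # crosses AZs -> billable
    ("eu-west-1b", "eu-west-1b", 500.0),   # stays within one AZ
]

same_az = sum(bps for src, dst, bps in flows if src == dst)
cross_az = sum(bps for src, dst, bps in flows if src != dst)
```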
I have created Grafana dashboards to analyze this traffic from different perspectives.
All related code can be found in my GitHub repository.
Summary
In this article, we explored methods for analyzing inter-pod traffic in Kubernetes, focusing on cross-AZ communication and its impact on AWS costs. We leveraged eBPF technology for efficient network monitoring, utilizing tools such as Retina, Prometheus, and Grafana. The guide covered the setup and configuration of these tools to capture and analyze traffic metrics. By deploying a sample environment and analyzing the collected data, we demonstrated how to gain insights into traffic patterns, optimize cross-AZ communication, and effectively manage network performance in Kubernetes clusters.