Troubleshooting Network Latency in GKE Clusters with ASM: A Packet Capture Deep Dive

Indrabhushan Shukla
Google Cloud - Community
2 min read · Jul 1, 2024

By Indrabhushan Shukla and Prerna Bagga

A significant number of customers have reported experiencing heightened latency after integrating Anthos Service Mesh (ASM) into their Google Kubernetes Engine (GKE) clusters. To get to the bottom of this issue, we’ll guide you through capturing packets on the istio-ingress gateway, a crucial step in diagnosing the root cause.

Why Packet Captures on istio-ingress Gateway?

Packet capture allows us to examine the network traffic flowing through the istio-ingress gateway, providing insights into potential bottlenecks, delays, or misconfigurations that might be contributing to the observed latency.

Overcoming Privilege Limitations

By default, the istio-proxy container running within the ingressgateway pod lacks root privileges, which packet capture tools require. To work around this limitation, we'll add a temporary (ephemeral) container to the pod that has the privileges needed to capture packets.

Important Note: The following steps are valid for both in-cluster and Managed Control Plane configurations of ASM.

Step A: Identifying the Istio-Ingress Gateway Pod

  1. Run the following command, replacing namespace with the actual namespace where the istio-ingress gateway is deployed:

e.g.: kubectl get pod -n namespace
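If the namespace holds many pods, a label selector can narrow the listing. The following is a minimal sketch, assuming the gateway pods carry the label app=istio-ingressgateway. That label is common but not guaranteed, so check yours with kubectl get pod --show-labels. The function name list_gateway_pods and the istio-system namespace in the usage line are illustrative, not part of the original steps.

```shell
#!/bin/sh
# List ingress gateway pods in a namespace by label.
# ASSUMPTION: pods are labelled app=istio-ingressgateway; pass a different
# selector as the second argument if your deployment uses other labels.
list_gateway_pods() {
  ns="$1"
  selector="${2:-app=istio-ingressgateway}"
  kubectl get pod -n "$ns" -l "$selector" -o name
}

# Usage (hypothetical namespace):
#   list_gateway_pods istio-system
```

The -o name output has the form pod/<pod-name>; the part after the slash is the name used in the following steps.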

Step B: Adding the Debugging Container (tcpdump-container)

Once the istio-ingress gateway pod is identified, run the following command from your admin workstation (where you manage your GKE cluster). Replace istio-ingressgateway-5896f7f5c5-vcwkx with the actual name of your pod and namespace with the correct namespace:

e.g.: kubectl debug -n namespace --image istio/base --target istio-proxy -it -c tcpdump-container istio-ingressgateway-5896f7f5c5-vcwkx

This command creates an ephemeral debugging container named tcpdump-container in your pod, targets the istio-proxy container so that the two share a process namespace, and opens a shell from which you can run tcpdump.

The -it flags (-i/--stdin and -t/--tty) keep standard input open and allocate a terminal, so kubectl debug attaches to the console of the ephemeral container, allowing you to run tcpdump and monitor the capture in real time.
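To confirm from a second terminal that the ephemeral container was actually added, the pod spec can be queried. This is a sketch under the assumption that the container was named tcpdump-container as above; the helper names are hypothetical, and the first function simply wraps the kubectl debug call with the namespace passed explicitly.

```shell
#!/bin/sh
# Start the debugging container in the given namespace and pod.
# Both helper names are illustrative, not part of any tool.
start_tcpdump_container() {
  ns="$1"
  pod="$2"
  kubectl debug -n "$ns" --image istio/base --target istio-proxy \
    -it -c tcpdump-container "$pod"
}

# Print the names of all ephemeral containers attached to the pod.
list_ephemeral_containers() {
  ns="$1"
  pod="$2"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.ephemeralContainers[*].name}'
}

# Usage:
#   start_tcpdump_container namespace istio-ingressgateway-5896f7f5c5-vcwkx
#   list_ephemeral_containers namespace istio-ingressgateway-5896f7f5c5-vcwkx
```

If tcpdump-container appears in the output of the second helper, the container exists in the pod spec.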

Step C: Capturing Packets

Once you are inside the container, run the following command to initiate packet capture:

root@istio-ingressgateway-5d55c5d8f4-kjqc2:/# tcpdump -w /tmp/ingressgw.pcap

This command will create a file named ingressgw.pcap in the /tmp directory within the container, where the captured packets will be stored.
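On a busy gateway an unbounded capture can grow quickly. One possible variant, sketched here, limits the capture to a fixed number of packets and applies a filter expression. The helper name capture_packets, the default of 10000 packets, and port 8080 in the usage line are assumptions for illustration; adjust them to your traffic.

```shell
#!/bin/sh
# Capture a bounded number of packets on all interfaces into a pcap file.
# ASSUMPTION: the defaults below are illustrative, not recommended values.
capture_packets() {
  pcap_file="${1:-/tmp/ingressgw.pcap}"
  max_packets="${2:-10000}"
  filter="${3:-tcp}"
  # -i any : listen on every interface in the pod's network namespace
  # -c N   : stop after N packets
  # -w F   : write raw packets to file F instead of printing them
  tcpdump -i any -c "$max_packets" -w "$pcap_file" "$filter"
}

# Usage (hypothetical port):
#   capture_packets /tmp/ingressgw.pcap 5000 'tcp port 8080'
```

Because -c makes tcpdump exit on its own once the packet count is reached, pressing Ctrl+C is only needed to stop earlier.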

Step D: Stopping Packet Capture

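To stop the packet capture, press Ctrl+C. Do not exit the container yet: the capture file lives in the ephemeral container's filesystem, and the container must still be running when you copy the file out in the next step.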

Step E: Copying the Captured File

Open a new session in your console (e.g., a second terminal or Cloud Shell tab) and run the following command to copy the captured file out of the tcpdump-container:

kubectl cp -c tcpdump-container namespace/istio-ingressgateway-5896f7f5c5-vcwkx:/tmp/ingressgw.pcap ingressgw.pcap

Replace namespace and istio-ingressgateway-5896f7f5c5-vcwkx with your actual namespace and pod name. This command will copy the ingressgw.pcap file from the container's /tmp directory to your current local directory.
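Before moving on, it can be worth checking that the copied file really is a capture file and not an empty or truncated copy. The sketch below inspects the first four bytes for the classic pcap magic number (microsecond or nanosecond variant, in either byte order). The helper name is_pcap is hypothetical.

```shell
#!/bin/sh
# Return success if the file starts with a classic pcap magic number.
is_pcap() {
  # Read the first 4 bytes as hex and strip the whitespace od adds.
  magic="$(od -An -tx1 -N4 "$1" | tr -d ' \n')"
  case "$magic" in
    d4c3b2a1|a1b2c3d4) return 0 ;;  # microsecond timestamps
    4d3cb2a1|a1b23c4d) return 0 ;;  # nanosecond timestamps
    *) return 1 ;;
  esac
}

# Usage:
#   is_pcap ingressgw.pcap && echo "looks like a pcap file"
```

This only validates the file header; it does not guarantee that every packet record in the file is complete.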

Step F: Analyzing Packets

Now that you have the ingressgw.pcap file locally, you can analyze it using Wireshark or other packet analysis tools to identify the root cause of the latency issues in your GKE cluster with ASM.
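For a quick first look before opening Wireshark, tcpdump itself can read the file back with -r. The sketch below prints only the packets that open or close TCP connections (SYN, FIN, RST) with full timestamps, which can help spot slow handshakes or unexpected resets. The helper name show_connection_events is hypothetical, and this is one possible starting point rather than a complete analysis method.

```shell
#!/bin/sh
# Print TCP connection setup/teardown packets from a capture file.
show_connection_events() {
  # -nn   : do not resolve hostnames or port names
  # -tttt : print date and time on every line
  # -r F  : read packets from file F instead of a live interface
  tcpdump -nn -tttt -r "$1" \
    'tcp[tcpflags] & (tcp-syn|tcp-fin|tcp-rst) != 0'
}

# Usage:
#   show_connection_events ingressgw.pcap
```

Comparing the timestamps of a SYN and the matching SYN-ACK gives a rough view of where connection setup time is being spent.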
