Troubleshooting Network Latency in GKE Clusters with ASM: A Packet Capture Deep Dive
By Indrabhushan Shukla and Prerna Bagga
A significant number of customers have reported experiencing heightened latency after integrating Anthos Service Mesh (ASM) into their Google Kubernetes Engine (GKE) clusters. To get to the bottom of this issue, we’ll guide you through capturing packets on the istio-ingress gateway, a crucial step in diagnosing the root cause.
Why Packet Captures on istio-ingress Gateway?
Packet capture allows us to examine the network traffic flowing through the istio-ingress gateway, providing insights into potential bottlenecks, delays, or misconfigurations that might be contributing to the observed latency.
Overcoming Privilege Limitations
By default, the istio-proxy container running in the ingress gateway pod lacks the root privileges that packet capture tools require. To work around this limitation, we’ll add a temporary (ephemeral) container to the pod that has the privileges needed to capture packets.
Important Note: The following steps are valid for both in-cluster and Managed Control Plane configurations of ASM.
Step A: Identifying the Istio-Ingress Gateway Pod
- Run the following command, replacing namespace with the actual namespace where Istio is installed:
e.g.: kubectl get pod -n namespace
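If the namespace holds many pods, a label selector can narrow the lookup. The sketch below assumes the gateway pods carry the label istio=ingressgateway, which may differ in your installation; the function name find_gateway_pod is hypothetical and only wraps kubectl get pod.

```shell
# Print the name of the first pod matching the gateway label in a namespace.
# Assumption: gateway pods are labeled istio=ingressgateway; pass a different
# selector as the second argument if your installation uses another label.
# Note: kubectl reports an error if no pod matches the selector.
find_gateway_pod() {
  namespace="$1"
  selector="${2:-istio=ingressgateway}"
  kubectl get pod -n "$namespace" -l "$selector" \
    -o jsonpath='{.items[0].metadata.name}'
}

# Usage (replace namespace with your own):
#   find_gateway_pod namespace
```

If several gateway replicas are running, this returns only the first one, so you may need to capture on each replica in turn.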
Step B: Adding the Debugging Container (tcpdump-container)
Once the istio-ingress gateway pod is identified, run the following command from your admin workstation (where you manage your GKE cluster). Replace istio-ingressgateway-5896f7f5c5-vcwkx with the actual name of your pod and namespace with the correct namespace:
e.g.: kubectl debug -n namespace --image istio/base --target istio-proxy -it -c tcpdump-container istio-ingressgateway-5896f7f5c5-vcwkx
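This command adds an ephemeral debugging container named tcpdump-container to your pod, targets the process namespace of the istio-proxy container, and opens an interactive session in the new container. It does not start tcpdump by itself; you run tcpdump from that session in the next step. Because all containers in a pod share one network namespace, the capture sees the gateway's traffic.
The -i (--stdin) and -t (--tty) arguments attach your terminal to the ephemeral container, so you can run tcpdump and monitor the capture in real time.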
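To confirm from a second terminal that the ephemeral container was actually added and is running, you can query the pod's status. This is a minimal sketch: the function name debug_container_started_at is hypothetical, and it relies on the status.ephemeralContainerStatuses field that Kubernetes populates for ephemeral containers.

```shell
# Print the start time of the ephemeral container if it is running.
# Prints nothing if the container is not running or does not exist.
debug_container_started_at() {
  namespace="$1"
  pod="$2"
  container="${3:-tcpdump-container}"
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath="{.status.ephemeralContainerStatuses[?(@.name==\"$container\")].state.running.startedAt}"
}

# Usage (replace namespace and the pod name with your own):
#   debug_container_started_at namespace istio-ingressgateway-5896f7f5c5-vcwkx
```

An empty result usually means the container failed to start or has already exited, in which case kubectl describe pod shows the reason.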
Step C: Capturing Packets
Once you are inside the container, run the following command to initiate packet capture:
root@istio-ingressgateway-5896f7f5c5-vcwkx:/# tcpdump -w /tmp/ingressgw.pcap
This command creates a file named ingressgw.pcap in the /tmp directory within the container, where the captured packets are stored.
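On a busy gateway an unfiltered capture grows quickly, so it can help to bound the capture and restrict it to the traffic of interest. The sketch below is one way to do that inside the debugging container; the function name capture_packets and the example port are assumptions, not values from your environment.

```shell
# Capture a bounded number of packets to a file, with an optional BPF filter.
#   -i any : listen on all interfaces (Linux)
#   -nn    : do not resolve host names or port names
#   -c N   : stop after N packets
#   -w F   : write raw packets to file F
capture_packets() {
  out_file="$1"
  max_packets="$2"
  shift 2
  # Remaining arguments are passed through as the capture filter expression.
  tcpdump -i any -nn -c "$max_packets" -w "$out_file" "$@"
}

# Usage (port 8443 is a placeholder; use the port your gateway listens on):
#   capture_packets /tmp/ingressgw.pcap 10000 tcp port 8443
```

Bounding the capture with -c also means tcpdump exits without needing Ctrl+C once the packet limit is reached.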
Step D: Stopping Packet Capture
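To stop the packet capture, press Ctrl+C. Do not exit the container yet: the capture file lives in the ephemeral container's filesystem and cannot be copied out once that container has terminated.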
Step E: Copying the Captured File
Open a new session in your console (e.g., Cloud Shell) and run the following command to copy the captured file out of the tcpdump-container:
kubectl cp -c tcpdump-container namespace/istio-ingressgateway-5896f7f5c5-vcwkx:/tmp/ingressgw.pcap ingressgw.pcap
Replace namespace and istio-ingressgateway-5896f7f5c5-vcwkx with your actual namespace and pod name. The -c flag selects the tcpdump-container; without it, kubectl cp reads from the pod's default container, which does not hold the capture file. This command copies the ingressgw.pcap file from the container's /tmp directory to your current local directory.
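Before exiting the debugging container, it is worth checking that the copy produced a usable file. The sketch below tests that the file is non-empty and starts with one of the classic pcap magic numbers; the function name is_pcap_file is hypothetical, and the check does not cover the pcapng format.

```shell
# Return 0 if the file is non-empty and begins with a classic pcap magic number.
# Covers both byte orders and both timestamp resolutions (micro/nanosecond).
is_pcap_file() {
  [ -s "$1" ] || return 1
  # Read the first 4 bytes as hex and strip whitespace.
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  case "$magic" in
    d4c3b2a1|a1b2c3d4|4d3cb2a1|a1b23c4d) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   is_pcap_file ingressgw.pcap && echo "capture looks valid"
```

If the check fails, repeat the copy while the ephemeral container is still running.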
Step F: Analyzing Packets
Now that you have the ingressgw.pcap file locally, you can analyze it using Wireshark or other packet analysis tools to identify the root cause of the latency issues in your GKE cluster with ASM.
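As a starting point for the analysis, TCP retransmissions are a common sign of packet loss or congestion that shows up as latency. The sketch below lists them with tshark, Wireshark's command-line tool, assuming it is installed on your workstation; the function name list_retransmissions is hypothetical.

```shell
# List TCP retransmissions found in a capture file.
# Output columns: frame number, time since capture start, source IP,
# destination IP, and TCP stream index.
list_retransmissions() {
  tshark -r "$1" -Y 'tcp.analysis.retransmission' \
    -T fields -e frame.number -e frame.time_relative \
    -e ip.src -e ip.dst -e tcp.stream
}

# Usage:
#   list_retransmissions ingressgw.pcap
```

The tcp.stream value can then be used as a display filter in Wireshark (tcp.stream eq N) to inspect an affected connection end to end.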