Intermittent delays in Kubernetes

Monica Gangwar
Aug 5, 2019 · 5 min read

We moved most of our production workload to Kubernetes and were relishing the advantages it offered by keeping a lot of things abstracted away and not requiring any manual intervention from us. But these abstractions soon became problematic when the whole network stack of our cluster went down, and we were left with no option but to create a new cluster and migrate all our applications to it.

With this cluster-wide outage, our whole infrastructure came under scrutiny, and we were left clueless as to why it had happened. This was when we decided to take a deep dive into Kubernetes networking and try to understand each and every abstracted layer, from pod-to-pod networking using CNIs to service networking using iptables.

During this time we also noticed intermittent delays in pod-to-pod communication and intermittent overall slowness in our web application.

Problem statement

How to understand and avoid the intermittent delays, usually occurring in multiples of 5 seconds, in pod-to-pod communication.

Pod A sent a request at time x, but the request reached Pod B at x + 5s, x + 10s, or x + 15s, and the overall request took much longer than expected.

First things first

We noticed that the delays were in multiples of 5 seconds, and so we found this extremely helpful article, which explains that this delay in pod-to-pod communication is due to DNS lookups.

The libc resolver (glibc on most Linux distributions, or musl in Alpine), which is responsible for DNS resolution, sends out DNS lookups for the A (IPv4) and AAAA (IPv6) records on the same port. The A record comes back almost immediately, whereas the AAAA record times out after a while, causing the library to treat the overall lookup as timed out. libc then retries after waiting 5 seconds, which is why DNS resolution gets delayed in multiples of 5 seconds.

This delay is common on hosts where DNAT or SNAT translation takes place and is caused by a race condition in conntrack; to understand it in more depth, follow this blog.

Resolution

Linux

With the help of the article mentioned above, we simply added the single-request-reopen option to our pod DNS config and voila! Our intermittent delays were gone. Just to be sure, we looked at the tcpdump output and saw that the A and AAAA record lookups were now going out via different ports.
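A minimal sketch of the change (the pod name and image here are placeholders; the same dnsConfig block also works inside a Deployment's pod template):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                # placeholder name
spec:
  containers:
    - name: app
      image: example-app:latest    # placeholder image
  dnsConfig:
    options:
      # Injects "options single-request-reopen" into the pod's /etc/resolv.conf,
      # so glibc retries the second lookup on a fresh socket and source port.
      - name: single-request-reopen
```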

Alpine

We didn't change anything for Alpine images, as most of our applications run on glibc-based (non-Alpine) Linux images. You can follow this blog if you notice intermittent delays in Alpine images.

GRPC

For gRPC applications we were still noticing DEADLINE_EXCEEDED errors with the timeout set to 3 seconds, and a DNS lookup taking more than 3 seconds is a lot. So we took the following steps:

  1. We added single-request-reopen to the pod spec and captured tcpdump output. We saw that lookups for A and AAAA records were still going out via the same port.
  2. We figured out that gRPC's default DNS resolver is c-ares (if it's available), so gRPC was not using libc for its resolution and our setting was not being taken into consideration.
  3. Luckily for us, this DNS resolver can easily be overridden by setting the variable GRPC_DNS_RESOLVER=native. We added this to our Dockerfiles (sketched below) and, lo and behold, the problem was solved. We captured the tcpdump output again and saw that lookups for A and AAAA records were now going out via different ports.
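A sketch of the Dockerfile change (the base image and application binary are placeholders; the ENV line is the only actual change we made):

```dockerfile
# Placeholder base image and application binary; only the ENV line matters here.
FROM debian:buster-slim
# Force gRPC to fall back to the native (libc) resolver instead of c-ares,
# so the single-request-reopen option from resolv.conf actually takes effect.
ENV GRPC_DNS_RESOLVER=native
COPY ./app /app
CMD ["/app"]
```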

Conclusion

To avoid intermittent delays of 5, 10, or 15 seconds, or DEADLINE_EXCEEDED errors in gRPC applications:

  1. For Linux, add single-request-reopen to your deployment spec.
  2. For gRPC applications on Linux, add single-request-reopen to your deployment spec as mentioned above and also set the environment variable GRPC_DNS_RESOLVER=native.
  3. For Alpine, follow this blog.

PS: You can check the tcpdump output using a command like the following:
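A minimal sketch (port 53 and the "any" pseudo-interface are the standard defaults, not anything specific to our cluster):

```sh
# Capture DNS traffic without resolving names or ports,
# so the source ports of the A and AAAA lookups stay visible.
tcpdump -i any -nn port 53
```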

Output when lookups for A and AAAA records go out via the same port, with a retry after 5 seconds

Output when lookups for A and AAAA records go out via different ports
