Kubernetes on a High Traffic Environment: 3 Key Takeaways

Shyam Sundar C S
Published in upday devs · 6 min read · Aug 26, 2020

This post assumes technical expertise from the reader, including working knowledge of Amazon Web Services (AWS) and Kubernetes.

Introduction

upday started its Kubernetes journey on two smaller product lines with very light traffic last year. Armed with the confidence of running production workloads on Kubernetes, we then worked on migrating our primary product line, the upday platform.

This post summarizes the strategies we employed to handle the specific challenges we face at our scale.

A Quick Recap on our Architecture

A summary of the architecture can be read here. Although the underlying systems have evolved since that post, our core value proposition remains the same at a high level: we deliver personalised news content to our millions of users.

The upday platform currently handles 5k RPS during normal workloads and up to 15k RPS during a news push notification. In some cases, depending on the importance of the news and the time of day, this count can climb far higher. These figures are after an 80% cache hit ratio with our CDN on specific routes.

A Quick Recap on our Migration Strategy

We adopted Infrastructure Strangulation to migrate the applications to Kubernetes. The Strangler pattern is an established Cloud Architecture Design pattern documented by Martin Fowler and Microsoft.

Strangler Pattern in action. Source docs.microsoft.com.

The Strangler pattern takes an incremental approach to replacing a legacy system with a new one. It is achieved by using a facade (the Strangler Facade) that consistently abstracts both the existing and the new implementations.

We modified our deployment pipelines to run the applications on both the Legacy and the Kubernetes infrastructure, and modified our Edge infrastructure to route traffic to either of them. Most cutovers were handled at the DNS level. Once we were confident about the outcome of a migration (i.e. the application and its functionality worked as expected, users reported no issues, monitoring raised zero alarms, statsd metrics looked the same as before, etc.), we simply stopped sending traffic to the Legacy infrastructure and decommissioned it a few days later.
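As an illustration of what a DNS-level cutover can look like, here is a sketch assuming Route 53 weighted records (not necessarily our exact setup; record names, targets and weights are hypothetical). Shifting the weights gradually moves traffic from the Legacy load balancer to the Kubernetes ingress until Legacy receives none.

```
# Sketch: CloudFormation excerpt for a weighted DNS cutover with Route 53.
# Record names, targets and weights are hypothetical.
Resources:
  ApiLegacyRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.
      Name: api.example.com.
      Type: CNAME
      TTL: "60"
      SetIdentifier: legacy
      Weight: 90                      # most traffic still on Legacy
      ResourceRecords:
        - legacy-lb.example.com
  ApiKubernetesRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.
      Name: api.example.com.
      Type: CNAME
      TTL: "60"
      SetIdentifier: kubernetes
      Weight: 10                      # a small slice routed to the Kubernetes ingress
      ResourceRecords:
        - k8s-ingress-lb.example.com
```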

The takeaways

1. Load balancing for Latency

TL;DR — The default load balancing algorithm for Kubernetes ingress, round robin, causes the applications behind the ingress to receive imbalanced traffic. The imbalance is most visible when traffic is high and bursty. The Peak EWMA algorithm gives a much more even distribution of traffic.

In the modern software ecosystem, load balancing plays several roles: it enables Scalability and Resilience for the systems behind it. In a distributed microservices architecture, we also have to consider another aspect: Latency.

Load balancing algorithms like round robin address the Scalability and Resilience aspects easily. To address latency, the load balancer also has to account for each downstream application or server's recent performance. One load balancing algorithm that tracks this is Peak EWMA (Exponentially Weighted Moving Average).

Peak EWMA in action

The Peak EWMA algorithm works by maintaining a moving average of each replica's round-trip time, weighted by the number of outstanding requests, and distributing traffic to the replicas where that cost function is smallest.

We have observed that the Peak EWMA load balancing algorithm is very effective at distributing traffic across replicas, especially when they auto-scale.
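With ingress-nginx, the balancing algorithm is selected through the controller's ConfigMap. A minimal sketch, assuming the common default ConfigMap name and namespace (adjust for your installation):

```
# Sketch: switch ingress-nginx upstream balancing from round robin to Peak EWMA.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # name/namespace assume a default install
  namespace: ingress-nginx
data:
  load-balance: "ewma"             # default is "round_robin"
```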

More information on this can be found at:

2. Local DNS Cache

TL;DR — In cloud environments, everything has limits. The kube-dns component (CoreDNS), by default, resolves its upstream queries against the VPC DNS resolver from a limited number of nodes. It is very easy to hit the limits enforced by the resolver, resulting in DNS lookup failures. Using NodeLocal DNSCache distributes the DNS resolution load across all the worker nodes.

DNS is one of the most important pillars of any networked infrastructure. Kubernetes uses CoreDNS as its cluster DNS server. Once we reached a certain scale, a significant number of DNS errors started appearing in our logs, leading to a lot of retries. Operations that should have taken a few minutes were taking orders of magnitude longer.

In our case, we were hitting the limits on DNS resolution imposed by the VPC resolver on a node. There are many ways to reduce the number of DNS resolutions made, such as adjusting the default DNS timeouts for OpenJDK or applying ndots fixes. Those fixes helped a bit, but beyond them we still needed a way to handle more DNS traffic. The Kubernetes developers provide an add-on, NodeLocal DNSCache, that specifically addresses this kind of problem.
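For reference, a sketch of the ndots fix (pod and image names are hypothetical): lowering ndots in a pod's dnsConfig stops most external hostnames from being expanded through every cluster search domain first, which cuts the number of queries sent upstream.

```
# Sketch: lower ndots for a pod so external names resolve with fewer lookups.
# Kubernetes defaults ndots to 5, which tries every external hostname against
# the cluster search domains before making the absolute query.
apiVersion: v1
kind: Pod
metadata:
  name: news-api                       # hypothetical
spec:
  containers:
    - name: app
      image: example/news-api:latest   # hypothetical
  dnsConfig:
    options:
      - name: ndots
        value: "2"
```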

DNS requests before and after implementing NodeLocal DNSCache.

NodeLocal DNSCache improves Cluster DNS performance by running a DNS caching agent on cluster nodes as a DaemonSet. Pods will be able to reach out to the DNS caching agent running on the same node, thereby avoiding iptables DNAT rules and connection tracking.

Implementing this reduced the volume of upstream DNS queries to a large extent. Burst traffic no longer affected DNS resolution, and this is in line with the AWS recommendation to cache DNS lookups.
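A minimal sketch of the wiring, assuming the upstream nodelocaldns DaemonSet is deployed with its conventional link-local address 169.254.20.10: each node's kubelet is pointed at that address, so pod DNS queries land on the local cache before anything leaves the node.

```
# Sketch: KubeletConfiguration excerpt pointing pods at the node-local cache.
# 169.254.20.10 is the link-local address conventionally used by the upstream
# nodelocaldns manifests; adjust to match your deployment.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
  - 169.254.20.10
```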

More information on this can be found at:

3. Multiple Ingress

TL;DR — Every product line has different SLA requirements and traffic patterns. Traffic can be isolated at the ingress level by deploying multiple ingress controllers (with different ingress classes) for different applications or product lines.

In layman's terms, a Kubernetes ingress exposes Kubernetes services to external requests. Ingresses usually interface with a load balancer to help handle the traffic.

We use ingress-nginx, maintained by the Kubernetes community. It is built on top of the stable open source NGINX reverse proxy and load balancer. During our migration, ingress-nginx was able to replace our spring-api-gateway with ingress definitions and to authenticate API requests with its auth-url annotation.
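For illustration, a sketch of an Ingress using the auth-url annotation (host, service names and the auth endpoint are hypothetical; the API version is the networking.k8s.io/v1beta1 form that was current when this was written): ingress-nginx calls the configured endpoint for each request and only forwards it when the endpoint returns a 2xx response.

```
# Sketch: delegate request authentication to an external service via ingress-nginx.
# Hostnames, service names and the auth endpoint are hypothetical.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: news-api
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/auth-url: "http://auth-service.default.svc.cluster.local/validate"
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: news-api
              servicePort: 80
```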

We have multiple product lines, some catering to B2C traffic and some to B2B traffic. Although they are separated by namespaces, node groups and affinities, they all used the same ingress controller, so when burst traffic on a B2C product line started adding latency to the B2B products, it began to affect our SLA commitments. Therefore, we launched multiple ingress controllers, each with a different name in its ingress-class, to isolate the traffic.

Multi-Ingress in action

By separating ingress controllers, we achieve traffic isolation for each product line at the ingress, better handling of latency during burst traffic scenarios, and containment of the blast radius to that product line.

We now have a truly isolated setup for each product line, at the ingress layer too. Since this change, we have observed that the traffic of one product line does not affect the others.
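A sketch of how this isolation can be wired with the ingress-class mechanism ingress-nginx supported at the time (deployment heavily abbreviated; names, class values and the controller image tag are illustrative): each controller only reconciles the Ingress objects carrying its own class.

```
# Sketch: a dedicated ingress-nginx controller for the B2B product line.
# Deployment heavily abbreviated; names, class and image tag are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-nginx-b2b
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app: ingress-nginx-b2b
  template:
    metadata:
      labels:
        app: ingress-nginx-b2b
    spec:
      containers:
        - name: controller
          image: k8s.gcr.io/ingress-nginx/controller:v0.35.0
          args:
            - /nginx-ingress-controller
            - --ingress-class=b2b      # reconcile only Ingresses of this class
---
# B2B Ingress objects opt into the dedicated controller via the class annotation.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: b2b-api
  annotations:
    kubernetes.io/ingress.class: "b2b"
spec:
  rules:
    - host: b2b.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: b2b-api
              servicePort: 80
```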

More information on this can be found at:

Conclusion

The current solutions are stable and have been working consistently for our high-traffic requirements. Further improvements can be made by splitting into multiple clusters, rearchitecting our hot paths, adding more caching, and so on.

We would love to read your ideas, questions or comments about this topic. Thank you for reading!
