Load-balancing for K8s services in Coccoc Infrastructure

CocCoc Techblog · Jun 23, 2020

Preface

For a very long time, the infrastructure at Coccoc supported only traditional services running on physical hosts. Recently, with the rise of microservices, the trendy container orchestration technology K8s has been adopted as a big part of the grand design. The infrastructure therefore needs a huge shift to support not only conventional services but containerized microservices as well. In this article we’re going to walk through how load-balancing is implemented to publish K8s services from a very old-fashioned setup.

Load-balancing infrastructure

simple traditional LB infrastructure

As depicted above, the original load-balancing infrastructure is straightforward:

  • The 4 LBs are Internet-facing servers using master-backup VRRP (Virtual Router Redundancy Protocol) to provide high availability and load-balancing for the backend services. In this case the 2nd server is the master: it holds the VIP (floating virtual IP) and distributes load to the 3 underlying backends
  • The LB framework is implemented with keepalived, employing the native Linux kernel module IPVS (IP Virtual Server); a configuration sketch follows this list
  • The balancing algorithm is WRR (weighted round robin), with all 3 backends carrying the same weight
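
As a rough illustration, a keepalived configuration for this kind of setup might look like the sketch below. The interface name, VIP and backend addresses are placeholders, not our production values:

vrrp_instance VI_1 {
    state BACKUP               # every node starts as BACKUP; priority elects the master
    interface eth0
    virtual_router_id 51
    priority 150               # the highest priority takes over the VIP
    advert_int 1
    virtual_ipaddress {
        203.0.113.10/32        # the floating VIP
    }
}

virtual_server 203.0.113.10 80 {
    delay_loop 6
    lb_algo wrr                # weighted round robin
    lb_kind DR                 # direct routing: replies bypass the LB entirely
    protocol TCP

    real_server 10.10.0.11 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
        }
    }
    # ...two more real_server blocks for the other backends, same weight
}
keepalived sketch: a VRRP instance plus an IPVS virtual server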

This setup has worked nicely with traditional services running on physical hosts for a long time. However, as we’ve been adopting Kubernetes into the infrastructure, this LB setup shows a remarkable limitation:

  • Pods’ IPs are ephemeral, while the IPVS configuration needs fixed IPs
  • A service’s IP can be static, but there’s a bigger issue:
    With IPVS, the backend server also needs the VIP configured on one of its interfaces so that it doesn’t drop the packets forwarded to it. A K8s service IP is just a virtual one living inside calico’s network infrastructure (iptables); it isn’t an actual IP on a real interface, so it can’t receive the packets forwarded from keepalived

In short, the Real Server (RS) addresses can’t be K8s IPs; there need to be real physical backends. The sketch below shows what a traditional real server does to accept the forwarded traffic, and why a virtual K8s IP can’t do the same.
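
Assuming DR (direct routing) mode, which the description above implies, each real server binds the VIP to its loopback interface and suppresses ARP for it, roughly like this (placeholder addresses again):

# On each real server: accept packets addressed to the VIP,
# but never answer ARP for it (the LB keeps owning the VIP on the wire)
ip addr add 203.0.113.10/32 dev lo
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2

A pod or service IP has no host interface to hold the VIP like this, which is exactly why it would drop the forwarded packets.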

So the problem becomes: how do the backends connect to K8s services, or pods?

How to publish K8s services?

Traditional approach

The simplest publishing method is to run an Nginx reverse proxy on each backend, proxy_pass’ing to K8s services using DNS-based service discovery (provided by CoreDNS). For example:

location / {
    proxy_pass "http://music.browser-prod.svc.cluster.local/";
}
Nginx proxy_pass to K8s service

For this approach:

  • Real servers’ IPs are still the backends’ IPs, which is acceptable for keepalived
  • Load-balancing and high availability are now the K8s service’s responsibility, handled by DNS round robin (headless service, sketched below) or iptables TCP load-balancing. The Nginx proxy knows nothing about the underlying pods, so it can’t handle load-balancing by itself
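
For reference, a headless service is simply one with clusterIP set to None, so the cluster DNS returns the pods’ IPs directly. A hypothetical manifest, mirroring the proxy_pass example above:

apiVersion: v1
kind: Service
metadata:
  name: music
  namespace: browser-prod
spec:
  clusterIP: None          # headless: DNS resolves straight to the pods' IPs
  selector:
    app: music             # hypothetical pod label
  ports:
  - port: 80
    targetPort: 8080       # hypothetical container port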

Besides, some tuning is necessary to make it really work:

DNS caching issue

Whenever services are redeployed, their IPs change (unless the services’ IPs are static). By default nginx caches resolved names for 60s, so it can only detect an IP change after around 60s, which makes the service unavailable for the same amount of time! This is HUGE.

Solution: make the DNS cache TTL shorter:

resolver 127.0.0.1 valid=1s;
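
One caveat worth noting: nginx resolves a literal proxy_pass hostname only when the configuration is loaded, and honors the resolver’s valid= setting at runtime only when the upstream is referenced through a variable. A common pattern (same hypothetical service as above):

resolver 127.0.0.1 valid=1s;

location / {
    # Using a variable forces nginx to re-resolve the name at request time
    set $upstream "http://music.browser-prod.svc.cluster.local";
    proxy_pass $upstream;
}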

Remote Addr preservation issue

Client IPs going through an L7 reverse proxy are not preserved: the backend sees the proxy’s IP instead. In several cases we still need this information for rate limiting, ACLs, monitoring…

Solution: use nginx’s real_ip module, making use of the X-Forwarded-For header:

set_real_ip_from  172.16.16.0/24;
real_ip_header X-Forwarded-For;
real_ip_recursive on;

With the solutions to the corresponding issues, this is the simplest approach that works to publish a K8s service to the outside. In practice it works fine most of the time, despite some outstanding problems:

  • Load-balancing and high availability are totally controlled by DNS RR or iptables
    Since the upstream for proxy_pass contains only 1 domain, there will be no retry from nginx if it marks the upstream as failed. Nginx only acts as a gateway for K8s services, not providing any high-availability or load-balancing mechanism
  • A misconfigured readinessProbe in a deployment causes requests to be sent to failed pods (and without retry, as stated above), because the service DNS doesn’t remove them from the IP pool; see the probe sketch after this list
  • A new dummy nginx configuration (only for proxy_pass) needs to be created on the backend servers for every new K8s service
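
For completeness, this is roughly what a sane readinessProbe looks like in a deployment’s pod template, so that a failing pod is pulled out of the service’s endpoints (image, path and port are hypothetical):

# fragment of the deployment's pod template
containers:
- name: music
  image: registry.example.com/music:latest
  readinessProbe:
    httpGet:
      path: /healthz       # hypothetical health endpoint
      port: 8080
    periodSeconds: 10
    failureThreshold: 3    # after 3 failed checks the pod leaves the endpoints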

Nginx Ingress approach

This is native K8s service discovery, which means the ingress controller can communicate directly with the pods instead of going through the service, so load-balancing no longer needs to rely on the K8s service. Let’s see if it can fit into the on-premise infrastructure.

First, let’s get a brief description of what an Ingress controller and an Ingress (resource) are. We’re talking specifically about the Nginx Ingress controller.

Ingress Controller

  • An nginx instance that utilizes Ingress resources as configuration
  • Native K8s service discovery without relying on DNS-based service discovery
  • Content-based routing support
  • TLS termination support

Ingress Resource

  • Configuration for Ingress controller
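
As an illustration, a minimal Ingress resource for the music service from earlier could look like the manifest below (the host and names are hypothetical, using the networking.k8s.io/v1 schema):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: music
  namespace: browser-prod
spec:
  ingressClassName: nginx
  rules:
  - host: music.example.com          # hypothetical public hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: music              # the controller routes to the pods behind it
            port:
              number: 80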

The Implementation

Nginx Ingress approach
  • Under the LBs, there are 2 physical servers acting as Nginx Ingress controller backends. Actually, for each K8s namespace there’s a different pair of ingress controller servers for better separation, but for the sake of simplicity we won’t go into details here
  • The Nginx ingress controller is a pod running on a physical host with hostNetwork, listening on ports 80 and 443; a spec fragment follows this list
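
A rough sketch of the relevant part of such a pod spec (details vary between ingress-nginx versions; the node label is hypothetical):

# fragment of the ingress controller pod template
spec:
  hostNetwork: true                  # bind directly into the host's network namespace
  dnsPolicy: ClusterFirstWithHostNet # keep cluster DNS working with hostNetwork
  nodeSelector:
    role: ingress                    # pin to the 2 physical ingress servers
  containers:
  - name: nginx-ingress-controller
    ports:
    - containerPort: 80              # with hostNetwork, this is the host's port 80
    - containerPort: 443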

What we have with this approach:

  • For every exposed service, there’s only a need for one simple Ingress resource
  • More importantly, the nginx ingress controller supports retries when requests fail regardless of readinessProbe, and handles load-balancing by itself without depending on other service discovery; see the annotation sketch after this list. The ingress controller becomes a real load-balancer, not just a simple proxy_pass machine as in the previous implementation
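
For instance, retry behavior can be tuned per Ingress via annotations, which (to the best of our knowledge) map onto nginx’s proxy_next_upstream directives; the values below are illustrative:

metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-next-upstream: "error timeout http_502"
    nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "3"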

For this setup, the nginx controller, as a native K8s load-balancer, actually solves all the problems we faced in the traditional approach. The load-balancing infrastructure now becomes a mix of conventional services and cloud-native applications.

Closing thoughts

Adopting K8s is not an easy task, and everything on the infrastructure side also needs to change accordingly. Load-balancing is just one part of the big picture, alongside monitoring, logging, storage…

There are also several other solutions for K8s load-balancing such as envoy, contour, ambassador, haproxy, traefik… We chose nginx just because it is simple, familiar enough, and supported natively by K8s.

The Nginx ingress controller is the bridge for publishing K8s services to the outside, and it blends nicely into our heavily bare-metal-focused infrastructure. However, nginx ingress supports load-balancing on Layer 7 only, which leaves room for adopting other Layer 4 K8s load-balancer candidates in the future.
