Load Balancing and Reverse Proxying for Kubernetes Services

Different load balancing and reverse proxying strategies to use in Production K8s Deployments to expose services to outside traffic

Let’s define some terms!

Before we move on to the actual discussion, let’s define and agree on a few terms, as they could be confusing if not pinned down first. Let’s also define a sample deployment and refer to parts of it later in the discussion.

  • K8s cluster — The boundary within which Pods, Services, and other K8s constructs are created and maintained. This includes K8s Master and the Nodes where the actual Pods are scheduled.
  • Node (with upper case N) — These are the K8s Nodes that all Pods, Volumes, and other computational constructs will be spawned in. They are not managed by K8s, and reside within the private network that is assigned to them by the IaaS provider when the virtual machines are spawned.
  • Node private network — The K8s Nodes reside in a private network in the Cloud Service Provider (e.g.: AWS VPC). For the sample deployment, let’s define this private network CIDR as 10.128.0.0/24. There will be routing set up so that traffic is properly routed to and from this private network (e.g.: with the use of Internet and NAT Gateways).
  • Container Networking — The networking boundary within which the K8s constructs are created. This is a Software Defined Network, commonly implemented with an overlay such as Flannel. How routing works within this network can vary between Cloud Service Providers; however, generally every private IP address assigned within this space can talk to every other one without NAT in between. This network is manipulated by kube-proxy whenever new Services and Pods are created. There are usually two CIDRs defined within this space: the Pod Network CIDR and the Service Network CIDR.
  • Pod Network CIDR — The IP address range from which Pods spawned within the K8s cluster get their IP addresses. For the sample deployment, let’s keep this to 10.244.0.0/16
  • Service Network CIDR — The IP address range from which Services created within the K8s cluster get their IP addresses. For the sample deployment, let’s keep this to 10.250.10.0/20
  • (K8s) Service (with upper case S) — The K8s Service construct that this article focuses on.
  • service — The application service that a particular set of Pods provides.

The following Service definition will be used as the sample throughout the discussion, with <type> switched out for each approach explored below:
apiVersion: v1
kind: Service
metadata:
  name: myservice
  ...
spec:
  type: <type>
  ...
  ports:
  - name: myserviceport1
    port: 8080
    targetPort: 8080
    protocol: TCP
...

The Problem

Pods are ephemeral.

In K8s, Pods are the basic unit of computation. However, Pods are not guaranteed to live for the whole lifetime that the user intends. In fact, the only contract that K8s adheres to when it comes to the liveness of Pods is that the desired count of Pods (a.k.a. Replicas) is maintained. Pods are ephemeral, and are vulnerable to kill signals from K8s during occasions such as scaling, memory or CPU overuse, or rescheduling for more efficient resource use, and even to downtime caused by outside factors (e.g.: the whole K8s Node going down). The IP address assigned to a Pod at its creation will not survive such events.

The Solution: Service Types

To address this problem, there are several Service types that can be leveraged to allow ingress of external traffic to Pods that only have private IP addresses. Let’s explore each type and see whether any of them achieves the goal: ending up with a public IP address that a domain name can be mapped to.

type: ClusterIP

This is the default type of Service that is created when the type field is not explicitly specified in a Service definition. A cluster-wide but internal IP address, allocated from the Service Network CIDR that is divided up when setting up Container Networking, is provided as the fixed IP address of the Service. This IP address (and the Service name) is routable from anywhere within the K8s Overlay Network.
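
For illustration, here is a minimal sketch of the sample Service pinned to this type. The selector is an assumption for the sake of the example, matching hypothetical backend Pods labeled app: myapp:

apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  type: ClusterIP          # same effect as omitting the type field
  selector:
    app: myapp             # assumed label on the backend Pods
  ports:
  - name: myserviceport1
    port: 8080             # port the ClusterIP listens on
    targetPort: 8080       # port on the backend Pods
    protocol: TCP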

Within the cluster (e.g.: from another Pod), the Service can then be consumed either by its name or by its ClusterIP:

curl http://myservice:8080/
curl http://$(kubectl get svc myservice --template='{{.spec.clusterIP}}'):8080/

Exposing a ClusterIP Service to traffic outside the cluster, however, would require a custom mechanism that can do the following:

  1. Watch and detect new Service creations at the K8s Master — doable, easy
  2. Get the ClusterIP addresses assigned for each Service — doable, easy
  3. Create and update routing tables from all the possible clients up to the K8s Nodes, so that requests are routed to the cluster network — not so much

Pros:

  • None that is specific to this type

Cons:

  • Not an easy task to expose the Services to outside traffic
  • Doing so with workarounds would potentially expose the widest possible surface for attacks

type: NodePort

In the order in which this article explores the Service types, this is the first one that allows a Service to be exposed in a meaningful manner in a production deployment. When a Service of type NodePort is created, the same port (allocated from a default range of 30000–32767, unless specified explicitly) is opened on every Node, and traffic arriving on it is forwarded to the Service. As will become apparent later, this is also the basis of other mechanisms that map a Service port to a physical port.
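
A minimal sketch of the sample Service as a NodePort type follows; the nodePort value is pinned explicitly here so that it matches the example request below (the selector is an assumption, as before):

apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  type: NodePort
  selector:
    app: myapp             # assumed label on the backend Pods
  ports:
  - name: myserviceport1
    port: 8080             # ClusterIP port, still created internally
    targetPort: 8080       # port on the backend Pods
    nodePort: 31644        # opened on every Node; must fall within the NodePort range
    protocol: TCP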

With this in place, the Service can be consumed from outside the cluster through any Node’s IP address (e.g.: 10.128.0.13 from the Node private network defined earlier) and the NodePort:

curl http://10.128.0.13:31644/

Pros:

  • Easiest method to expose internal Services to outside traffic
  • Enables greater freedom when it comes to setting up external load balancing and reverse proxying
  • Service-provided L3 load balancing functionality (e.g.: .spec.sessionAffinity) is available
  • Straightforward, easy-to-untangle mechanism that helps when troubleshooting (especially with predefined NodePorts)

Cons:

  • Load balancing and reverse proxying have to be managed externally to K8s
  • NodePorts have to be coordinated among Services

type: LoadBalancer

Also known as the Cloud Load Balancer approach. When K8s is deployed in a supported Cloud Service Provider, specifying this as the .spec.type results in a load balancer being provisioned in the Cloud to proxy the particular Service.
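
A sketch of the sample Service as a LoadBalancer type (the selector remains an assumption, as in the earlier sketches):

apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  type: LoadBalancer
  selector:
    app: myapp             # assumed label on the backend Pods
  ports:
  - name: myserviceport1
    port: 8080
    targetPort: 8080
    protocol: TCP

Once the Cloud Service Provider finishes provisioning, the allocated public IP address shows up in the Service’s status (e.g.: under the EXTERNAL-IP column of kubectl get svc myservice).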

Pros:

  • Provisions load balancing and reverse proxying with minimum effort
  • NodePort management is done without the intervention of the user
  • No load balancing facilities have to be managed outside of the K8s domain

Cons:

  • A single load balancer proxies a single Service, making this a costly approach
  • The implementation details are sometimes opaque and require manual investigation to understand and troubleshoot
  • Most of the time this sets up Network Load Balancers, so L7 features like path-based routing and TLS/SSL termination are off the table
  • The Cloud Service Provider usually takes time to complete provisioning the load balancer

Bare Metal Service Load Balancer Pattern

Before K8s v1.1, the Bare Metal Service Load Balancer was the preferred solution to tackle the shortcomings of the LoadBalancer Service type described above. It makes use of NodePorts and a set of HAProxy Pods that act as an L7 reverse proxy for the rest of the Pods. The solution roughly works as follows.

  1. A single-container Pod runs HAProxy along with an additional binary called service_loadbalancer
  2. This Pod is deployed as a DaemonSet, so that a single Pod is scheduled per Node (a rough sketch follows this list)
  3. The service_loadbalancer binary constantly watches the K8s API Server and retrieves the details of Services. With the use of Service Annotation metadata, each Service can indicate the load balancing details to be adopted by any third party load balancer (TLS/SSL Termination, virtual host names etc.)
  4. With the details retrieved, it rewrites the HAProxy configuration file, filling the backend and frontend section details with Pod IP addresses for each Service
  5. After the HAProxy configuration file is written, service_loadbalancer does a soft-reload on the HAProxy process.
  6. HAProxy exposes ports 80 and 443. These are then exposed to outside traffic as NodePorts
  7. The NodePorts can be exposed to outside through a public Load Balancer
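
A rough sketch of such a DaemonSet follows. The image name is hypothetical, standing in for an image that bundles HAProxy and the service_loadbalancer binary (the original lived in the kubernetes/contrib repository), and the current apps/v1 API is used here for illustration:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: service-loadbalancer
spec:
  selector:
    matchLabels:
      app: service-loadbalancer
  template:
    metadata:
      labels:
        app: service-loadbalancer
    spec:
      containers:
      - name: haproxy-lb
        # hypothetical image containing HAProxy + service_loadbalancer
        image: example.org/service-loadbalancer:latest
        ports:
        - containerPort: 80    # HTTP traffic, exposed further as a NodePort
        - containerPort: 443   # HTTPS traffic, exposed further as a NodePort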

At runtime, the request flow is roughly the following:

  1. Traffic is routed to the public Load Balancer via the public IP address
  2. The Load Balancer forwards the traffic to the NodePorts
  3. Once traffic reaches the NodePorts, HAProxy starts L7 deconstruction and does host- or path-based routing based on the Service Annotation details
  4. Once the routing decision is taken, the traffic is forwarded directly to the Pod IP

Pros:

  • Can make L7 decisions
  • Can make use of specific load balancer features, such as cookie-based session stickiness, that may not be possible with the Cloud Load Balancer approach
  • Has more control over how load balancing should be scaled
  • Load balancing details are managed with K8s constructs such as Service annotations
  • Is more customizable when it comes to different use cases
  • Economical, since only one Cloud Load Balancer is provisioned for a complete K8s cluster
  • Transparent configuration mechanism, since beyond service_loadbalancer the changes involved are HAProxy-specific
  • Changes propagate quickly, as service_loadbalancer picks them up within a short polling interval

Cons:

  • Complex to set up and troubleshoot
  • Could result in a single point of failure if the number of Nodes (or of affinity-selected Nodes) is limited
  • service_loadbalancer only supports HAProxy (although other reverse proxies could be supported in theory, with considerable code changes)

Ingress

It is fair to say that the concept of Ingress and the associated Ingress Controller evolved out of the Bare Metal Service Load Balancer pattern discussed above. This approach is available from K8s v1.1 onwards. An Ingress resource declares L7 routing details (virtual hosts, paths, TLS termination) for a set of Services, and an Ingress Controller watches these resources and configures a load balancer to match them. This differs from the Cloud Load Balancer approach in a few important ways.

  1. The Cloud Load Balancer approach most of the time provisions Network Load Balancers with no control over L7 constructs, whereas the load balancers resulting from Ingress are mostly L7.
  2. The Cloud Load Balancer approach is dictated by the Service declaration, whereas Ingress is a separate declaration that does not depend on the Service type.
  3. There would be one Cloud Service Provider load balancer per Service in the former approach, whereas with Ingress multiple Services and backends can be managed with a single load balancer.
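
As an illustration, a minimal Ingress routing a hypothetical host name to the sample Service could look like the following (written against the current networking.k8s.io/v1 API; older clusters used earlier versions such as extensions/v1beta1):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myservice-ingress
spec:
  rules:
  - host: myservice.example.com    # hypothetical virtual host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myservice
            port:
              number: 8080         # the Service port defined earlier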

Pros:

  • Standardized approach to provisioning external load balancers
  • Support for HTTP and L7 features like path-based routing
  • K8s-managed load balancer behavior
  • Can manage multiple Services with a single load balancer
  • Can easily plug in different load balancer options

Cons:

  • Ingress Controller implementations could be buggy (e.g.: I have had a fair share of the GCP Ingress Controller not properly picking up readiness and liveness probes to detect healthy backends)
  • Implementation control is in the hands of the Ingress Controller, which might restrict certain customizations that would otherwise be possible with the Service Load Balancer pattern
  • K8s-managed load balancer configuration could mean less control over the Cloud Service Provider load balancer through means that were available previously (perhaps in an older deployment architecture)

Gathering All Up Together

The above options are the basic ones available OOTB at the moment. However, many more patterns can be created by combining these approaches or adding custom implementations. Whichever way is chosen, the following factors are worth weighing.

Cost

What would be the infrastructure and operational cost of each approach? Would there be multiple instances of load balancers and associated resources (static IP addresses, storage, firewall rules etc.), or would they be kept to the workable minimum?

Complexity and Customizability

Does the approach reasonably explain the inner workings of the implementation? Are there multiple abstractions that lack adequate documentation and make the underlying implementation too opaque? How easy is it to troubleshoot the path between a client outside the K8s Pod network and a backend inside it?

Latency

How many routing hops does a request go through after entering the K8s Overlay Network before hitting the actual backend? Do these hops introduce considerable latency? Have the implementations been properly load tested to identify saturation points?

Operational Transition

What degree of freedom is allowed to modify resources created as part of the approach? Can the older processes and methods of infrastructure management be used with the new approach for resources created outside the K8s realm? Or would Ops tasks be drastically disrupted?
