Bare-metal Kubernetes cluster with true load balancing

Till Klampaeckel
Published in PlanetaryNetworks
Dec 21, 2018 · 4 min read

Image credit: Fredrik Skarstedt

When we started exploring Kubernetes (or, for some reason, “k8s”) five months ago, I took it for a spin on one of the cloud providers (the one with G). Hosted Kubernetes on any of the well-known public cloud providers, though, absorbs or hides some of the challenges people face on bare metal.

If I ignore storage for now (another blog post coming), the biggest challenge is: How do you route traffic to your cluster?

I’ve read the documentation and a lot of blog posts (too many to link, in fact) about Kubernetes and “Type: LoadBalancer”. Most suggest NodePort or HostPort setups, or introduce an edge-router role, which adds yet another single point of failure.

HostPort — Your service is bound to one node (or a few), because the places where the pod can be scheduled become limited. Those nodes then need to be highly available.

NodePort — Your service runs on every node (or, e.g., on the “edge-router” role). Therefore, the port is gone for any other use, or managing it becomes increasingly complex. On top of that, if you have a load balancer or external DNS in front, you need to take health checking into account.

ClusterIP — Internal only, forget about that.

LoadBalancer — Not that straightforward on bare metal. Well, we will get to that. :-)

So far, nothing new.
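To see the problem in action: create a “type: LoadBalancer” service on a vanilla bare-metal cluster (the service name and IPs here are made up for illustration) and its external IP never leaves the pending state, because nothing is there to hand one out:

$ kubectl get svc/my-loadbalancer
NAME              TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)                      AGE
my-loadbalancer   LoadBalancer   10.43.79.61   <pending>     80:32512/TCP,443:31899/TCP   5m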

Enter MetalLB

Is it Star Trek? No, it’s MetalLB!

MetalLB is advertised as the go-to solution for load balancing on bare metal.

While the MetalLB project is still young, it builds on concepts and protocols that power the backbone of the Internet (or any network, I should say).

Believe it or not, in order to get to medium.com, you used the same protocols.

In a nutshell — MetalLB provides two modes, BGP and Layer2. There’s a great blog post on how BGP works, so I won’t go into the details here. Since Layer2 mode also poses a few risks and invites general misunderstandings, I’ll share what we did to set it up.

Layer2 Mode

Layer2 mode uses the ARP protocol to make IPs known on the network.

ARP tells your network where an IP address is located. Locations are MAC addresses, and one location can have multiple IPs. In Layer2 mode, you install two components into your Kubernetes cluster:

  • a controller
  • “n” speakers, one per node (usually, that’s the number of worker nodes)
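Once both are deployed (see Setup below), you can check on them; the pod names and the three-node count in this output are, of course, illustrative:

$ kubectl get pods -n metallb-system
NAME                          READY   STATUS    RESTARTS   AGE
controller-65895b47d4-8wvl9   1/1     Running   0          2m
speaker-5b8zs                 1/1     Running   0          2m
speaker-jx9f2                 1/1     Running   0          2m
speaker-tqstb                 1/1     Running   0          2m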

A setup could be like this:

  1. Your network uses the following IP range:
    192.168.1.0–192.168.1.255
  2. Split that range into a DHCP pool (used, for example, for your k8s nodes, your laptop, phone and other devices) and a pool for MetalLB to allocate IPs from.
  3. The result could be:
    DHCP: 192.168.1.2–192.168.1.200
    MetalLB: 192.168.1.201–192.168.1.254
    (192.168.1.255 is the broadcast address of the /24 and stays out of both pools.) This provides you with the means to start close to 200 k8s nodes (or devices, excluding the gateway), and then hand out another 54 IPs to services running on your k8s cluster.
    Take note: the two ranges must not overlap, MetalLB’s range must not be handed out by DHCP, and the IPs in MetalLB’s pool must never be assigned to your k8s nodes.
  4. Your k8s node setup could then look as follows:
    k8s-node1: 192.168.1.2
    k8s-node2: 192.168.1.3
    k8s-node3: 192.168.1.4

Setup

Follow the instructions in the official documentation to deploy MetalLB (that’s the “kubectl apply” bit) and then configure it through a ConfigMap with the above settings.
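A minimal ConfigMap for the Layer2 pool above could look like the following sketch; the pool name “default” is arbitrary, and metallb-system is the namespace the official manifests put everything in:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.201-192.168.1.254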

What happens next?

When you create a “type: LoadBalancer” service, MetalLB will assign an IP address from the provided pool (likely the first free one).
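For reference, here is a minimal service manifest that triggers this; the selector and target ports are placeholders for whatever your deployment actually exposes:

apiVersion: v1
kind: Service
metadata:
  name: my-loadbalancer
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - name: http
    port: 80
    targetPort: 8080
  - name: https
    port: 443
    targetPort: 8443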

You can verify that with “kubectl”:

$ kubectl get svc/my-loadbalancer
NAME              TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)                      AGE
my-loadbalancer   LoadBalancer   10.43.79.61   192.168.1.201   80:32512/TCP,443:31899/TCP   1m

The following is done by MetalLB behind the scenes:

  • It assigns an IP from the pool to the service you created
  • The speaker elected for that IP broadcasts gratuitous ARP packets, announcing the location of the assigned IP (the MAC address of its node)
  • The speakers also respond to ARP requests made from the network with the location of the IP
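You can watch this from any other machine on the same network; the interface name and MAC address below are made up, but the MAC you see should match one of your k8s nodes (the one currently announcing the IP):

$ ping -c1 192.168.1.201 > /dev/null
$ ip neigh show 192.168.1.201
192.168.1.201 dev eth0 lladdr 52:54:00:12:34:56 REACHABLE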

TL;DR — Subsequent requests to the assigned IP now complete. :-) (Happiness!)

FIN

I hope this blog post provides insights into how MetalLB is supposed to work and helps you be successful. Because MetalLB is awesome, and open source is awesome, we contributed some of these notes back to the project.

Also, while I used a range of private IPs in my example, the same works with a public range (e.g. your own class C) as well.

