I like to think of myself as a bit of a tinkerer. I’m willing to be my own sysadmin, more than happy to learn and to put the time in to really understand how things are working together. I have had my home setup running on a small RPi cluster for a few years now, and the setup has been constantly evolving. Now, I’m finally in a position where everything seems to be managing itself, with a degree of self-healing. I’m going to share the core components with you today.
I am going to be writing a series of posts describing how to get each component set up, and how to get the most out of it. I will also be sharing all the relevant parts of my setup so you can mirror it yourself if you wish.
The backbone of my entire cluster is k3s by Rancher. This is a lightweight Kubernetes distribution that runs relatively well on lower-spec hardware, such as RPis. Until recently, I had a two-node cluster of just RPi 3Bs, and it (generally) handled everything just fine. I started to push the limits of what I could run on those, so I've now added an RPi 4 8GB to the mix, and now I have computing power for days!
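Getting a basic k3s cluster going is a couple of commands. This is a minimal sketch of the standard install script flow; the server IP and token placeholder are assumptions you'd replace with your own values.

```shell
# On the server (control plane) node: install k3s with the official script
curl -sfL https://get.k3s.io | sh -

# Print the join token the agents will need
sudo cat /var/lib/rancher/k3s/server/node-token

# On each agent (worker) node: point at the server's IP and paste the token
curl -sfL https://get.k3s.io | K3S_URL=https://192.168.1.101:6443 \
  K3S_TOKEN=<token-from-server> sh -
```

Once the agents join, `kubectl get nodes` on the server should list every Pi.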
My ingress controller for the cluster is Traefik. What I love about this is that it does everything you need from both a reverse proxy and an ingress controller. I used it before Kubernetes to proxy to a bunch of Docker containers. It handles all my SSL with Let's Encrypt, integrating with the DNS challenge using Cloudflare, meaning I can get valid SSL certificates without exposing my services to the internet. It's got an amazing UI that shows me exactly what is set up, and the status of it. It's also got great middleware support, allowing me to add a default header set to boost security, and bolt things like OAuth onto any of my running systems. And the configuration for it is so straightforward!
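The Let's Encrypt DNS-challenge piece lives in Traefik's static configuration. A minimal sketch, assuming Traefik v2 and a resolver name and email of your choosing:

```yaml
# traefik.yml (static configuration) -- resolver name, email, and
# storage path are placeholders
certificatesResolvers:
  letsencrypt:
    acme:
      email: you@example.com
      storage: /data/acme.json
      dnsChallenge:
        provider: cloudflare
```

The Cloudflare provider reads credentials from environment variables (e.g. `CF_DNS_API_TOKEN`), so the token stays out of the config file, and because the challenge happens over DNS, port 80 never needs to be reachable from outside.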
MetalLB is a relatively new addition to my cluster. I use this to create externally facing LoadBalancer services as entry points to my cluster. It works out of the box using ARP: whichever node is the designated active entry point for a service advertises its MAC address as the one answering for that IP. If that node goes down, the next one takes over the ARP announcements, and failover is seamless. It means no matter where my services run, I can have a single static IP for my DNS and my HTTP(S) traffic.
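The layer 2 (ARP) mode described above only needs a pool of spare IPs on your LAN. A sketch of the classic ConfigMap-based configuration (newer MetalLB releases use CRDs instead); the address range is a placeholder for whatever your router doesn't hand out via DHCP:

```yaml
# metallb-config.yaml -- address range is an assumption for your network
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.240-192.168.1.250
```

Any Service of `type: LoadBalancer` then gets an IP from that pool automatically.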
One thing I need to be able to keep track of is how each node in my cluster is doing. To make those stats available across the cluster, I have Node-Exporter running on each RPi (outside of Kubernetes). This gives me an endpoint for Prometheus to consume, giving real-time access to system metrics for health monitoring.
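Running node_exporter outside Kubernetes usually means a systemd unit on each Pi. A minimal sketch, assuming the binary lives in `/usr/local/bin` and a dedicated `node_exporter` user exists:

```ini
# /etc/systemd/system/node_exporter.service -- paths and user are assumptions
[Unit]
Description=Prometheus Node Exporter
After=network-online.target

[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter
Restart=always

[Install]
WantedBy=multi-user.target
```

After `systemctl enable --now node_exporter`, each node serves its metrics on port 9100 at `/metrics`.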
Prometheus & Grafana
The last items that make up my core cluster are Prometheus and Grafana. I collect a lot of stats across the cluster — node health, Pi-hole stats, my Home Assistant states. This setup lets me visualize them really easily, and drill down if necessary to diagnose potential faults within the cluster. You can also set up a bunch of alerts through these, but I have yet to take advantage of that.
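Wiring Prometheus up to the per-node exporters is just a scrape job. A sketch of the relevant `prometheus.yml` fragment, with placeholder IPs standing in for each Pi:

```yaml
# prometheus.yml fragment -- node IPs are placeholders for your Pis
scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - 192.168.1.101:9100
          - 192.168.1.102:9100
          - 192.168.1.103:9100
```

Grafana then just needs Prometheus added as a data source, and the popular community Node Exporter dashboards work out of the box.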