Load Balancing Load Balancers
Why do we use load balancers?
Load balancers are usually implemented to improve availability and to scale applications in and out.
If you’re just getting started with load balancing: on the availability side, the idea is that if a server behind the load balancer goes boom 💥, the load balancer is smart enough to stop sending traffic to it. On the scaling side, the goal is to abstract the client away from the number of servers serving the application: clients only ever talk to the load balancer’s URL, no matter how many servers sit behind it.
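To make that concrete, here is a minimal sketch (the post doesn’t prescribe any particular software, and the names and addresses are placeholders) of an HAProxy configuration that expresses both ideas:

```
frontend app_front
    bind *:80
    default_backend app_servers

backend app_servers
    balance roundrobin
    # "check" enables periodic health checks; a server that fails them
    # stops receiving traffic until it recovers
    server app1 10.0.0.11:8080 check
    server app2 10.0.0.12:8080 check
    # scaling out is just a matter of adding another "server" line
```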
What if the load balancer goes boom💥?
Now that our load balancer is capable of handling server failures, we have achieved high availability and nirvana, right? Nope.
When we added the load balancer between the client and the server, we removed the server as a single point of failure, but we added another one: the load balancer.
What are the different approaches?
Most load-balancing solutions like Fortinet, F5, or even pfSense use an active-passive approach. They do this by implementing virtual IP addresses (VIPs). A VIP is an IP address assigned to the main server of a pair of servers (main and backup); if the main server fails, the backup takes over the VIP.
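The appliances above handle this natively, but as an illustrative sketch of the same pattern, keepalived (a common open-source option, not one this setup requires) defines an active-passive VIP in a few lines; the interface, router ID, and address below are placeholders:

```
vrrp_instance LB_VIP {
    state MASTER            # the backup node uses "state BACKUP" and a lower priority
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.168.1.100       # the VIP that clients actually point at
    }
}
```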
The problem with this approach is that while it improves the availability of a service, it doesn't help with scaling in or out: the throughput of the system is still limited by what a single server can handle.
If our implementation is hardware-based, this means calling our Fortinet or F5 sales rep and getting a couple of new appliances to replace our old, underpowered ones. On the other hand, if our implementation is software-based (using software like HAProxy or Envoy), we can handle this by increasing the resources of the machines running the software.
Even though this seems like it would solve all our problems, vertical scaling gets expensive really fast: network interfaces, switches, CPUs, and memory can only get so big and so fast.
To solve this problem in a way that works for any scenario, we need to scale out, and that means load balancing the load balancers.
The magic solution?
In the previous section, I purposefully ignored the most basic load-balancing method: DNS round robin. This method involves adding multiple A records for the same name, each pointing to a different IP. With this approach, the DNS server answers with all the IP addresses that correspond to the requested record, and in the end it’s up to the client to decide which IP address to use.
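In zone-file terms (reusing the name and addresses from the example later in this post), that looks something like:

```
; two A records for the same name: resolvers return both,
; and each client picks one of them
lb.testdomain.lan.    300    IN    A    1.1.1.1
lb.testdomain.lan.    300    IN    A    1.1.1.2
```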
This sounds great! We can add all the load-balancing nodes we want and we can scale as much as we desire!
Well, not quite: if one of those servers fails or needs maintenance, its IP address will still be in the DNS response, and clients will keep trying to connect to it and failing.
What if?
So we have our load-balancing nodes, each with its own IP address, and we can hand users all of those addresses and let them use whichever one they want. The only missing piece in our load-balancing challenge is removing a node’s record when it isn’t able to handle traffic.
There are lots of commercial and open-source tools that handle this through some sort of service discovery: applications register themselves in a registry when they come up, and clients query that registry to get the addresses where the service lives. These tools aren’t designed for what we actually need here, which is simply adding some health checks to a DNS server.
Our load balancer already checks the health of the servers behind it, so can’t DNS just check the health of the servers and always answer with healthy ones? The short answer is no: there aren’t many solutions that do this easily.
Most cloud providers will have this functionality built into their DNS services, but what if I want to do this in my home lab or in my offices?
Introducing DisDNS
To simplify DNS administration for my home lab services, I implemented a simple application that leverages CoreDNS and Etcd to manage which IPs should be in the response to a DNS query. The system is designed to run within a Kubernetes cluster, but it can also run on regular servers since containers can run basically anywhere.
The system is fairly simple: configuration is handled in a YAML file where we define each record:
zones:
  lb.testdomain.lan:
    healthcheck_type: http
    healthcheck_port: 80
    healthcheck_frequency_seconds: 5
    entries:
      - "1.1.1.1"
      - "1.1.1.2"
    healthcheck_config:
      timeout: 2
      protocol: "http"
      verify: false
      threshold: 4
      accepted_status_codes: [200]
DisDNS will pick up this file and check the two IP addresses in the “entries” array, “1.1.1.1” and “1.1.1.2”. The health-check configuration specifies that the health of the service is determined by sending an HTTP request to port 80 every 5 seconds, with a 2-second timeout, and that the HTTP response code should be 200.
DisDNS can also be configured with TCP and HTTPS checks.
With this configuration, DisDNS will use Etcd to store the status of each node. If a node fails the health check more than 4 times (configurable with the “threshold” value), its record will be removed from Etcd.
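This isn’t the DisDNS source, just a minimal Go sketch of the loop described above; the etcd endpoint and key layout are placeholders, not the format DisDNS actually uses:

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// probe sends one HTTP GET and reports whether the response code matches
// the accepted status code (200 in the example configuration).
func probe(url string, timeout time.Duration) bool {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == 200
}

func main() {
	etcd, err := clientv3.New(clientv3.Config{Endpoints: []string{"http://etcd:2379"}})
	if err != nil {
		panic(err)
	}
	defer etcd.Close()

	entries := []string{"1.1.1.1", "1.1.1.2"}
	failures := map[string]int{}
	threshold := 4

	for range time.Tick(5 * time.Second) { // healthcheck_frequency_seconds: 5
		for _, ip := range entries {
			if probe(fmt.Sprintf("http://%s:80/", ip), 2*time.Second) {
				failures[ip] = 0
				// (re)publish the record; this key layout is purely illustrative
				etcd.Put(context.Background(), "/disdns/lb.testdomain.lan/"+ip, ip)
				continue
			}
			failures[ip]++
			if failures[ip] > threshold {
				// threshold exceeded: drop the record so DNS stops returning this IP
				etcd.Delete(context.Background(), "/disdns/lb.testdomain.lan/"+ip)
			}
		}
	}
}
```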
The DNS resolution part is handled by CoreDNS and its Etcd plugin, which connects to our Etcd cluster and responds with the records that DisDNS puts there.
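As a rough example of that wiring (the endpoint and key path are assumptions rather than the actual DisDNS deployment values), a CoreDNS Corefile using the etcd plugin could look like:

```
testdomain.lan:53 {
    etcd {
        path /skydns                # key prefix the records are read from
        endpoint http://etcd:2379   # placeholder etcd endpoint
    }
    cache 30
    log
    errors
}
```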
The following architecture diagram shows the different components that compose the system when deployed in Kubernetes.
The following diagram shows how the system handles failures.
To wrap it all up
With DisDNS we eliminate the single points of failure in our system. Let’s review each component and how it achieves high availability:
Application Servers
Each application server has its own IP address and only receives traffic from the load-balancing nodes while it is healthy.
Load Balancers
The load-balancing nodes will receive traffic from clients and forward it to healthy application servers by leveraging their own native health checks.
DisDNS
The application implements leader election so that any number of instances can be deployed while only one is active at a time. If the leader fails, another instance takes over, usually in under 10 seconds.
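This isn’t the actual DisDNS code, but an etcd-based leader election of the kind described here can be sketched in a few lines of Go; the endpoint, session TTL, and instance name are placeholders:

```go
package main

import (
	"context"
	"log"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"http://etcd:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// The session TTL roughly bounds how long failover takes after the leader dies.
	session, err := concurrency.NewSession(cli, concurrency.WithTTL(10))
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Every instance campaigns on the same key; only the winner proceeds.
	election := concurrency.NewElection(session, "/disdns/leader")
	if err := election.Campaign(context.Background(), "instance-1"); err != nil {
		log.Fatal(err)
	}

	log.Println("became leader; starting health checks")
	// ... run the health-check loop here ...
	select {} // block forever while holding leadership (sketch only)
}
```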
Perhaps one day I’ll modify the application to have all instances active at the same time for more scalability.
CoreDNS
The CoreDNS instances have their configuration file mounted from a shared volume (a ConfigMap in a Kubernetes deployment), so any number of them can run to scale with demand.
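For illustration only (the names and image reference below are assumptions, not the real manifests), a Deployment that shares one Corefile across replicas could look like this:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
spec:
  replicas: 3                      # scale horizontally; every pod reads the same config
  selector:
    matchLabels: {app: coredns}
  template:
    metadata:
      labels: {app: coredns}
    spec:
      containers:
        - name: coredns
          image: coredns/coredns   # placeholder image reference
          args: ["-conf", "/etc/coredns/Corefile"]
          volumeMounts:
            - name: config
              mountPath: /etc/coredns
      volumes:
        - name: config
          configMap:
            name: coredns-config   # ConfigMap containing the Corefile shown earlier
```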
Etcd
Etcd is a strongly consistent, distributed key-value store that achieves high availability by replicating data across multiple nodes and electing a single leader (via the Raft consensus protocol) to coordinate writes.