How to set up a highly available load balancer with keepalived and HAProxy on Ubuntu 18.04

Michele Pangrazzi
6 min read · Jul 8, 2019


At Wonderflow we’re handling more and more data during our customer feedback analysis processes.

We wanted to add data redundancy and high availability to our infrastructure, so a battle-tested and very flexible load balancer like HAProxy was a natural choice.

However, having only a single HAProxy instance in front of the whole infrastructure also means that it's a single point of failure.

Fortunately, we were able to get rid of this SPOF using keepalived, HAProxy and our infrastructure provider's APIs (OVH).

About our infrastructure

We are using OVH Dedicated Servers, which are a good compromise in terms of performance and reliability, especially when you need MongoDB to run very fast on bare metal.

There’s no need to dive in into our infrastructure configuration, the only thing you need to know is that we’ve provisioned two servers with HAProxy instances configured. I’ll call them lb1 and lb2.

Network-wise, both lb1 and lb2 use the OVH vRack feature, so they can communicate with each other on a private network via a vLAN.

About the floating IP

On OVH (and on other providers as well) you can easily buy an additional IP address, which will be your floating IP address.

A floating IP address is a virtual IP address which can be routed to one of your services (e.g. dedicated servers) either manually (using the OVH Control Panel) or automatically (using the OVH APIs).

Remember to ensure that the virtual IP address is assigned as an alias on both the lb1 and lb2 servers. On Ubuntu 18.04 you can easily assign that IP as an alias of the public network interface using systemd-networkd:

$ sudo vi /etc/systemd/network/50-default.network
# Add this line under the [Network] section:
Address={FLOATING_IP}/32

$ sudo systemctl restart systemd-networkd

# Test
$ ping {FLOATING_IP}

Build and Install keepalived

We chose to download and compile a recent version of keepalived (2.0.15). The one in the Ubuntu repositories is a bit too old (1.3.9), and newer versions bring a lot of improvements.

If you want to deep-dive into keepalived, I suggest reading its official documentation. It's very well done.

First, you need to install some requirements:

$ apt install libssl-dev build-essential

Then download and extract keepalived sources:

$ wget https://www.keepalived.org/software/keepalived-2.0.15.tar.gz
$ tar xvzf keepalived-2.0.15.tar.gz

Then enter the extracted folder and build keepalived:

$ cd keepalived-2.0.15
$ ./configure
$ make && make install

Install keepalived.service

Ubuntu 18.04 is based on systemd, but fortunately you can find a ready-to-go keepalived.service in the sources.

So you need to copy it to the systemd services location and create the symbolic link in multi-user.target.wants:

$ cp \
    keepalived-2.0.15/keepalived/keepalived.service \
    /etc/systemd/system/
$ ln -s \
    /etc/systemd/system/keepalived.service \
    /etc/systemd/system/multi-user.target.wants/keepalived.service

Now you can control keepalived with the service command:

$ service keepalived status

Configuring Keepalived

Now that keepalived is installed, we need to configure it. The configuration differs slightly between lb1 (MASTER) and lb2 (BACKUP); the two files are almost mirror images of each other.

On both the lb1 and lb2 instances, the configuration is stored in /etc/keepalived/keepalived.conf.

lb1 configuration is:

global_defs {
    enable_script_security
    script_user node
}

vrrp_script chk_haproxy {
    script "/usr/bin/pkill -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    interface vlan.99
    state MASTER
    priority 101
    virtual_router_id 42
    unicast_src_ip {{ lb1_ip_address }}
    unicast_peer {
        {{ lb2_ip_address }}
    }
    authentication {
        auth_type PASS
        auth_pass AaP51Mdi
    }
    track_script {
        chk_haproxy
    }
    notify_master /etc/keepalived/master.sh
}

lb2 configuration is:

global_defs {
    enable_script_security
    script_user node
}

vrrp_script chk_haproxy {
    script "/usr/bin/pkill -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    interface vlan.99
    state BACKUP
    priority 100
    virtual_router_id 42
    unicast_src_ip {{ lb2_ip_address }}
    unicast_peer {
        {{ lb1_ip_address }}
    }
    authentication {
        auth_type PASS
        auth_pass AaP51Mdi
    }
    track_script {
        chk_haproxy
    }
    notify_master /etc/keepalived/master.sh
}

In global_defs, enable_script_security is active and the user that will run the scripts is node. Remember to avoid running scripts as root.

The track_script chk_haproxy executes /usr/bin/pkill -0 haproxy to check whether the HAProxy instance is still alive on the server. There are many different ways to check this (e.g. killall -0 haproxy, and so on), and the behaviour may differ slightly between systems; I found the pkill check to be reliable enough.
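As a quick sanity check of the technique (not part of the keepalived setup), you can try the same signal-0 probe against a throwaway process; here sleep stands in for haproxy:

```shell
# Signal 0 checks for process existence without actually sending a signal.
# Assumes procps' pkill is installed; "sleep" stands in for haproxy.
sleep 30 &
pid=$!

if pkill -0 -x sleep; then
  echo "process alive: the check script would exit 0"
else
  echo "process gone: the check script would exit non-zero"
fi

kill "$pid"
```

keepalived only looks at the exit code of the check script, so anything that exits 0 while HAProxy is healthy would work here.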

After configuring, restart keepalived on both instances by issuing:

$ service keepalived restart

About BACKUP to MASTER switch and vice-versa

Given our configuration, where:

  • lb1 is MASTER
  • lb2 is BACKUP

The priority of the MASTER is 101 and the priority of the BACKUP is 100.

The check script has a weight of 2 and is executed every 2 seconds. When the execution is successful (exit code 0), the priority of the instance is increased by exactly the script's weight.

So at startup lb1 runs the chk_haproxy script successfully and becomes MASTER with priority 101 + 2 = 103.

If for some reason the MASTER goes down and/or its chk_haproxy script fails, the BACKUP will be notified (via the VRRP protocol). The BACKUP's effective priority is 100 + 2 = 102, while the MASTER has either stopped advertising entirely or has dropped to 101 (having lost the check bonus); in both cases the BACKUP now wins, so it switches from the BACKUP state to the MASTER state.
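The arithmetic above can be sketched in a couple of lines of shell (the numbers mirror our configuration; keepalived does this internally):

```shell
# Base priorities from keepalived.conf, plus the chk_haproxy weight
MASTER_PRIO=101; BACKUP_PRIO=100; WEIGHT=2

echo "lb1 (check OK):     $((MASTER_PRIO + WEIGHT))"  # 103 -> stays MASTER
echo "lb2 (check OK):     $((BACKUP_PRIO + WEIGHT))"  # 102
echo "lb1 (check FAILED): $MASTER_PRIO"               # 101 < 102 -> lb2 takes over
```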

When either lb1 or lb2 becomes MASTER, the notify_master script is executed (in our case /etc/keepalived/master.sh; see below for more about it).

Note that if at some point lb1 comes back up, its priority will again be 103, so it will become MASTER again. This is intentional: we wanted lb1 to be the "preferred" MASTER and lb2 to be the BACKUP.

If you don’t want this behaviour, you can simply set the same priority on both MASTER and BACKUP, so the one that become a MASTER will remain MASTER, even if the previous MASTER (now BACKUP) will be available again.

The notify_master script

The /etc/keepalived/master.sh script is where the “magic” happens: here we need to make sure that the floating IP is pointing to the current MASTER instance.

There are really a lot of ways to do this, depending on your architecture and on your cloud provider.

In our case, we need to call an OVH API: POST /ip/{ip}/move.
This API basically moves an IP to a specific service (i.e. a dedicated server).

This is the master.sh script:

#!/bin/bash
# Ensure n/node is on PATH
PATH=$PATH:/home/node/n/bin

# Log everything to syslog
exec 1> >(logger -s -t "$(basename "$0")") 2>&1

HAS_FLOATING_IP=$(ovh-cli ip check ovh-service-xx.eu 1.2.3.4)
echo "HAS_FLOATING_IP (1): $HAS_FLOATING_IP"

if [ "$HAS_FLOATING_IP" == "false" ]; then
  n=0
  while [ $n -lt 3 ]; do
    echo "Try to assign floating ip ($n)"
    ovh-cli ip assign ovh-service-xx.eu 1.2.3.4 && break
    n=$((n+1))
    sleep 3
  done
  echo "OK"
fi

ovh-cli is a simple global CLI we’ve built around node-ovh which basically does two things:

  • check, by calling GET /ip/{ip}, whether the current machine (ovh-service-xx.eu) has the floating IP (1.2.3.4) assigned
  • assign it (if needed) by calling POST /ip/{ip}/move. We retry up to n times (n = 3 in this case) to be reasonably sure the move went through.
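The retry loop in master.sh can be generalized into a small helper; this is a hypothetical sketch (retry is not part of ovh-cli):

```shell
# Hypothetical helper: run a command up to $1 times, stopping at first success
retry() {
  local max=$1 n=0
  shift
  while [ "$n" -lt "$max" ]; do
    "$@" && return 0
    n=$((n+1))
    sleep 1
  done
  return 1
}

# Usage sketch (the ovh-cli call is the one from master.sh):
#   retry 3 ovh-cli ip assign ovh-service-xx.eu 1.2.3.4
retry 3 true && echo "assigned"
```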

If either the assign or the check call fails, ovh-cli sends us a notification using GetSentry, so we can take some manual (or automatic) recovery actions.

To simplify debugging, the script logs everything it prints to syslog using this trick:

exec 1> >(logger -s -t $(basename $0)) 2>&1

It’s very useful to monitor script execution simply looking at syslog, since by default keepalived is logging so you will see both script and keepalived logs in one place.

Note that you can also use different wrappers (and so different languages) to interface from a bash script to OVH API (or you can build your own).

If you’re using another provider, the master.sh script should be very similar. The interface with their API may be a bit different, but the things to do are pretty much the same.

Testing time!

Now that keepalived is ready and configured, you can test it.

To do that, tail -f (or less +F) the syslog on both machines. Then launch something like this on your laptop:

while true; do sleep 1; curl -v 'http://yourapp.com'; done

Now, try to stop HAProxy on the MASTER instance by issuing service haproxy stop. You can also reboot the machine, or kill it for good by running rm -rf / as root.

The curl loop on your laptop should start giving connection refused errors.

On the syslog of the BACKUP, you should see the notification of keepalived saying that the instance is about to become the MASTER.

So once master.sh has been executed (and the floating IP is therefore correctly routed to the new MASTER), the curl loop should work again.

The load balancer is no longer a SPOF!

About OVH IP routing time

After some testing we've seen that OVH takes from 20–30 seconds up to 1 minute to update the route of the floating IP. Other providers may be faster or slower.
