NGINX load balancer Cluster with automatic configuration and node failure detection using Serf

Amir Keshavarz
6 min read · Aug 8, 2021


Introduction

In a previous article, I wrote about the gossip protocol and memberlist, and showed how to use the memberlist library to build a simple cluster for TCP health checks.

Now it’s time to learn about HashiCorp Serf. Serf is a tool for decentralized cluster membership, failure detection, and orchestration, built on top of memberlist.

With Serf, you don’t need to write Go code or worry about the underlying layers of a decentralized cluster.

In this article, we’ll use Serf to create an NGINX load balancer cluster that configures itself automatically, detects node failures, and removes failed nodes from the upstream list.

This tutorial assumes you have 3 nodes (1 load balancer and 2 backend web servers), but the number is really up to you.

Prerequisites

  • 3 Linux servers
  • CentOS or Ubuntu installation on all 3
  • Root access

Step 1 — Install and configure NGINX

Step 1.1 — Installation

NGINX is required on all 3 servers. One of them is used as the load balancer and the remaining two are used as normal web servers.

Simply run the following in your terminal:

Debian/Ubuntu:

sudo apt install nginx

Redhat/CentOS/Fedora:

sudo yum install nginx

Once the installation finishes, enable and start the NGINX service:

sudo systemctl enable nginx
sudo systemctl start nginx

The NGINX default page should now be visible on port 80 of each server.

Step 1.2 — Configuration

On the backend servers, find your nginx.conf file and configure the sites however you like.

On the load balancer node, find nginx.conf and create a file named backends.conf next to it. Then open nginx.conf and, before the server block, add an upstream block that includes backends.conf:

upstream backend_hosts {
include backends.conf;
}

In the server block, create a location block and put the proxy configuration there:

location / {
proxy_pass http://backend_hosts;
}

We keep backends.conf separate from the main config file so that it's easier to add and remove hosts from a bash script.
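Appending to a file from a script can leave duplicate entries if the same node is added twice. Here is a minimal sketch of an idempotent append, assuming each backend occupies exactly one `server <ip>;` line. The helper name `add_backend` is hypothetical and is not part of NGINX or Serf.

```shell
#!/bin/sh
# add_backend CONF IP  (hypothetical helper, for illustration only)
# Appends "server IP;" to CONF unless that exact line is already present.
add_backend() {
    conf=$1
    ip=$2
    touch "$conf"
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "server ${ip};" "$conf"; then
        printf 'server %s;\n' "$ip" >> "$conf"
    fi
}
```

For example, calling `add_backend /etc/nginx/backends.conf 192.168.31.157` twice would still leave a single line for that address (the IP here is only a placeholder).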

Step 2 — Install and configure HashiCorp Serf

Serf has an agent that you need to run on every node. We will use two concepts of Serf in this tutorial: Membership and Events.

Membership

You can probably guess what this is about. Membership keeps track of all the nodes in the cluster; essentially, it maintains a list of all live nodes.

Events

Events are how nodes learn what’s going on in the cluster. Serf has several standard events, such as a node joining or leaving. Each node can receive these events and trigger a bash script, which is given input describing the nodes the event is about. Some of the standard events are:

  • member-join - One or more members have joined the cluster.
  • member-leave - One or more members have gracefully left the cluster.
  • member-failed - One or more members have failed, meaning that they didn't properly respond to ping requests.

You can learn more about events, including custom events, in the Serf documentation.
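Serf passes the event name to handlers in the SERF_EVENT environment variable, so a single script can branch on it. The sketch below only illustrates that idea: the function name `dispatch_event` is hypothetical, and the echo statements stand in for real actions.

```shell
#!/bin/sh
# dispatch_event: hypothetical example; the echoes are placeholders
# for whatever the node should actually do on each event.
dispatch_event() {
    case "${SERF_EVENT:-}" in
        member-join)
            echo "join" ;;
        member-leave|member-failed)
            # A graceful leave and a failure get the same treatment here
            echo "leave" ;;
        *)
            echo "ignored" ;;
    esac
}
```

Treating member-leave and member-failed alike mirrors what this tutorial does later, where both events remove the node from the upstream list.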

Step 2.1 — Installation

You can download a prebuilt Serf binary from serf.io. The latest version of Serf was 0.8.2 at the time of writing.

Download the binary and put it in /usr/local/bin.

cd /tmp
wget https://releases.hashicorp.com/serf/0.8.2/serf_0.8.2_linux_amd64.zip
unzip serf_0.8.2_linux_amd64.zip
sudo mv serf /usr/local/bin/serf

Verify your Serf installation by running it.

$ serf
usage: serf [--version] [--help] <command> [<args>]

Available commands are:
agent Runs a Serf agent
event Send a custom event through the Serf cluster
force-leave Forces a member of the cluster to enter the "left" state
info Provides debugging information for operators
join Tell Serf agent to join cluster
keygen Generates a new encryption key
keys Manipulate the internal encryption keyring used by Serf
leave Gracefully leaves the Serf cluster and shuts down
members Lists the members of a Serf cluster
monitor Stream logs from a Serf agent
query Send a query to the Serf cluster
reachability Test network reachability
tags Modify tags of a running Serf agent
version Prints the Serf version

Step 2.2 — Event Handlers

Event handlers are simply executables, in our case bash scripts, that Serf runs when a particular event is received. In the next step, we'll configure the Serf agent to run an event handler for a particular event.

Since our cluster is very simple, we only need two event handlers: one for when a node joins and one for when a node leaves (or fails).

Event Handler: Join

Create a file named serf_member_join.sh in /usr/local/bin/ containing the script below, and make it executable (chmod +x):

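#!/bin/bash
# Only the load balancer needs to maintain the upstream list.
if [ "$SERF_TAG_ROLE" != "lb" ]; then
    echo "Not an lb. Ignoring member join."
    exit 0
fi

# Serf writes one line per joining member to stdin.
while read -r line; do
    ROLE=$(echo "$line" | awk '{ print $3 }')
    if [ "$ROLE" != "web" ]; then
        echo "Not a webserver. Ignoring member join."
        continue   # keep processing the remaining members
    fi

    echo "$line" | awk '{ printf "server %s;\n", $2 }' >> /etc/nginx/backends.conf
done

systemctl reload nginx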

Remember to replace /etc/nginx/backends.conf with your backends.conf file which might be in a different location.

Event Handler: Leave/failed

As with the previous script, create an executable file named serf_member_leave.sh in /usr/local/bin/ and put the following script in it:

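#!/bin/bash
# Only the load balancer needs to maintain the upstream list.
if [ "$SERF_TAG_ROLE" != "lb" ]; then
    echo "Not an lb. Ignoring member leave."
    exit 0
fi

while read -r line; do
    ROLE=$(echo "$line" | awk '{ print $3 }')
    if [ "$ROLE" != "web" ]; then
        echo "Not a webserver. Ignoring member leave."
        continue   # keep processing the remaining members
    fi

    IP_ADDRESS=$(echo "$line" | awk '{ print $2 }')
    # Anchor the pattern so 10.0.0.1 does not also delete 10.0.0.10
    sed -i "/^server ${IP_ADDRESS};\$/d" /etc/nginx/backends.conf
done

systemctl reload nginx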

Remember to replace /etc/nginx/backends.conf with your backends.conf file which might be in a different location.

What do these scripts do?

When an event handler is executed, Serf provides a few useful inputs to the script. Some of the most important environment variables that Serf sets are:

  • SERF_EVENT is the event type that is occurring. This will be one of member-join, member-leave, member-failed, member-update, member-reap, user, or query.
  • SERF_SELF_ROLE is the role of the node that is executing the event handler.
  • SERF_TAG_${TAG} is set for each tag the agent has. The tag name is upper-cased.

Serf also passes the raw event data to the handler on standard input, which our scripts read one line at a time into $line. (For membership events, each line looks something like: amir 192.168.31.156 lb role=lb)
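To make the parsing step concrete, here is a small sketch that splits each membership line into fields with `read` instead of calling awk. It assumes the whitespace-separated layout shown above (name, address, role, tags); the function name `web_servers_from_event` and the second address in the usage note are hypothetical.

```shell
#!/bin/sh
# web_servers_from_event: hypothetical helper.
# Reads member-event lines (name, address, role, tags) on stdin and
# prints one NGINX "server" line for every member whose role is "web".
web_servers_from_event() {
    while read -r name address role tags; do
        # Skip non-web members but keep reading the remaining lines
        [ "$role" = "web" ] || continue
        printf 'server %s;\n' "$address"
    done
}
```

Feeding it the sample line for the load balancer plus a web node at 192.168.31.157 would print only the web node's server line, which can be checked locally without running a Serf agent.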

You probably noticed that we exited from the script when a node is not a load balancer which makes sense because we don’t need any configuration in our backend nodes.

Then we read the line and parse the data. The role is checked to make sure we don’t add the load balancers into the configuration!

The next step is to add the host to backends.conf when a node joins, and to remove it when the node leaves or simply fails to respond to the cluster.

There is so much more you can do with Event Handlers and I recommend reading the Serf manual after this article.

Step 2.3 — Serf Agent

Now that we’re done with the event handlers, we can finally run the Serf agent.

Run the following on the load balancer:

serf agent \
-event-handler "member-join=/usr/local/bin/serf_member_join.sh" \
-event-handler "member-leave,member-failed=/usr/local/bin/serf_member_leave.sh" \
-tag role=lb

And on the webservers:

serf agent -tag role=web

Note that we can tag the nodes in our cluster. We use these tags to determine the role of each node.

The final step is to connect them all! You can use serf join <ip_address> to connect agents together.

serf join <loadbalancer_ip>

Since our cluster is completely decentralized you can easily add load balancers and webservers by only knowing one IP address in the cluster.

After running the agents and joining the cluster, backends.conf on the load balancer should contain the backend nodes. Running serf leave on a web server makes it leave the cluster, and your load balancer will automatically reconfigure itself :)

Conclusion

In this tutorial, we built a cluster of NGINX load balancers and web servers that can automatically reconfigure themselves without intervention.

Since this tutorial only covered a very simple cluster, there are a few things you can add to make it better suited to production environments:

  • Use Serf custom queries to query the nodes about their state.
  • Create a startup script for load balancers that fetches the web server nodes that joined earlier, so it doesn’t miss any node.
  • Create system services so that Serf is always running.
  • Use tags to define weights for the load balancer.
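As a starting point for the startup-script idea, the sketch below rebuilds the backend list from the output of serf members. It rests on a few assumptions: that the default text output has the columns name, address:port, status, and tags; that tags appear as a comma-separated key=value list; and that nodes use IPv4 addresses. The function name `backends_from_members` is hypothetical.

```shell
#!/bin/sh
# backends_from_members: hypothetical helper.
# Assumes `serf members` text output on stdin with the columns:
#   name  address:port  status  tags
# Prints a "server" line for every alive member tagged role=web.
backends_from_members() {
    awk '$3 == "alive" && $4 ~ /(^|,)role=web(,|$)/ {
        split($2, hostport, ":")   # drop the Serf port, keep the IP
        printf "server %s;\n", hostport[1]
    }'
}
```

On a load balancer, one could pipe serf members into this function, write the result to backends.conf, and reload NGINX before the agent starts handling new events.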

I hope you enjoyed reading this tutorial :)
