One of the most important parts of server-side development is keeping servers in a stable condition and not allowing overloading operations to happen, because they can crash the servers.
In this article I will explain what “Health Checks” and “Overload Protection” are, show solutions for some common problems, and give some guidelines on how to implement them.
Imagine you have 3 Node servers balanced by an Nginx server. The load is divided equally, so if you have 600 users, every server handles 200 clients. But dividing requests equally does not mean you are protected from overloads, because the work can differ per user: for user_1 you may need to read 3 files, but for user_2 you may need to read 9 (3 times more). This is the real problem that needs to be investigated and resolved: if the work differs and the balancer keeps sending requests to a server that is already overloaded, that server will probably crash.
For problems like this, I will explain “Health Checks” and “Overload Protection”.
The Load Balancer sends a request to the server every n (e.g. 5 or 10) seconds to understand whether the server is able to handle more requests. If it is, the balancer marks the server as UP and continues sending it requests; otherwise the balancer marks the server as DOWN and will not send it any requests until a later health check succeeds and the server is marked UP again.
This process is called Health Check.
The request can be a simple HTTP (e.g. GET), Socket, or TCP request.
When the server receives the health check request, you can run some checks to determine whether it can handle more requests, and then the server needs to respond. Sending status 200 means everything is fine and the server can handle more requests; otherwise you can send status 503 SERVICE UNAVAILABLE, which means the server is not able to handle more requests.
Example of Health Checks
Unfortunately, open source Nginx does not natively support active health checks; for that you need to install a module called nginx_upstream_check_module (this module is not distributed with the Nginx source).
ngx_http_healthcheck_module — sends a request to the servers and, if they respond with HTTP 200 (plus an optional request body), they are marked good. Otherwise, they are marked bad.
But I do not want to make it difficult.
Therefore, we can use an Nginx alternative as a Load Balancer: HAProxy.
See the installation guide here (you only need the “Installing HAProxy” part).
I am not going to explain all of HAProxy because that would take too long, and our goal is not understanding HAProxy itself; we just need to understand how to set up a simple health checking process. I will explain only the important parts.
Here is a simple server which has two routes, one route for health checking and the other one for us.
Run it using the command
PORT=8000 node server_1.js
In the browser, and also in the console, you can see the PID number, which is the id of the process the Node server is running in, so you can tell which node received your request after HAProxy balanced it.
And here is the configuration for HAProxy. Put it in a haproxy.cfg file and run the HAProxy service.
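A configuration along these lines would do it. This is a sketch, assuming both servers run locally and expose a /health route (the addresses and the health path are assumptions; the rise 2, fall 1, and 5-second interval values match the behavior described later in the article):

```
global
    maxconn 4096

defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend tracker
    bind *:3000
    default_backend trackers

backend trackers
    balance roundrobin
    # Probe GET /health on each server every 5 seconds
    option httpchk GET /health
    server server_1 127.0.0.1:8000 check inter 5s rise 2 fall 1
    server server_2 127.0.0.1:8001 check inter 5s rise 2 fall 1
```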
Here you can see how to add the configuration file and start the service.
As you can see, in frontend tracker we create a server and bind it to port 3000, and in backend trackers we define two servers on ports 8000 and 8001 between which HAProxy balances the requests.
rise (count): the number of consecutive valid health checks before considering the server UP. The default value is 2.
fall (count): the number of consecutive invalid health checks before considering the server DOWN. The default value is 3.
Make sure that server_1 is running.
Start HAProxy service.
Now you can see the health check requests coming to server_1. When the server responds to 2 consecutive checks with status 200, HAProxy marks the server UP and starts balancing requests to it; before that, the HAProxy endpoint (http://localhost:3000) is unavailable (try opening it before 2 consecutive health checks have passed). After 2 consecutive successful responses you can see the result in the browser (http://localhost:3000). For now all requests go to server_1, because server_2 (:8001) is not running.
Before running server_2, let’s look at the code and understand what it does.
This server responds with status 200, and after 20 seconds its responses change to 503. I think everything here is simple and easy to understand.
Let’s go to the most interesting part.
Now run server_2 using the command
PORT=8001 node server_2.js
When two health check logs have appeared on both servers, you can open the browser (http://localhost:3000) and see how the load balancing works (refresh multiple times): the PID will change between requests.
After 20 seconds, when server_2 starts responding to the health checks with a 503 status code, HAProxy will mark the server DOWN after the first 503 response (as the config has fall 1), stop balancing requests to server_2, and the whole load will go to server_1.
HAProxy will keep sending health check requests every 5 seconds, and upon receiving 2 consecutive 200 status responses (as the config has rise 2), it will mark the server UP and balance requests to server_2 again.
To check whether the server is overloaded, and to protect it from overloads, you need to monitor some metrics. Which ones depend on your code and the work it does, but the generic metrics below are important to check:
- Event Loop delay
- Used Heap Memory
- Total Resident Set Size
Using the overload-protection package you can specify limits beyond which your server will not handle more requests; when a limit is exceeded, the package automatically responds with 503 SERVICE UNAVAILABLE.
The package works with http, express, restify, and koa packages.
But if your Load Balancer can do health checks over Sockets and you want to use them, you will need a different package, or you can build one yourself.
In this article I have explained the basics of how health checks work in HAProxy and how you can protect your server from overloads. Every server should have at least a basic health check implementation like this, because it is important for distributed systems.
Thank you for reading this article, feel free to ask any questions or tweet me @nairhar.
My article about “Graceful shutdown in NodeJS”
Other good sources:
NGINX HTTP Health Checks
Performing Health Checks in HAProxy:
Using Kubernetes Health Checks: