Clustering with Node.js
Some weeks ago, I was searching how to implement a cluster for a redundant task which dealt with a lot of data.
A lot a solutions exist to make a Node.js cluster which will spawn process. But they are only a few examples to share the load on different threads.
To set up a cluster of threads with Node.js, you will need:
- A server or more with multiple cores (duhhhhh)
- A script for the task (duhhhhhhhhh)
- Your brain
First, we need to know the numbers of available cores on your server and create as many forks as cores you can use, while leaving one core available for the main process and to prevent throttling.
Then we just need to spawn as many processes as available cores with the cluster library from Node.js.
The problem with
cluster.fork() is that it clones the current thread to spawn the new thread. This is why I use
cluster.isMaster to detect if we are on the main thread or not.
Now that we have our master and its forks, we need to make them talk to each others.
Each cluster worker has a
send method allowing the master thread to send messages to every clusters. As for the spawned process, you can use the
process.on method to listen to the
Even if the master calls all the workers in their CPU order, it may be possible that the answers will not be in that order, because of the core used for an other task for example.
If you want your workers to send a message to the master, you just have to send a message from the process with
process.send and listen to the
message event on each workers.
And now you now how to communicate with your workers.
Shutdown your cluster
If you make a headless version of your main script and the master stop, you will be surprised to see that all forks keep running if you try to kill the main process.
To prevent this, you just need to listen to the
SIGINTevent and force all the forks to stop.
It’s always possible that, one day, one of your fork will crash. Not fun if you distribute a task on all forks, some data may not be processed.
We can implement a simple way to check if a fork dies and then relaunch it to prevent further problems.
Be careful: this script will relaunch any forks which crashed or have been stopped by the main process. So if you stop the app, forks will be killed and relaunched right after.
Here’s an interesting part: how to split a task between all forks.
If you need to process thousands of rows of data, one thread will take too much time, but with multiple thread it will be faster.
Cluster as a service