Clustering with Node.js

A few weeks ago, I was looking for a way to implement a cluster for a repetitive task that dealt with a lot of data.

Plenty of solutions exist for building a Node.js cluster that spawns processes, but there are only a few examples showing how to share the load across those processes.

Requirements

To set up a cluster of processes with Node.js, you will need:

  • One or more servers with multiple cores (duhhhhh)
  • A script for the task (duhhhhhhhhh)
  • Your brain

Setup

First, we need to know the number of available cores on the server and create one fork per usable core, leaving one core free for the main process to prevent throttling.

Then we just need to spawn that many processes with the cluster module built into Node.js.

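The catch with cluster.fork() is that it does not clone your running code: it starts a brand new Node.js process that executes the same script from the beginning. This is why I use cluster.isMaster (renamed cluster.isPrimary in Node.js 16, with isMaster kept as a deprecated alias) to detect whether we are in the main process or in a fork.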

Communication

Now that we have our master and its forks, we need to make them talk to each other.

Each worker object has a send() method that lets the master send it a message, so the master can reach every fork by calling it on each worker. On the forked process side, you listen for the message event with process.on('message', ...).

Even if the master messages the workers one after another, their answers may not come back in that order: a core may be busy with another task, for example.

If you want your workers to send messages back to the master, call process.send() in the worker and, in the master, listen for the message event on each worker object.

And now you know how to communicate with your workers.

Shut down your cluster

If you run a headless version of your main script and then kill the master process, you may be surprised to see that all the forks keep running.

To prevent this, you just need to listen for the SIGINT signal in the master and force all the forks to stop.

Relaunch

It’s always possible that, one day, one of your forks will crash. That’s no fun when you distribute a task across all the forks: some of the data may never be processed.

We can implement a simple check that detects when a fork dies and relaunches it to prevent further problems.

Be careful: a relaunch script like this restarts any fork that exits, whether it crashed or was stopped by the main process. So if you stop the app, the forks will be killed and then relaunched right away.

Split task

Here’s the interesting part: how to split a task across all the forks.

If you need to process thousands of rows of data, a single process will take a long time; splitting the rows across several forks lets them be processed in parallel on different cores.

Cluster as a service

For this part, I recommend reading the article written by Kvz: Run Node.js as a Service on Ubuntu. It’s a really good guide to creating an Upstart script.

Have fun!