Node worker threads

Liron Navon
Jan 21, 2019

Worker threads have been available since version 10.5 behind an experimental flag, but as of version 11.7 the flag is no longer required, so we can start using them.

What are worker threads?

Node is known as an asynchronous programming runtime for JavaScript. In Node we normally work in a concurrent way (with promises and callbacks), but here we will look at how to work in a parallel way, and why and when we should even do it. You can read more about the difference between concurrency and parallelism here.


When to use worker threads?

Workers (threads) are useful for performing CPU-intensive JavaScript operations. They will not help much with I/O-intensive work. Node.js’s built-in asynchronous I/O operations are more efficient than Workers can be.

Let's get started!

We are going to perform a time-consuming, CPU-heavy operation: sorting a very large array with millions of elements. The worker_threads API exports a few variables and functions; here is a short, simple description of each:

Worker (class): Creates a new worker thread. It accepts a path to a script and an options object that includes a “workerData” field; constructing it spawns a new thread.

workerData (object): Inside the worker thread, this variable contains the data that was passed through the constructor of “Worker”. On the main thread, we can use the Worker instance to send messages into the worker, listen to events from it, and terminate or restart it.

isMainThread (boolean): A variable that tells us whether we are on the main thread. You can use it to make decisions from inside the workers (remember that a worker thread can spawn another worker thread, in which case neither will be on the main thread).

parentPort (object): The child thread can use this object to emit messages to the parent thread.

OK, now that we know what the worker_threads API looks like, we can see an example; the example is well documented.

index.js runs in our main thread.
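
The original gist isn’t embedded here, so here is a minimal sketch of what index.js could look like, assuming the worker script sits next to it as sorter.js (the helper name sortArrayWithWorker is the same one used further down):

// index.js - runs on the main thread
const { Worker } = require("worker_threads");
const path = require("path");

const workerScript = path.resolve(__dirname, "sorter.js");

// Spawn a worker, hand it the array through workerData,
// and resolve with the sorted result the worker sends back.
const sortArrayWithWorker = arr =>
  new Promise((resolve, reject) => {
    const worker = new Worker(workerScript, { workerData: arr });
    worker.once("message", resolve);
    worker.once("error", reject);
  });

sortArrayWithWorker([3, 1, 2])
  .then(sorted => console.log("sorted:", sorted))
  .catch(console.error);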

sorter.js is a script made to work with the worker_threads API as a worker.
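
Again, a minimal sketch under the same assumptions: the worker reads the array from workerData, sorts it, and posts the result back through parentPort.

// sorter.js - runs inside the worker thread
const { workerData, parentPort } = require("worker_threads");

// workerData holds the array passed in from the main thread.
const sorted = workerData.slice().sort((a, b) => a - b);

// Send the sorted array back to the parent thread.
parentPort.postMessage(sorted);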

You can run it with node index.js. Our worker can now sort a simple array, but in order to see the difference, we need to scale this operation across multiple workers.

Let’s see the difference

npm init
npm install ora --save

Here is a way to compare the speed of sorting millions of random numbers with one worker, with 8 workers (I have 8 CPUs on my Mac), and with no workers at all; change the elements variable to the number you want (5 million is a nice start).
I had to increase the memory limit in order to handle so many floating-point numbers (it started to crash above 10 million elements). You can increase the memory by applying the flag --max-old-space-size and specifying the size in megabytes (8 GB in this case).
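
The full comparison script isn’t embedded here; a rough sketch of the distributeLoadAcrossWorkers helper it would rely on (the same name that appears further down) might look like this. The bigArray variable, the chunking, and the final merge step are assumptions about the original implementation:

// Split the array into `workers` chunks, sort every chunk in its own
// worker thread in parallel, then combine the results on the main thread.
const elements = 5000000; // change this to the wanted number
const bigArray = Array.from({ length: elements }, () => Math.random());

const distributeLoadAcrossWorkers = async workers => {
  const segmentSize = Math.ceil(bigArray.length / workers);
  const segments = [];
  for (let i = 0; i < workers; i++) {
    segments.push(bigArray.slice(i * segmentSize, (i + 1) * segmentSize));
  }
  // Sort all segments in parallel, one worker per segment.
  const sortedSegments = await Promise.all(segments.map(sortArrayWithWorker));
  // Cheap "merge": concatenate the sorted segments and sort once more.
  return [].concat(...sortedSegments).sort((a, b) => a - b);
};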

node --max-old-space-size=8192 index.js

This is the output I got:

# for 5 million items:
sorted 5000000 items, with 1 worker in 11296ms (11.2 seconds)
sorted 5000000 items, without workers in 7045ms (7 seconds)
sorted 5000000 items, with 8 workers in 2049ms (2 seconds)
# for 20 million items
sorted 20000000 items, with 1 worker in 63597ms (63.5 seconds)
sorted 20000000 items, without workers in 48189ms (48 seconds)
sorted 20000000 items, with 8 workers in 8747ms (8.7 seconds)

Well, this is a big difference. Please notice that passing data to the threads and back involves serializing and deserializing it, which adds a significant performance overhead.
The workers don't block the main thread. Of course, promises can work just as well at keeping the main thread responsive, but a lot of promises running at the same time on the same thread can make it slow and unresponsive, and distributing the load over a bunch of workers can save us a lot of precious time.

When working in a real-world scenario you should maintain your own pool of workers, so that the number of running threads does not exceed the number of CPUs. We can achieve this with the worker-threads-pool package; we only need to change our “sortArrayWithWorker” function and initiate the thread pool like so:

const os = require("os");
const path = require("path");
const Pool = require("worker-threads-pool");

// Keep at most one worker per logical CPU.
const cpuCount = os.cpus().length;
const workerScript = path.resolve(__dirname, "sorter.js");
const pool = new Pool({ max: cpuCount });

const sortArrayWithWorker = arr => {
  return new Promise((resolve, reject) => {
    // acquire() waits until a worker in the pool becomes free.
    pool.acquire(workerScript, { workerData: arr }, (err, worker) => {
      if (err) {
        return reject(err);
      }
      worker.once("message", resolve);
      worker.once("error", reject);
    });
  });
};

Now we can safely ask for more workers than there are processors available, without fear of running out (the pool will manage them for us and keep a steady number of threads running).

const result = await distributeLoadAcrossWorkers(100);

Running the threads like so will actually make them slower: before, running the 5 million example with 8 real threads took 2049 milliseconds; now, with “100” workers, it took me 4302 milliseconds. That is still much faster than running it with no worker threads at all, but as we increase the number, the process gets slower, since the pool has to handle more instances.

Conclusion

