Node.js Multithreading!
Multithreading in Node.js using the worker_threads module!
Node.js used to be defined as a single-threaded, asynchronous, event-driven JavaScript runtime.
It was built as a non-blocking I/O runtime for building scalable network applications, and it achieves this goal with the event-driven asynchronous paradigm instead of multithreading.
What is the difference between multithreading and asynchrony?
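- Multithreading: A program runs multiple threads of execution concurrently. A single CPU core interleaves them by switching between them, while multiple cores can run them in parallel.
- Asynchrony: Operations run independently of the application's main thread of execution, and the main thread is notified (for example, through a callback) when an operation completes or fails.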
Is it helpful to use the multithreading paradigm in I/O-bound tasks?
Well, for network applications, having threads just waiting for an I/O task to complete is inefficient, because threads consume resources whether they are waiting or active.
Each thread reserves its own memory (most notably its stack) and kernel bookkeeping, and a thread blocked on I/O holds on to those resources while doing no useful work. With many such threads, the scheduler also has to juggle them alongside the threads that have actual CPU work to perform.
There is also a performance overhead caused by context switching, which happens whenever the CPU switches from executing one thread to executing another.
On each switch, the current thread's state (its registers, program counter, stack pointer, etc.) must be saved, and the saved state of the next thread must be loaded.
Also, since threads can access shared data, this can lead to many concurrency issues, such as race conditions, deadlocks, or resource starvation.
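To make the shared-data hazard concrete, here is a minimal sketch that uses Node.js's own worker_threads module (covered later in this article) together with a SharedArrayBuffer, which is how worker threads can opt into sharing memory. Four workers each increment two shared counters a million times: slot 0 with a plain read-modify-write, slot 1 with Atomics.add. The worker code is passed as a string with the eval: true option so the sketch fits in one file; the runRace helper, the worker count, and the iteration count are illustrative choices, not anything prescribed by Node.js. On a multi-core machine the plain counter typically ends up below the expected total because concurrent increments overwrite each other, while the atomic counter is always exact.

```javascript
const { Worker } = require('worker_threads');

// Source code executed by every worker (eval: true runs a string instead of a file).
// Slot 0 is incremented with a plain read-modify-write (racy);
// slot 1 is incremented with Atomics.add (atomic, never loses an update).
const workerSource = `
  const { workerData } = require('worker_threads');
  const counters = new Int32Array(workerData.buffer);
  for (let i = 0; i < workerData.iterations; i++) {
    counters[0] = counters[0] + 1;
    Atomics.add(counters, 1, 1);
  }
`;

// Hypothetical helper: start `workerCount` workers on one shared buffer and
// resolve with the expected total and both final counter values.
function runRace(workerCount, iterations) {
  const buffer = new SharedArrayBuffer(2 * Int32Array.BYTES_PER_ELEMENT);
  const counters = new Int32Array(buffer);
  const finished = [];
  for (let w = 0; w < workerCount; w++) {
    finished.push(new Promise((resolve, reject) => {
      // A SharedArrayBuffer inside workerData is shared with the worker, not copied.
      const worker = new Worker(workerSource, {
        eval: true,
        workerData: { buffer, iterations },
      });
      worker.once('error', reject);
      worker.once('exit', resolve);
    }));
  }
  return Promise.all(finished).then(() => ({
    expected: workerCount * iterations,
    plain: counters[0],
    atomic: counters[1],
  }));
}

const race = runRace(4, 1000000);
race.then(({ expected, plain, atomic }) => {
  console.log(`expected ${expected}, plain ${plain}, atomic ${atomic}`);
});
```

Without shared memory there is nothing to race on, which is why the message-passing style used in the rest of this article avoids this class of bug.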
Event-driven asynchronous I/O reduces the number of concurrent threads by removing the waiting ones, which increases the application’s scalability and leads to a more straightforward application design.
Thread-based networking is relatively inefficient and very difficult to use. Furthermore, users of Node.js are free from worries of dead-locking the process since there are no locks.
Almost no function in Node.js directly performs I/O, so the process never blocks. Because nothing blocks, scalable systems are very reasonable to develop in Node.js. — Node.js Documentation
Node.js uses threads behind the scenes! How?
Node.js has two types of threads:
- The one Event Loop thread (aka the main thread).
- The Worker Pool (aka threadpool) threads.
Node.js runs JavaScript code (initialisation and callbacks) on the Event Loop, which is also responsible for fulfilling non-blocking asynchronous requests such as network I/O.
Worker Pool threads, on the other hand, are responsible for offloading work for I/O APIs that can't be done asynchronously at the OS level, as well as some particularly CPU-intensive APIs (for example, the asynchronous fs APIs, dns.lookup(), crypto.pbkdf2(), and the asynchronous zlib APIs).
We have little control over Worker Pool threads, as they are automatically created and managed by libuv, the C library on which Node.js is built. The main knob is the UV_THREADPOOL_SIZE environment variable, which sets the pool size (4 threads by default).
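A minimal sketch of the Worker Pool at work: the asynchronous crypto.pbkdf2() is one of the APIs that libuv runs on a pool thread, so the main thread carries on immediately and the callback is queued back onto the Event Loop once the key is ready. The password, salt, iteration count, and key length are arbitrary illustrative values.

```javascript
const crypto = require('crypto');

const order = [];

// The async crypto.pbkdf2 hands the key derivation to a Worker Pool thread.
const derived = new Promise((resolve, reject) => {
  crypto.pbkdf2('password', 'salt', 100000, 64, 'sha512', (err, key) => {
    if (err) return reject(err);
    // The callback itself runs back on the Event Loop thread.
    order.push('pbkdf2 callback');
    resolve(key);
  });
});

// This runs right away: the derivation is happening on a pool thread meanwhile.
order.push('main thread keeps going');

derived.then((key) => console.log(order, key.toString('hex').slice(0, 16)));
```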
But what about CPU-intensive tasks that can’t be fulfilled using Worker Pool threads?
What if we have some code that performs synchronous CPU-intensive work, such as hashing every element in a vast array using the crypto module?
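// assumes an Express app with JSON body parsing
const crypto = require('crypto');
const express = require('express');
const app = express();
app.use(express.json()); // populate req.body from a JSON request body
app.get('/hash-array', (req, res) => {
  const array = req.body.array; // large array
  const hashedArray = [];
  // a CPU-intensive task, running synchronously on the Event Loop thread
  for (const element of array) {
    const hash = crypto.createHmac('sha256', 'secret')
      .update(element)
      .digest('hex');
    hashedArray.push(hash);
  }
  res.json(hashedArray); // reply to the client once every element is hashed
});
app.listen(3000); // port chosen for illustration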
In the above example, we have a block of code that takes a lot of computational time.
Since Node.js runs callbacks registered for events in the Event Loop, this callback code will block the Event Loop thread and be unable to handle requests from other clients until it finishes its execution.
Because Node handles many clients with few threads, if a thread blocks handling one client’s request, then pending client requests may not get a turn until the thread finishes its callback or task.
The fair treatment of clients is thus the responsibility of your application. This means you shouldn’t do too much work for any client in any single callback or task. — Node.js Documentation
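To see the blocking effect without a web server, this sketch (using only Node.js's built-in crypto module) schedules a 0 ms timer and then hashes 100,000 strings synchronously with the same HMAC call as the route above. The timer cannot fire until the loop has finished, because its callback needs the Event Loop thread that the loop is occupying. The hashAllSync helper and the array size are hypothetical choices for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical helper: the same per-element HMAC-SHA256 work as the route above.
function hashAllSync(array) {
  return array.map((element) =>
    crypto.createHmac('sha256', 'secret').update(element).digest('hex'));
}

const start = Date.now();

// A timer that "should" fire almost immediately...
const timerDelay = new Promise((resolve) => {
  setTimeout(() => resolve(Date.now() - start), 0);
});

// ...but this synchronous loop holds the Event Loop thread first.
const bigArray = Array.from({ length: 100000 }, (_, i) => `element-${i}`);
const hashes = hashAllSync(bigArray);
const busyFor = Date.now() - start;

timerDelay.then((delay) => {
  console.log(`hashing took ${busyFor} ms; the 0 ms timer fired after ${delay} ms`);
});
```

In a server, every pending request is in the same position as that timer.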
And here are some other examples of synchronous CPU-intensive tasks:
- ReDoS (Regular expression Denial of Service): Using a vulnerable regular expression.
- JSON DoS (JSON Denial of Service): Passing large JSON objects to JSON.parse or JSON.stringify.
- Some synchronous Node.js APIs, such as zlib.inflateSync, fs.readFileSync, child_process.execSync, etc.
- Some computational tasks, such as sorting, searching, or running a linear algebra algorithm with O(N^2) complexity over a significant amount of data.
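The ReDoS item can be reproduced in a few lines. In this sketch, /^(a+)+$/ is a textbook vulnerable pattern: on a run of 'a's followed by a character that cannot match, a backtracking engine such as V8's tries every way of splitting the run between the two nested quantifiers, so the work roughly doubles with each extra character. /^a+$/ accepts exactly the same strings in linear time. The timeTest helper and the input length of 22 are illustrative; exact timings depend on the machine, and while the vulnerable test runs, the Event Loop can do nothing else.

```javascript
// Textbook vulnerable pattern: a quantifier nested inside another quantifier.
const vulnerable = /^(a+)+$/;
// Accepts exactly the same strings, without nesting, so matching stays linear.
const safe = /^a+$/;

// Hypothetical helper: run regex.test() on the current thread and time it.
function timeTest(regex, input) {
  const start = process.hrtime.bigint();
  const matched = regex.test(input);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { matched, elapsedMs };
}

// 22 'a's plus a character that cannot match: the backtracking engine tries
// every way of splitting the run between the inner and outer '+' before failing.
const input = 'a'.repeat(22) + '!';
const slow = timeTest(vulnerable, input);
const fast = timeTest(safe, input);
console.log(`vulnerable: ${slow.elapsedMs.toFixed(1)} ms, safe: ${fast.elapsedMs.toFixed(3)} ms`);
```

Adding a few more 'a's to the input makes the vulnerable version take seconds, during which a server would answer no other request.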
Introducing Node.js Worker Threads
Node.js v12.11.0 marked the worker_threads module as stable, after it had been experimental since its introduction in v10.5.0.
Workers (threads) are useful for performing CPU-intensive JavaScript operations.
They do not help much with I/O-intensive work. The Node.js built-in asynchronous I/O operations are more efficient than Workers can be. — Node.js Documentation
Let’s start with a simple example from the Node.js documentation to demonstrate how we can create worker threads:
const { Worker, isMainThread } = require('worker_threads');
if (isMainThread) {
  console.log('Inside Main Thread!');
  // re-loads the current file inside a Worker instance.
  new Worker(__filename);
} else {
  console.log('Inside Worker Thread!');
  console.log(isMainThread); // prints 'false'.
}
How can worker threads communicate with their parent thread?
The message event is emitted for any incoming message. Its payload is the value passed to port.postMessage(), which sends a JavaScript value to the receiving side of the channel.
Let’s see an example:
const { Worker, isMainThread, parentPort } = require('worker_threads');
if (isMainThread) {
  const worker = new Worker(__filename);
  // receive messages from the worker thread
  worker.once('message', (message) => {
    console.log(message + ' received from the worker thread!');
  });
  // send a ping message to the spawned worker thread
  worker.postMessage('ping');
} else {
  // when a ping message is received, send a pong message back.
  parentPort.once('message', (message) => {
    console.log(message + ' received from the parent thread!');
    parentPort.postMessage('pong');
  });
}
Internally, a Worker has a built-in pair of worker.MessagePort objects that are already associated with each other when the Worker is created.
However, creating a custom messaging channel is encouraged over using the default global channel because it facilitates the separation of concerns.
Here is another example from the Node.js documentation that demonstrates creating a worker.MessageChannel object to be used as the underlying communication channel between the two threads:
const assert = require('assert');
const { Worker, MessageChannel, MessagePort, isMainThread, parentPort } = require('worker_threads');
if (isMainThread) {
  const worker = new Worker(__filename);
  // create a channel in which further messages will be sent
  const subChannel = new MessageChannel();
  // send it through the pre-existing global channel
  worker.postMessage({ hereIsYourPort: subChannel.port1 }, [subChannel.port1]);
  // receive messages from the worker thread on the custom channel
  subChannel.port2.on('message', (value) => {
    console.log('received:', value);
  });
} else {
  // receive the custom channel info from the parent thread
  parentPort.once('message', (value) => {
    assert(value.hereIsYourPort instanceof MessagePort);
    // send a message to the parent thread through the channel
    value.hereIsYourPort.postMessage('the worker sent this');
    value.hereIsYourPort.close();
  });
}
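The second argument to postMessage() in the example above is a transfer list. Objects listed there, such as the MessagePort being handed to the worker, are moved to the receiving side instead of being cloned. The same applies to an ArrayBuffer, which makes the effect easy to observe. This minimal sketch uses a MessageChannel within a single thread, and the 1024-byte buffer size is arbitrary: after the transfer, the sender's buffer is detached and reports a byteLength of 0, while the receiver gets all 1024 bytes.

```javascript
const { MessageChannel } = require('worker_threads');

const { port1, port2 } = new MessageChannel();
const buffer = new ArrayBuffer(1024);

// Resolve with the size of whatever arrives on the other end of the channel.
const receivedLength = new Promise((resolve) => {
  port2.once('message', (value) => {
    resolve(value.byteLength);
    port2.close(); // closing one side closes the channel, so the process can exit
  });
});

// Listing the buffer in the transfer list moves its memory instead of copying it.
port1.postMessage(buffer, [buffer]);

// The sender's ArrayBuffer is now detached.
const senderLengthAfterTransfer = buffer.byteLength;
console.log(senderLengthAfterTransfer); // 0
```

Transferring avoids copying large binary payloads between threads, at the cost of the sender losing access to them.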
Worker thread stdio streams
Writes to process.stdout and process.stderr may be synchronous depending on what the streams are connected to (files, for example, are written synchronously). Synchronous writes avoid problems such as output written with console.log() or console.error() being unexpectedly interleaved, or not written at all if process.exit() is called before an asynchronous write completes.
A Worker object also exposes the worker thread's standard streams to its parent thread:
- worker.stderr: If stderr: true was not passed to the Worker constructor, data written to process.stderr inside the worker is piped to the parent thread's process.stderr stream.
- worker.stdin: If stdin: true was passed to the Worker constructor, data written to this stream is made available in the worker thread as process.stdin.
- worker.stdout: If stdout: true was not passed to the Worker constructor, data written to process.stdout inside the worker is piped to the parent thread's process.stdout stream.
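As a sketch of the stdout: true option, the parent below captures a worker's console output from worker.stdout instead of letting it flow to its own process.stdout. The worker's code is passed as a string with eval: true so the example fits in one file, and the printed message is arbitrary. The parent-side stdout stream is ended when the worker exits, so its 'end' event marks the point where all output has arrived.

```javascript
const { Worker } = require('worker_threads');

// With stdout: true, the worker's output is NOT piped to the parent's process.stdout;
// the parent reads it from worker.stdout instead.
const worker = new Worker("console.log('hello from the worker');", {
  eval: true, // treat the first argument as source code rather than a file path
  stdout: true,
});

let captured = '';
worker.stdout.setEncoding('utf8');
worker.stdout.on('data', (chunk) => { captured += chunk; });

// Resolve with everything the worker printed once its stdout stream ends.
const workerOutput = new Promise((resolve, reject) => {
  worker.once('error', reject);
  worker.stdout.once('end', () => resolve(captured));
});

workerOutput.then((text) => process.stdout.write(`parent captured: ${text}`));
```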
Let’s solve the problem we faced earlier.
We will spawn a worker thread to do the heavy task of hashing the array’s elements, and when it finishes execution, it will send the hashed array back to the main thread.
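// server.js (assumes an Express app with JSON body parsing, as in the earlier example)
const path = require('path');
const express = require('express');
const { Worker } = require('worker_threads');
const app = express();
app.use(express.json()); // populate req.body from a JSON request body
app.get('/hash-array', (req, res) => {
  const originalArray = req.body.array; // large array
  // create a worker thread and pass it the originalArray (the array is copied, not shared);
  // a relative path would be resolved against the working directory, so build an absolute one
  const worker = new Worker(path.join(__dirname, 'worker.js'), {
    workerData: originalArray
  });
  // receive the result from the worker thread
  worker.once('message', (hashedArray) => {
    console.log('Received the hashedArray from the worker thread!');
    res.json(hashedArray); // reply to the client with the hashed array
  });
  // report failures instead of leaving the request hanging
  worker.once('error', (err) => {
    res.status(500).json({ error: err.message });
  });
});
app.listen(3000); // port chosen for illustration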
And in the same folder, let’s create a worker.js file containing the Worker logic:
// worker.js
const { parentPort, workerData } = require('worker_threads');
const crypto = require('crypto');
const hashedArray = [];
// perform the CPU-intensive task here, off the Event Loop thread
for (const element of workerData) {
  const hash = crypto.createHmac('sha256', 'secret')
    .update(element)
    .digest('hex');
  hashedArray.push(hash);
}
// send the hashedArray to the parent thread
parentPort.postMessage(hashedArray);
// no process.exit() needed: the worker exits on its own once this script has finished
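By doing so, we avoid blocking the Event Loop, so it can keep serving other clients' requests while the hashing runs, which improves our application's responsiveness. Keep in mind that starting a worker has its own cost: the Node.js documentation recommends using a pool of workers for this kind of task, since otherwise the overhead of creating workers would likely exceed their benefit.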
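In a route handler it is convenient to wrap the worker in a Promise that settles on either the result or a failure, so the handler can simply await it. hashInWorker below is a hypothetical helper, not part of Node.js. To keep the sketch runnable from a single file it inlines the worker.js logic and passes it with eval: true, whereas the setup above passes the path to worker.js instead. It still starts one worker per call.

```javascript
const { Worker } = require('worker_threads');

// Same hashing logic as worker.js, inlined so this sketch runs from one file.
const hashWorkerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const crypto = require('crypto');
  const hashedArray = workerData.map((element) =>
    crypto.createHmac('sha256', 'secret').update(element).digest('hex'));
  parentPort.postMessage(hashedArray);
`;

// Hypothetical helper: run one hashing job in a fresh worker.
function hashInWorker(array) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(hashWorkerSource, { eval: true, workerData: array });
    worker.once('message', resolve); // the first message is the result
    worker.once('error', reject);    // an exception thrown inside the worker
    // 'exit' comes after any posted message, so this reject is a no-op on success
    // and only matters if the worker stopped without sending a result.
    worker.once('exit', (code) => {
      reject(new Error(`Worker exited with code ${code} before sending a result`));
    });
  });
}

const job = hashInWorker(['alpha', 'beta', 'gamma']);
job.then((hashes) => console.log(`${hashes.length} hashes received`));
```

Inside the route, an async handler could then reply with res.json(await hashInWorker(req.body.array)) and send a 500 response from a catch block if the promise rejects.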
Conclusion
Performing the CPU-intensive synchronous tasks in worker threads and delegating only the I/O-intensive asynchronous tasks to the event loop can dramatically improve the performance of our Node.js applications.
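Each worker thread runs in its own isolated JavaScript context with its own event loop, so by default no JavaScript objects are shared between threads, which sidesteps most of the concurrency problems of the multithreading paradigm! Worker threads exchange information with their parent thread through message passing, which keeps communication simple. The exception is memory you share explicitly through a SharedArrayBuffer, where concurrent access should be coordinated with Atomics.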