How to Optimize CPU-Intensive Work in Node.js

Graeme
Jan 24, 2016

The purpose of this post is to familiarize the reader with launching external processes in Node. There is a GitHub readme with starter-code for convenience, though you should still follow the article for more context: https://github.com/graemeboy/nodejs-spawn-example

Node is designed for efficient I/O, but sometimes applications require more CPU-intensive work, which may block the event loop. Code that blocks the event loop can slow down the application for all users. Fortunately, Node can hand this CPU-intensive work to external processes, thus freeing the event loop. Applications do this by spawning child processes, which are children of the process that launched them.
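
To see what blocking looks like in practice, here is a minimal sketch. The busyWait helper is hypothetical and exists only for this illustration: it spins the CPU for a given number of milliseconds. A timer scheduled for "as soon as possible" cannot fire until the busy loop returns.

```javascript
// Hypothetical helper: burn CPU synchronously for roughly `ms` milliseconds.
function busyWait(ms) {
  var end = Date.now() + ms;
  while (Date.now() < end) {
    // Spin; nothing else can run on this thread in the meantime.
  }
}

var scheduledAt = Date.now();

// Ask for a callback as soon as the event loop is free.
setTimeout(function () {
  var delay = Date.now() - scheduledAt;
  console.log('Timer fired after ' + delay + ' ms');
}, 0);

// Block the event loop; the timer above has to wait until this returns.
busyWait(200);
```

The reported delay should be at least as long as the busy loop, even though the timer asked for 0 ms. In a server, every pending request would be stuck behind that loop in the same way.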

Child processes and parent processes can communicate neatly back and forth, and parent processes can listen to and control (to some extent) their child processes.

Spawning Processes using UNIX Commands

As an introduction to spawning processes, we can look at executing simple, external UNIX commands from a Node application. We start by loading the child_process module:

var child_process = require('child_process');

This module contains an exec function, which can execute commands, and return the result in a callback:

exec(command, callback)

We can put simple UNIX commands in the command parameter, for example ls:

child_process.exec('ls', function (err, stdout, stderr) {
  if (err) {
    console.log("child process failed with error code: " + err.code);
  }
  console.log(stdout);
});

Adding Options for Child Process

We can also pass an options object before the callback function. This object can contain a number of options, including:

  1. cwd: the working directory in which the child process runs
  2. encoding: expected encoding for output; defaults to ‘utf8’ for UTF-8 encoding. Node supports, among others, ‘ascii’, ‘utf8’, ‘ucs2’ and ‘base64’
  3. timeout: number of milliseconds to wait for the child process to finish executing before it is killed
  4. maxBuffer: the maximum amount of output, in bytes, allowed on stdout or stderr
  5. killSignal: the signal sent to the child process if timeout is exceeded (default is SIGTERM, but there are dozens of UNIX signals that I will not copy down here)
  6. env: environment variables for the child process. When omitted, the child inherits the parent's environment; when supplied, this object replaces the parent's environment rather than extending it
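
As a rough sketch of how these options fit together, the following runs pwd from a different working directory with a timeout and an output limit. The specific values (the root directory, 5 seconds, 1 MiB) are arbitrary choices for illustration, and the example assumes a UNIX-like system where pwd is available.

```javascript
var exec = require('child_process').exec;

// Illustrative values only; pick ones that suit your command.
var options = {
  cwd: '/',                // run the command from the filesystem root
  timeout: 5000,           // kill the child if it runs longer than 5 seconds
  maxBuffer: 1024 * 1024,  // allow up to 1 MiB of output
  killSignal: 'SIGTERM'    // signal to send if the timeout is exceeded
};

exec('pwd', options, function (err, stdout, stderr) {
  if (err) {
    console.log('command failed: ' + err.message);
    return;
  }
  console.log('working directory: ' + stdout.trim());
});
```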

Changing the Child Process Environment

It is worthwhile here to talk about defining environment variables for child processes. Changing process.env will alter the environment variables for all modules of the parent process, which is probably not the goal. Instead, to extend the environment variables for a child process, the environment of the parent should be duplicated, and then extended, as follows:

var env = process.env,
    someVar,
    envDup = {},
    child_process = require('child_process');
// Duplicate the parent's environment object
for (someVar in env) {
  envDup[someVar] = env[someVar];
}
// Now, extend this with some new variables:
envDup['VAR_NAME'] = 'var value';
// Run child process with these environment variables
child_process.exec('ls',
  { env: envDup },
  function (err, stdout, stderr) {
    if (err) {
      throw err;
    }
    console.log(stdout);
  });

Note that we can make this shorter by requiring only the exec function:

var exec = require('child_process').exec;
exec('ls', function (err, stdout, stderr) {
  console.log(stdout);
});

A final thing to note about environment variables is that they are always strings. Even if you define an environment variable as a number, the child process will read it as a string and must parse it. For example:

In parent.js:

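var exec = require('child_process').exec;
// Copy the parent's environment (so that PATH is still set and node
// can be found), then add our own variable
var env = Object.assign({}, process.env, { specialNumber: 13 });
exec('node child.js',
  { env: env },
  function (err, stdout, stderr) {
    if (err) {
      throw err;
    }
    console.log(stdout);
  }
);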

In child.js:

var specialNumber = process.env.specialNumber;
console.log(typeof(specialNumber));
// -> "string"
console.log(typeof(parseInt(specialNumber, 10)));
// -> "number"

Run node parent.js to confirm this result.

Spawning and Monitoring Child Processes

In the previous examples, we looked at how to launch external commands with the child_process.exec() function. The exec() callback, however, runs only after the command has finished, which is limiting for applications that need to talk to a child while it is running, send it signals, or terminate it. Moreover, exec() buffers the child's output in memory and hands it over all at once, rather than as a readable stream, which imposes some other limitations. For this kind of work, the child_process.spawn() function is a better fit, as in the following example:

var spawn = require('child_process').spawn;
// Create a child process
var child = spawn('tail',
  ['-f', '/var/log/system.log']);

In the above example, we run a tail command, passing in -f and ‘/var/log/system.log’ as arguments. The spawn function that we have used will return a ChildProcess object, from which we can access the process’s stdout stream. You can add a listener to this stream’s ‘data’ event, like so:

child.stdout.on('data',
  function (data) {
    console.log('tail output: ' + data);
  }
);

Whenever the child process outputs some data, it will emit this data event, which will be registered by the parent process. In this way, the parent process can monitor the child process.

The ChildProcess object also has a stderr stream, which you can listen to in just the same way:

child.stderr.on('data',
  function (data) {
    console.log('err data: ' + data);
  }
);
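
The data passed to these listeners arrives in chunks (Buffer objects by default), and a single logical message may be split across several of them. When you need the complete output, one approach is to collect the chunks and join them once the stream ends. The sketch below assumes a UNIX-like system with an echo command; the message text is just an example.

```javascript
var spawn = require('child_process').spawn;

// A short-lived child, so the stream ends on its own.
var child = spawn('echo', ['hello from the child']);

var chunks = [];

// Each 'data' event delivers one chunk of the child's output.
child.stdout.on('data', function (chunk) {
  chunks.push(chunk);
});

// 'end' fires once the child has closed its stdout.
child.stdout.on('end', function () {
  var output = Buffer.concat(chunks).toString('utf8');
  console.log('complete output: ' + output.trim());
});
```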

Communicating to Child Processes

We can also send data to our child process, using the ChildProcess object’s stdin stream. The child can listen for data on this stream using process.stdin. The stream starts out paused, so the example below resumes it explicitly before listening. Here is an example:

In child.js:

// Unpause the stdin stream:
process.stdin.resume();
// Listen for incoming data:
process.stdin.on('data', function (data) {
  console.log('Received data: ' + data);
});

In parent.js:

// Require spawn from the child process module
var spawn = require('child_process').spawn;
// Run node with the child.js file as an argument
var child = spawn('node', ['child.js']);
// Send data to the child process via its stdin stream
child.stdin.write("Hello there!");
// Listen for any response from the child:
child.stdout.on('data', function (data) {
  console.log('We received a reply: ' + data);
});
// Listen for any errors:
child.stderr.on('data', function (data) {
  console.log('There was an error: ' + data);
});
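
Putting the pieces together, here is a sketch of the pattern this article is building toward: handing a CPU-heavy calculation to a child over stdin and reading the answer from stdout. To keep it in one file, the worker script is passed inline with node -e; the workerSource variable and the summing task are made up for this illustration, and in a real application the worker would more likely live in its own file.

```javascript
var spawn = require('child_process').spawn;

// Hypothetical worker, passed inline via `node -e`. It reads a number n
// from stdin and writes the sum 1 + 2 + ... + n to stdout.
var workerSource = [
  "var input = '';",
  "process.stdin.setEncoding('utf8');",
  "process.stdin.on('data', function (chunk) { input += chunk; });",
  "process.stdin.on('end', function () {",
  "  var n = parseInt(input, 10), sum = 0;",
  "  for (var i = 1; i <= n; i++) { sum += i; }",
  "  process.stdout.write(String(sum));",
  "});"
].join('\n');

// process.execPath is the absolute path of the node binary running this script.
var child = spawn(process.execPath, ['-e', workerSource]);

var result = '';
child.stdout.on('data', function (data) {
  result += data;
});
child.stdout.on('end', function () {
  console.log('Sum computed by child: ' + result);
});

// Send the input, then close stdin so the child sees the 'end' event.
child.stdin.end('1000000');
```

While the child is busy in its loop, the parent's event loop stays free to handle other work.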

Listening for Exit Events

When a child process exits, it emits an event that the parent can listen for. The callback for the exit event receives the exit code. For example:

var spawn = require('child_process').spawn;
var child = spawn('ls');
// Listen for stdout data
child.stdout.on('data', function (data) {
  console.log("Got data from child: " + data);
});
// Listen for an exit event:
child.on('exit', function (exitCode) {
  console.log("Child exited with code: " + exitCode);
});
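
The exit callback also receives a second argument: the name of the signal that terminated the child, if there was one. When the child exits on its own, the signal is null; when it is terminated by a signal, the code is null instead. A small sketch, using an inline node -e script (chosen only for illustration) that exits with status 3:

```javascript
var spawn = require('child_process').spawn;

// The child exits by itself with status code 3.
var child = spawn(process.execPath, ['-e', 'process.exit(3)']);

child.on('exit', function (code, signal) {
  if (signal) {
    console.log('Child was terminated by signal: ' + signal);
  } else {
    console.log('Child exited on its own with code: ' + code);
  }
});
```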

Killing Child Processes

Nobody wants to write “killing child” in a sentence, but unfortunately naming conventions in CS are often coarse like this. In any case, we need to learn how parents can kill children [processes] in Node.

The simplest way to kill a child process is to use the child.kill method:

var spawn = require('child_process').spawn;
// Make the child sleep for 3 seconds (sleep takes its argument in seconds)
var child = spawn('sleep', ['3']);
// Kill the child midway through its sleep.
setTimeout(function () {
  child.kill();
}, 1500);

You can also choose which signal to send when killing the child process, for example:

child.kill('SIGUSR2');

Child processes can override the default behavior of these signals by listening for and handling the corresponding events:

process.on('SIGUSR2', function () {
  console.log("What doesn't kill me makes me stronger.");
});

The exceptions are the SIGKILL and SIGSTOP signals, which are handled by the operating system and cannot be overridden by the child process.
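
The difference between a signal the child can handle and one it cannot can be sketched as follows. This assumes a UNIX-like system. The inline child script (childSource) is hypothetical: it installs a SIGTERM handler, announces that it is ready, and then stays alive. The parent first sends SIGTERM, which the child survives, and then SIGKILL, which it cannot intercept.

```javascript
var spawn = require('child_process').spawn;

// Hypothetical child script, passed inline via `node -e`.
var childSource = [
  "process.on('SIGTERM', function () {",
  "  console.log('ignoring SIGTERM');",
  "});",
  "console.log('ready');",
  "setInterval(function () {}, 1000);" // keep the child alive
].join('\n');

var child = spawn(process.execPath, ['-e', childSource]);

var seen = '';
var sentTerm = false;
var sentKill = false;

child.stdout.on('data', function (data) {
  seen += data;
  // Wait until the child's handler is installed before signalling it.
  if (!sentTerm && seen.indexOf('ready') !== -1) {
    sentTerm = true;
    child.kill('SIGTERM');
  }
  // The child survived SIGTERM; SIGKILL cannot be handled.
  if (!sentKill && seen.indexOf('ignoring SIGTERM') !== -1) {
    sentKill = true;
    child.kill('SIGKILL');
  }
});

child.on('exit', function (code, signal) {
  console.log('Child exited; code: ' + code + ', signal: ' + signal);
});
```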

Graeme is a software engineer who has been in web development since the early 2000s. He also writes for DigitalHarvest.com.
