Parallelizing Your JavaScript Code: An Introduction to parallelizer-function

Fig 1. A thread pool mechanism

JavaScript executes your code on a single main thread, so it can only run one piece of JavaScript at a time. This becomes a problem with heavy, long-running tasks such as image processing or large-scale data analysis: while they run, the main thread is frozen and the application becomes unresponsive.
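To see this effect, here is a small sketch (plain JavaScript, runnable with Node.js) in which a busy-wait loop stands in for a heavy task; the 200 ms duration is an arbitrary choice for the illustration. A timer scheduled with a 0 ms delay cannot fire until the loop releases the main thread:

```javascript
// Illustration only: a synchronous busy-wait stands in for a heavy task.
const start = Date.now();
let timerDelay = null;

// Ask for a callback "as soon as possible" (0 ms delay).
setTimeout(() => {
  timerDelay = Date.now() - start;
  console.log(`timer fired after ${timerDelay} ms`); // at least 200 ms, not ~0 ms
}, 0);

// Busy-wait for ~200 ms. While this loop runs, the event loop cannot
// process the timer above, or anything else.
while (Date.now() - start < 200) {
  // spin
}
```

In a browser, the same loop would also freeze rendering and input handling for those 200 ms.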

To solve this problem, browsers and Node.js provide worker threads. A worker runs a task on a separate thread, so that it does not block the main thread or degrade the responsiveness of your application.

In this article, we introduce an npm package called parallelizer-function, which lets you run JavaScript functions on a different thread. The package uses the Worker API and works in both browsers and Node.js.

What is parallelizer-function?

parallelizer-function is an npm package that allows you to run JavaScript functions in separate threads using the Worker API. It works in both browsers and Node.js: depending on the runtime environment, it uses the Node.js built-in module "worker_threads" or, in a browser, the standard window.Worker class.
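The environment check described above could look roughly like the following sketch. It is not taken from the package's source: the function name detectWorkerBackend and its return values are hypothetical, and the package may perform the check differently.

```javascript
// Hypothetical sketch, not the package's source: choose a worker backend
// from the features the current runtime exposes.
function detectWorkerBackend() {
  // Browsers expose a window object with a Worker constructor.
  if (typeof window !== "undefined" && typeof window.Worker === "function") {
    return "web-worker";
  }
  // Node.js identifies itself through process.versions.node.
  if (typeof process !== "undefined" && process.versions && process.versions.node) {
    return "worker_threads";
  }
  return "unsupported";
}

console.log(detectWorkerBackend()); // worker_threads (when run with Node.js)
```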

Installation

To install parallelizer-function, you can use npm or yarn:

npm i parallelizer-function --save

or, using yarn:

yarn add parallelizer-function

Usage

The package has two core parts: the workerPromise function and the Pool class. The workerPromise function executes a function in a separate thread. The Pool class implements a thread pool that runs every task using at most a predefined maximum number of threads (workers); it keeps an async queue of pending tasks and runs each one as soon as a worker finishes its current work.

workerPromise function

The workerPromise function is a simple function that takes two arguments, the function to be executed and an array of arguments for that function. It returns a promise that resolves to the result of the function or rejects with an error.

const { workerPromise } = require("parallelizer-function");

const longRunningTask = (n) => {
  let result = 0;
  for (let i = 0; i < n; i++) {
    result += i;
  }
  return result;
};

async function main() {
  // workerPromise(fn: (...params: any[]) => any, args: any[] = []): Promise<any>
  workerPromise(longRunningTask, [1_000_000])
    .then((res) => { console.log(res); })
    .catch((error) => { console.error(error); });

  // or, using async/await with try-catch
  try {
    const res = await workerPromise(longRunningTask, [1_000_000]);
    console.log(res);
  } catch (error) {
    console.error(error);
  }
}

main();

In this example, we run the longRunningTask function in a separate thread, so it does not block the main thread. The task itself does not get faster, but the application stays responsive while it runs, and several long-running tasks can execute in parallel on different CPU cores.
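To make the mechanism less of a black box, here is a minimal sketch of what a function like workerPromise has to do in Node.js. It is not the package's implementation: runInWorker is a hypothetical name, and the sketch only uses Node's built-in worker_threads module. The function is converted to source text with toString(), evaluated inside a new worker (the eval: true option), and called with the arguments passed as workerData; the result comes back through postMessage.

```javascript
const { Worker } = require("worker_threads");

// Hypothetical helper illustrating the idea; NOT the package's source code.
function runInWorker(fn, args = []) {
  // Source evaluated inside the worker: rebuild the function from its text,
  // call it with the cloned arguments, and post the outcome back.
  const source = `
    const { parentPort, workerData } = require("worker_threads");
    const fn = (${fn.toString()});
    Promise.resolve()
      .then(() => fn(...workerData))
      .then(
        (value) => parentPort.postMessage({ ok: true, value }),
        (err) => parentPort.postMessage({ ok: false, message: String((err && err.message) || err) })
      );
  `;

  return new Promise((resolve, reject) => {
    const worker = new Worker(source, { eval: true, workerData: args });
    worker.once("message", (msg) => {
      if (msg.ok) resolve(msg.value);
      else reject(new Error(msg.message));
      worker.terminate(); // free the thread once the single result has arrived
    });
    // Errors such as a SyntaxError in the rebuilt source end up here.
    worker.once("error", reject);
    worker.once("exit", (code) => {
      // No-op if the promise was already settled by a message.
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}

// Usage: the loop runs on the worker thread, not on the main thread.
runInWorker((n) => {
  let result = 0;
  for (let i = 0; i < n; i++) result += i;
  return result;
}, [1_000_000]).then((value) => console.log(value)); // 499999500000
```

Because the function travels as text, it only sees what is written inside it, which is why the limitations listed later in this article apply.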

Pool class: Limiting the number of concurrent threads

The Pool class limits the number of concurrent threads so that your application does not overload the system. This is useful when running long-running tasks, such as image processing, or when running a large number of tasks concurrently.

const { Pool, pool } = require('parallelizer-function');

// If you want to share the state of the pool globally, use the `pool` object;
// otherwise create a new instance: const newPool = new Pool(2)

pool.setMaxWorkers(2); // Set the maximum number of worker threads for the global pool

const longRunningTask = (n) => {
  let result = 0;
  for (let i = 0; i < n; i++) {
    result += i;
  }
  return result;
};

const heavyImageProcessing = (imageData: ImageData) => {
  for (let i = 0; i < imageData.data.length; i += 4) {
    // This is a heavy operation that would block the main thread
    imageData.data[i] = imageData.data[i] * 2;
    imageData.data[i + 1] = imageData.data[i + 1] * 2;
    imageData.data[i + 2] = imageData.data[i + 2] * 2;
  }
  return imageData;
};

async function main() {
  try {
    // pool.exec(fn: (...params: any[]) => any, args: any[]): Promise<any>
    const res = await pool.exec(longRunningTask, [1_000_000]);
    console.log(res);
  } catch (error) {
    console.error(error);
  }

  // You can use Promise.all to run several tasks.
  // `imageData` stands for an ImageData object read from a canvas with ctx.getImageData().
  Promise.all([
    pool.exec(longRunningTask, [1_000_000]),
    pool.exec(longRunningTask, [2_000_000]),
    pool.exec(heavyImageProcessing, [imageData]),
  ])
    .then((res) => {
      console.log(res);
    })
    .catch((error) => {
      console.error(error);
    });

  console.log(
    'This line runs immediately; the heavy tasks above do not block the main thread'
  );
}

main();

The first thing we do is import the Pool class and the pool object. The pool object is a global instance of the Pool class, and it can be used to maintain a global state of the pool. However, you can also create a new instance of the Pool class by doing const newPool = new Pool(2).

The next thing we do is set the maximum number of worker threads for the global object pool using the setMaxWorkers(2) method. This means that we're only allowing 2 worker threads to run concurrently.

We then define two functions, longRunningTask and heavyImageProcessing, that simulate long-running tasks. The longRunningTask function simply performs a calculation for a given number of iterations, while the heavyImageProcessing function performs a heavy operation on an image.

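In the main function, we first use the pool.exec() method to run longRunningTask with the argument [1_000_000] and await its result. We then use Promise.all to submit three tasks at once: since the pool allows only 2 workers, the third task waits in the queue until a worker becomes free, while the final console.log runs immediately because nothing on the main thread waits for those tasks.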
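The queueing behaviour of the pool can be illustrated without workers at all. The sketch below is a hypothetical TaskQueue class, not the package's Pool: it accepts async tasks, runs at most maxConcurrent of them at a time, and starts the next queued task whenever a running one settles. A real thread pool would replace each task with a call into an idle worker.

```javascript
// Hypothetical sketch of a concurrency-limited async FIFO queue (not the package's Pool).
class TaskQueue {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
    this.running = 0;
    this.waiting = []; // pending jobs: { task, resolve, reject }
  }

  // Returns a promise that settles with the task's result once a slot is free.
  exec(task) {
    return new Promise((resolve, reject) => {
      this.waiting.push({ task, resolve, reject });
      this._next();
    });
  }

  // Start the next queued task if a slot is available.
  _next() {
    if (this.running >= this.maxConcurrent || this.waiting.length === 0) return;
    const { task, resolve, reject } = this.waiting.shift();
    this.running++;
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        this.running--;
        this._next(); // a slot was freed: pull the next job from the queue
      });
  }
}

// Usage: five 50 ms jobs, at most two running at once.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
let active = 0;
let peak = 0;
const job = (value) => async () => {
  active++;
  peak = Math.max(peak, active);
  await sleep(50);
  active--;
  return value;
};

const queue = new TaskQueue(2);
const allDone = Promise.all([1, 2, 3, 4, 5].map((v) => queue.exec(job(v))));
allDone.then((results) => console.log(results, "peak concurrency:", peak)); // [ 1, 2, 3, 4, 5 ] peak concurrency: 2
```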

Limitations

  1. The worker function must be serializable to a string or clonable (e.g. it cannot be a class method).
  2. All the libraries or packages the function uses must be imported inside the function. This is because workers run in a different global context from the current window: the function is isolated, as if it were in a separate script.
  3. You can run whatever code you like inside the worker thread, with some exceptions. For example, you can’t directly manipulate the DOM from inside a worker, or use some of the default methods and properties of the window object.
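The first two limitations follow from how a function reaches a worker: as source text that is evaluated again in a different scope. The sketch below imitates that step with new Function; the helper names rebuildFromSource and tryRebuiltCall are hypothetical, and the package may transfer functions differently. A self-contained function survives, a closure loses access to the variables it captured, and a class method's source is not even a valid standalone expression:

```javascript
// Hypothetical helper: rebuild a function from its source text in a fresh
// scope, roughly what happens when a function is shipped to a worker as a string.
function rebuildFromSource(fn) {
  // Code created with new Function only sees the global scope, not our locals.
  return new Function(`return (${fn.toString()});`)();
}

// Run the rebuilt function; report the error type if rebuilding or calling fails.
function tryRebuiltCall(fn, args) {
  try {
    return rebuildFromSource(fn)(...args);
  } catch (err) {
    return err.name;
  }
}

// 1. Self-contained: everything it needs is in its own text.
const triple = (n) => n * 3;

// 2. Closure: `factor` lives in makeMultiplier's scope, which is not copied.
function makeMultiplier() {
  const factor = 3;
  return (n) => n * factor;
}

// 3. Class method: its source "double(n) { ... }" is not a valid expression.
class Doubler {
  double(n) {
    return n * 2;
  }
}

console.log(tryRebuiltCall(triple, [2])); // 6
console.log(tryRebuiltCall(makeMultiplier(), [2])); // ReferenceError
console.log(tryRebuiltCall(Doubler.prototype.double, [2])); // SyntaxError
```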

Here is an example where we process a text file and return its content as a list of rows:

import { pool } from "parallelizer-function";
import path from "path";

//////////// main script
try {
  const pathToFile = path.resolve("../docs/sample-name.txt");

  const res: Array<string> = await pool.exec(
    async (pathFile) => {
      // Modules must be loaded inside the function: it runs in the worker's context
      const fs = require('fs');
      const content = fs.readFileSync(pathFile, { encoding: 'utf-8' });
      return content.split('\n');
    },
    [pathToFile]
  );

  console.log("Names: ", res);
} catch (error) {
  console.error(error);
}
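Splitting on '\n' alone keeps a trailing '\r' on every row when the file has Windows line endings, and yields an empty last row when the file ends with a newline. A slightly more robust splitter is sketched below; toRows is a hypothetical name, and to use it in the worker it would have to be defined inside the worker function, per limitation 2.

```javascript
// Hypothetical helper: split file content into rows, accepting both \n and
// \r\n line endings and ignoring a single trailing newline at the end of the file.
function toRows(content) {
  const rows = content.split(/\r?\n/);
  if (rows.length > 0 && rows[rows.length - 1] === "") rows.pop();
  return rows;
}

console.log(toRows("Ana\r\nLuis\nMarta\n")); // [ 'Ana', 'Luis', 'Marta' ]
```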

Examples

Here are some examples of how you might use this package in your application:

  1. Running a long-running task, such as image processing, in a separate thread to prevent it from blocking the main thread.
  2. Running a large number of tasks concurrently, but limiting the number of threads to prevent the system from becoming overloaded.
  3. Running a task in a separate thread and using a callback function to retrieve the result

Suppose we have the following functions, which we want to compute on a website or in response to requests to an Express server.

import { Pool } from "parallelizer-function";

const pool = new Pool(4); // Create a separate worker pool instance with 4 workers

function isPrimeThisNumber(n) {
  // Returns whether the integer n is a prime number. Complexity O(sqrt(n))
  if (n < 2) return false;
  for (let i = 2; i * i <= n; i++) {
    if (n % i === 0) return false;
  }
  return true;
}

function TripleSum(arr = []) {
  // Returns all distinct triplets (arr[i], arr[j], arr[k]) with i < j < k
  // that sum to 0. Complexity O(n^2)
  const visited = new Set();
  const sol = [];
  // Sort a copy numerically (the default sort compares values as strings)
  arr = [...arr].sort((a, b) => a - b);
  for (let i = 0; i < arr.length; i++) {
    const target = -arr[i];
    const isSeen = new Set();
    for (let j = i + 1; j < arr.length; j++) {
      if (isSeen.has(target - arr[j])) {
        const key = `${arr[i]},${arr[j]},${target - arr[j]}`;
        if (!visited.has(key)) {
          sol.push([arr[i], arr[j], target - arr[j]]);
          visited.add(key);
        }
      } else {
        isSeen.add(arr[j]);
      }
    }
  }
  return sol;
}

function simulateLongTask(delayS = 10) {
  // Simulates a task that takes <delayS> seconds to finish by busy-waiting
  const now = Date.now();
  let iter = 0;
  const MAX_DELAY = delayS * 1000; // e.g. 10 seconds = 10,000 milliseconds
  while (Date.now() - now < MAX_DELAY) {
    iter++;
  }
  return iter;
}

If a click listener or an API endpoint calls these functions directly with large enough inputs, the following snippets will block the main JavaScript thread: the website becomes unresponsive, or the API stops accepting new incoming requests until the computation finishes.

someBTNEl.addEventListener("click", () => {
  console.log(isPrimeThisNumber(352684978));
  // This will block the main thread
});

// So will this:
someBTNEl.addEventListener("click", () => {
  const promiseFn = new Promise((resolve) => {
    resolve(isPrimeThisNumber(352684978));
  });
  promiseFn.then(console.log);
  // This will also block the main thread.
  // Wrapping normal code in a Promise does not guarantee that the main thread will not be blocked.
});
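The second listener blocks because the executor passed to new Promise runs synchronously, as soon as the promise is created; only the then callbacks are deferred, and even those run on the main thread as microtasks. This small sketch records the order in which things happen:

```javascript
const order = [];

new Promise((resolve) => {
  order.push("executor"); // runs immediately, on the current (main) thread
  resolve();
}).then(() => {
  order.push("then"); // runs later as a microtask, but still on the main thread
});

order.push("after new Promise");

// Snapshot taken before any microtask has had a chance to run.
const snapshot = [...order];
console.log(snapshot); // [ 'executor', 'after new Promise' ]
```

So a heavy computation placed inside the executor (or inside a then callback) still occupies the main thread while it runs.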

Using the pool, we can avoid blocking the event loop with the functions above:

someBTNEl.addEventListener("click", async () => {
  try {
    const res = await pool.exec(isPrimeThisNumber, [352684978]);
    console.log(res);
    // This will not block the main thread of JS: it runs "isPrimeThisNumber"
    // in a separate thread using the Worker class.
  } catch (error) {
    console.error(error);
  }
});

// You can also run all the computations at once
someBTNEl.addEventListener("click", () => {
  Promise.all([
    pool.exec(isPrimeThisNumber, [352684978]),
    pool.exec(simulateLongTask, [10]),
  ])
    .then(([isPrime, iterations]) => {
      // do more stuff
    })
    .catch(console.error);
});

// Or expose the computation of the functions through an endpoint
// (assumes an Express `app` using the express.json() middleware, so req.body is parsed)
const functions = {
  TripleSum,
  isPrimeThisNumber,
  simulateLongTask,
};

app.post("/compute/:fn", async (req, res) => {
  const fn = req.params.fn;
  // Own-property check, so inherited names such as "toString" are rejected
  if (!fn || !Object.prototype.hasOwnProperty.call(functions, fn)) {
    return res.status(404).json({ error: true, msg: "Function not found" });
  }
  try {
    // This will not block the main thread of JS: it runs in a worker of the pool
    const data = await pool.exec(functions[fn], req.body?.args || []);
    return res.status(200).json({ error: false, msg: "OK", data });
  } catch (e) {
    return res.status(400).json({ error: true, msg: e.message });
  }
});
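One detail worth checking in an endpoint like this is how the function name taken from the URL is looked up. The `in` operator also matches properties inherited from Object.prototype, so a request for "/compute/toString" would pass an `in` check. The sketch below uses a hypothetical one-entry whitelist (isEven) and a hypothetical lookup helper to show the difference:

```javascript
// Hypothetical whitelist with a single illustrative function.
const functions = {
  isEven: (n) => n % 2 === 0,
};

// `in` also sees properties inherited from Object.prototype.
console.log("toString" in functions); // true
console.log("constructor" in functions); // true

// An own-property check accepts only names defined on the object itself.
function lookup(name) {
  return Object.prototype.hasOwnProperty.call(functions, name) ? functions[name] : undefined;
}

console.log(typeof lookup("isEven")); // function
console.log(lookup("toString")); // undefined
```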

The importance of running heavy functions in a separate thread is that it prevents the main thread of the JavaScript application from being blocked, which can lead to poor user experience and slow response times. By using the pool object, you can ensure that your application remains responsive even when running heavy tasks, which can greatly improve the performance and user experience of your application.

Example of image processing

In the code snippet below, the processImage function takes an image of type HTMLImageElement, creates a canvas element, draws the image on it and gets the ImageData of the canvas. It then runs the heavyImageProcessing function on the imageData, a heavy operation that blocks the main thread, puts the processed imageData back on the canvas and returns the data URL of the canvas.

The processImageInWorker function is very similar to processImage, but instead of calling heavyImageProcessing directly, it runs it with the pool.exec method, passing the heavyImageProcessing function and the imageData as arguments. The function therefore runs in a separate thread managed by the pool and does not block the main thread. The processed imageData is then put back on the canvas and the data URL of the canvas is returned.

import { pool } from "parallelizer-function";

const heavyImageProcessing = (imageData: ImageData) => {
  for (let i = 0; i < imageData.data.length; i += 4) {
    // This is a heavy operation that would block the main thread
    imageData.data[i] = imageData.data[i] * 2;
    imageData.data[i + 1] = imageData.data[i + 1] * 2;
    imageData.data[i + 2] = imageData.data[i + 2] * 2;
  }
  return imageData;
};

const processImage = async (image: HTMLImageElement) => {
  const canvas = document.createElement('canvas');
  canvas.width = image.width;
  canvas.height = image.height;
  const ctx = canvas.getContext('2d');
  if (!ctx) throw new Error('2D canvas context is not available');
  ctx.drawImage(image, 0, 0);
  const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);

  // This will block the main thread
  const processedImageData = heavyImageProcessing(imageData);
  ctx.putImageData(processedImageData, 0, 0);

  return canvas.toDataURL();
};

const processImageInWorker = async (image: HTMLImageElement) => {
  const canvas = document.createElement('canvas');
  canvas.width = image.width;
  canvas.height = image.height;
  const ctx = canvas.getContext('2d');
  if (!ctx) throw new Error('2D canvas context is not available');
  ctx.drawImage(image, 0, 0);
  const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);

  // This will not block the main thread
  const processedImageData = await pool.exec(heavyImageProcessing, [imageData]);
  ctx.putImageData(processedImageData, 0, 0);

  return canvas.toDataURL();
};
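The loop in heavyImageProcessing advances by 4 because ImageData.data is a flat Uint8ClampedArray holding the red, green, blue and alpha values of each pixel in sequence. The sketch below (brightenPixels is a hypothetical name) runs the same loop on a plain Uint8ClampedArray, so it can be tried outside the browser, and shows that doubled values above 255 are clamped rather than wrapped:

```javascript
// Same loop as heavyImageProcessing, applied to a bare RGBA byte array:
// [R, G, B, A, R, G, B, A, ...]
function brightenPixels(data) {
  for (let i = 0; i < data.length; i += 4) {
    data[i] = data[i] * 2; // red
    data[i + 1] = data[i + 1] * 2; // green
    data[i + 2] = data[i + 2] * 2; // blue
    // data[i + 3] (alpha) is left untouched
  }
  return data;
}

// One pixel: 300 and 400 are clamped to 255 by Uint8ClampedArray.
const pixel = new Uint8ClampedArray([100, 150, 200, 255]);
console.log(Array.from(brightenPixels(pixel))); // [ 200, 255, 255, 255 ]
```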

Here is an example Node.js Express application on StackBlitz.

In this example, the “/” route of the application returns the result of computing a set of long tasks. If we don’t pass the query parameter usingPool=true, the execution runs on the main thread of Node.js.

Fig 2: Hitting the endpoint “/” with the query parameter “usingPool”=false

In Figure 2 above, when we hit the endpoint “/”, mainHeavyTask takes 2.347 s to finish. Because these tasks run on the main thread, all other incoming requests are blocked, waiting for a response until the main thread is released.

Fig 3: Hitting the endpoint “/” with the query parameter “usingPool”=true

In Figure 3 above, we hit the endpoint “/” again, but now using the pool of 4 workers, and mainHeavyTask takes about 1.5 s to finish, less than the 2.347 s of the previous run. More importantly, because these tasks run in separate threads, all other incoming requests are handled normally and are not affected by the execution of mainHeavyTask.

Here is another example, using React.

Conclusion

The thread pool allows you to run your functions in separate threads, so they do not block the main thread. It also allows you to limit the number of concurrent threads, so that your application does not overload the system. This is useful for running long-running tasks, such as image processing, and running a large number of tasks concurrently.

Note

This package has been tested on Node.js v14.x, v16.x and v18.x, and in browsers such as Firefox and Chrome using vanilla JavaScript, React and Angular.

Author

author: Jose Alejandro Concepcion Alvarez

repo: parallelizer-function

Jose Alejandro Concepción Alvarez

Jose A. Concepcion Alvarez is currently enrolled in a PhD program in Computer Science. He has experience as a Full Stack Developer using JavaScript, Python and Java.