Deep Diving into Node.js Clustering: Part 1

Understanding Port Sharing and Load Balancing

Published in

NodeStack

Sep 16, 2024

Yes, I know what you’re thinking: “Yet another post on Node.js clustering?” But stick around! While there are countless posts on the topic, this one takes a different approach: it breaks down the magic of port sharing, and in the second part it shows you how to implement basic clustering from scratch using the `child_process` module.

Node.js is popular for building scalable applications thanks to its non-blocking, event-driven architecture. However, Node.js runs your JavaScript on a single thread by default, so a single process cannot fully use a multi-core processor. To address this limitation, Node.js supports clustering, which lets multiple processes handle incoming connections on the same port. In this post we’ll look at the two connection-distribution approaches the `cluster` module supports and explain how port sharing works, with practical code examples; in the following post we’ll rebuild basic clustering with the `child_process` module.

OK, What Is Clustering in Node.js?

Clustering in Node.js is a technique for spawning multiple instances (workers) of your application, usually one per CPU core, so that the operating system can run them in parallel on different cores. These workers share the same server port and handle incoming connections independently. This setup improves throughput and scalability by putting all the CPU cores on a server to work.

Clustering Visualisation Under Round Robin Scheduling

A few basic but essential concepts are needed to understand how clustering works:

Inter-Process Communication (IPC)
Listening Socket (Listening FD)
Connection Socket (Connection FD)
File Descriptor (Unix-like systems) or Handle (Windows)

Inter-Process Communication (IPC)

IPC is a mechanism that lets processes communicate with each other and synchronize their actions; in Node.js clustering it takes the form of message passing. With clustering, the main process (the primary, called the master process in older Node.js versions) uses IPC to exchange messages with its worker processes, share data, and coordinate tasks.

IPC in Node.js Clustering

Master Process:

The Master Process is responsible for spawning worker processes.
The cluster module internally uses the child_process module to fork worker processes.
The child_process module also creates an IPC channel so that the primary and child processes can communicate.
The Master Process sends messages to and receives messages from the worker processes over these IPC channels.
The Master Process can also monitor and manage worker processes, for example restarting a worker that dies by listening for the cluster 'exit' event and forking a replacement.

Worker Processes:

Each worker process is a child process spawned by the Master Process.
Workers can handle incoming client requests, perform computations, or manage resources.
Workers communicate with the Master Process over IPC. There is no direct channel between workers; if they need to talk to each other, the primary has to relay the messages.
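To see an IPC channel in isolation, here is a minimal sketch without the cluster module. It assumes a parent that starts a child Node.js process with `child_process.spawn()` and an `'ipc'` entry in `stdio`, which gives the child `process.send()` and `'message'` events just as `child_process.fork()` does; `spawn` with `-e` is used only so the whole example fits in one file. The names `childSource` and `ipcRoundTrip` are made up for illustration.

```javascript
const { spawn } = require('node:child_process');

// Child program passed with `node -e`. Because the parent adds an 'ipc' entry to
// stdio, Node sets up process.send() and the 'message' event in the child.
const childSource = `
process.on('message', (msg) => {
  const reply = { from: process.pid, echo: String(msg.text).toUpperCase() };
  // Disconnect only after the reply has been handed to the IPC channel.
  process.send(reply, () => process.disconnect());
});
`;

// Sends one message to a fresh child and resolves with the child's reply.
function ipcRoundTrip(text) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', childSource], {
      stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
    });
    child.once('error', reject);
    child.once('message', (reply) => resolve({ childPid: child.pid, reply }));
    child.send({ text }); // parent -> child over the same channel
  });
}

ipcRoundTrip('hello').then(({ childPid, reply }) => {
  console.log(`child ${childPid} replied`, reply);
});
```

Once the child disconnects, the channel closes, the child exits, and the parent has nothing left to wait for, so the script terminates on its own.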

Listening Socket

Purpose: A listening socket is used by a server to listen for incoming connection requests from clients. It does not handle data transfer; instead, its primary role is waiting for and accepting new connections.

Behavior: When a server application starts, it creates a socket, binds it to a specific IP address and port, and then listens on that port for incoming connections. A socket in this state is called a listening socket. In Node.js cluster mode, the primary process creates the listening socket.

Lifecycle: The listening socket remains open as long as the server is running and willing to accept new connections. It does not participate in the communication beyond the initial connection setup.

Connection Socket (Accepted Socket)

Purpose: A connection socket transfers the actual data between the client and the server. Once a connection is established, the connection socket handles all subsequent communication.

Behavior: When a listening socket receives an incoming connection request and the server (primary or worker) accepts it, a new socket dedicated to that connection is created. This is the connection socket (also known as the accepted socket), and it is used for the entire conversation between that client and the server. In Node.js cluster mode, whether the primary or a worker accepts the connection, and therefore obtains the connection socket, depends on the scheduling policy, described in more detail below.

Lifecycle: The connection socket is created when a connection is accepted and remains open as long as the communication continues. Once the communication is complete, the connection socket is closed.
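The difference between the two kinds of socket is easy to observe with the `net` module. This is a small single-process sketch (the function name `acceptDemo` is hypothetical): one listening server on an OS-chosen port accepts several clients, each accepted client shows up as its own `net.Socket`, and the listening socket stays open until it is closed explicitly.

```javascript
const net = require('node:net');

function acceptDemo(clientCount = 3) {
  return new Promise((resolve, reject) => {
    const accepted = [];
    let listenedThroughout = true;

    // `server` wraps the listening socket: bound to a port, it only accepts connections.
    const server = net.createServer((socket) => {
      // Each accepted client gets its own connection socket for the data exchange.
      accepted.push(socket);
      listenedThroughout = listenedThroughout && server.listening;
      socket.end(`you are connection #${accepted.length}\n`);
      if (accepted.length === clientCount) {
        // close() stops accepting; the callback runs once all connections have ended.
        server.close(() => resolve({
          distinctSockets: new Set(accepted).size,
          listenedThroughout,
          listeningAfterClose: server.listening,
        }));
      }
    });
    server.on('error', reject);

    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      for (let i = 0; i < clientCount; i++) {
        const client = net.connect(port, '127.0.0.1');
        client.on('error', reject);
        client.resume(); // consume the reply so the client sees 'end' and closes its side
      }
    });
  });
}

acceptDemo().then(console.log);
```

With three clients it should report three distinct connection sockets, a listening socket that stayed open the whole time, and `listening` turning false after `close()`.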

File Descriptors (FD) or Handles

A file descriptor (or handle) is a value assigned by the operating system to represent an open file, socket, or other I/O resource within a process. On Unix-like systems it is a small non-negative integer; Windows uses opaque handles for the same purpose. When a process opens a socket or file, the operating system returns a file descriptor, which the process then uses to read from, write to, or close the resource. Each file descriptor corresponds to an entry in the process’s file descriptor table, which refers to the kernel’s bookkeeping for the open resource, whether it is a socket or a file.
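A quick way to see that descriptors really are just numbers, sketched with the `fs` module rather than a socket (the temporary file name is arbitrary): opening the same file twice gives two distinct integers, and every later call refers to the file only through them.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const file = path.join(os.tmpdir(), `fd-demo-${process.pid}.txt`);
fs.writeFileSync(file, 'hello\n');

// Opening the same file twice yields two integers: two entries in this
// process's descriptor table that point at the same file.
const fdA = fs.openSync(file, 'r');
const fdB = fs.openSync(file, 'r');
console.log('descriptors:', fdA, fdB);

// Later operations name the resource only through the integer.
const buf = Buffer.alloc(5);
const bytesRead = fs.readSync(fdA, buf, 0, 5, 0);
console.log(buf.toString('utf8', 0, bytesRead)); // hello
const { size } = fs.fstatSync(fdB);

fs.closeSync(fdA);
fs.closeSync(fdB);
fs.unlinkSync(file);
```

A listening socket is no different: it is one more entry in the same table, which is what makes it possible to hand it to another process.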

The Clustering Overview

Primary Process: This is the main process that manages the worker processes.

Worker Processes: These are the separate processes created by the primary process to handle incoming connections.

Depending on the scheduling policy, the primary process either keeps the listening socket to itself and hands each accepted connection to a worker (round robin), or shares the listening socket with the worker processes so that they accept connections themselves (direct connection handling).

Cluster Module Structure

The cluster module has the following components (all under lib/internal/cluster/ except cluster.js):

cluster.js: the public entry point. It loads one of two implementations:
  primary.js: the cluster implementation used in the primary process.
  child.js: the cluster implementation used in worker processes.

worker.js: the common Worker implementation shared between the cluster primary and workers.

utils.js: utility functions that send and intercept the internal communication between the primary and child processes. Internal communication here means the IPC messages the primary and children exchange to set up the cluster.

round_robin_handle.js (RoundRobinHandle): responsible for creating the listening socket in the primary, accepting incoming connections, and delegating each accepted connection (the client handle) to the workers in round-robin fashion.

shared_handle.js (SharedHandle): used when the scheduling policy is anything other than SCHED_RR. The primary creates a shared server handle and passes it to each worker, and each worker then listens for incoming connections itself.
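Which of the two handle types the primary uses is controlled by the public `cluster.schedulingPolicy` setting, or by the `NODE_CLUSTER_SCHED_POLICY` environment variable with the values `rr` or `none`. A minimal sketch: the helper `setSchedulingPolicy` is hypothetical, while `cluster.SCHED_RR`, `cluster.SCHED_NONE` and `cluster.schedulingPolicy` are the real module properties.

```javascript
const cluster = require('node:cluster');

// Hypothetical helper mapping the names used by NODE_CLUSTER_SCHED_POLICY
// ('rr' and 'none') to the real cluster constants.
const POLICIES = new Map([
  ['rr', cluster.SCHED_RR],     // primary accepts, RoundRobinHandle distributes
  ['none', cluster.SCHED_NONE], // workers accept on a SharedHandle, the OS decides
]);

function setSchedulingPolicy(name) {
  if (!POLICIES.has(name)) throw new Error(`unknown scheduling policy: ${name}`);
  // Only takes effect if set before the first cluster.fork() or
  // cluster.setupPrimary() call; after that the policy is frozen.
  cluster.schedulingPolicy = POLICIES.get(name);
  return cluster.schedulingPolicy;
}

if (cluster.isPrimary) {
  setSchedulingPolicy('none');
  // ...then call cluster.fork() as usual
}
```

Because the value is read once, when the first worker is forked, changing it later has no effect on a running cluster.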

Two Approaches to Distribute Connections

Round-Robin Approach (default on all platforms except Windows):

In the round-robin approach, the primary process accepts each connection and then hands the connection socket off to one of the worker processes. That worker then handles all communication over the connection socket.

Example: Imagine you have a ticket counter with three workers. Every new customer is directed to the next available worker in turn. This helps ensure that no single worker is overwhelmed.
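The hand-off logic itself is small. Below is a simplified model, loosely based on the `distribute()` and `handoff()` methods of Node's `RoundRobinHandle` in lib/internal/cluster/round_robin_handle.js; it is not Node's actual code, and the worker and connection names are hypothetical. Note that "next available worker" means the next worker that has acknowledged its previous hand-off, not a blind rotation.

```javascript
// Simplified model of round-robin hand-off (not Node's actual code).
class RoundRobinModel {
  constructor(workerIds) {
    this.free = [...workerIds]; // workers ready to take a connection, in order
    this.pending = [];          // accepted connections waiting for a free worker
    this.log = [];              // [connection, worker] pairs in hand-off order
  }

  // The primary accepted a connection: queue it and wake the first free worker.
  distribute(conn) {
    this.pending.push(conn);
    const worker = this.free.shift();
    if (worker !== undefined) this.handoff(worker);
  }

  // Give the oldest pending connection to `worker`, or park the worker as free.
  handoff(worker) {
    const conn = this.pending.shift();
    if (conn === undefined) {
      this.free.push(worker);
      return;
    }
    // In Node, this is where the 'newconn' message and the handle are sent.
    this.log.push([conn, worker]);
  }

  // The worker replied to its hand-off, so it can take the next connection.
  acknowledge(worker) {
    this.handoff(worker);
  }
}

const rr = new RoundRobinModel(['w1', 'w2', 'w3']);
for (const conn of ['c1', 'c2', 'c3', 'c4']) rr.distribute(conn);
// c4 waits: all three workers are still busy with their first hand-off.
rr.acknowledge('w2'); // w2 replied first, so it gets c4
rr.acknowledge('w1'); // nothing pending, so w1 goes back on the free list
console.log(rr.log.map(([c, w]) => `${c}->${w}`).join(', ')); // c1->w1, c2->w2, c3->w3, c4->w2
```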

Direct Connection Handling: In this approach, the primary process creates the listening socket but doesn’t accept connections itself. Instead, it shares the socket with the workers, and each worker process independently accepts and handles incoming connections.

Direct Connection Handling: Shared FD (the OS decides which worker process gets each connection)

When a new connection arrives on the shared port, every worker is waiting to accept on the same listening socket, and the operating system decides whose accept call receives it. Node.js has no say in the choice, which is why the distribution can end up uneven.

Example: Instead of a dispatcher sending customers to workers, customers walk up to whichever worker’s counter they reach first. This could theoretically speed things up, but in practice it can lead to an uneven distribution of customers (e.g., some workers get many more customers than others).
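You can reproduce direct connection handling without the cluster module by sending a listening server to a child process with `subprocess.send('server', server)`, a documented `child_process` feature. In this sketch the function names and the `'ready'`/`'stop'` messages are my own hypothetical protocol: parent and child accept on the same listening socket, and the OS decides which of them gets each connection.

```javascript
const net = require('node:net');
const { spawn } = require('node:child_process');

// Child program (run with `node -e`): accepts on the listening socket it is sent.
const childSource = `
let shared;
process.on('message', (msg, server) => {
  if (msg === 'server') {
    shared = server; // a net.Server wrapping the parent's listening socket
    shared.on('connection', (socket) => socket.end('child'));
    process.send('ready');
  } else if (msg === 'stop') {
    shared.close();
    process.disconnect();
  }
});
`;

// Opens one connection and resolves with whatever the accepting process wrote.
function fetchReply(port) {
  return new Promise((resolve, reject) => {
    const client = net.connect(port, '127.0.0.1');
    let data = '';
    client.setEncoding('utf8');
    client.on('data', (chunk) => { data += chunk; });
    client.on('end', () => resolve(data));
    client.on('error', reject);
  });
}

function sharedListenerDemo(connections = 10) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', childSource], {
      stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
    });
    child.on('error', reject);

    // The parent keeps accepting on the same socket after sharing it.
    const server = net.createServer((socket) => socket.end('parent'));
    server.listen(0, '127.0.0.1', () => child.send('server', server));

    child.on('message', async (msg) => {
      if (msg !== 'ready') return;
      try {
        const replies = [];
        for (let i = 0; i < connections; i++) {
          replies.push(await fetchReply(server.address().port));
        }
        child.send('stop');
        server.close();
        resolve(replies);
      } catch (err) {
        reject(err);
      }
    });
  });
}

sharedListenerDemo().then((replies) => console.log(replies.join(' ')));
```

Which process answers each connection varies from run to run and between platforms, which is exactly the kernel-driven distribution described above.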

The second approach should, in theory, give the best performance. In practice however, distribution tends to be very unbalanced due to operating system scheduler vagaries. Loads have been observed where over 70% of all connections ended up in just two processes, out of a total of eight.
Source: https://nodejs.org/api/cluster.html#how-it-works

Port Sharing Explained

Now it’s time to break down the details of how port sharing works in Node.js.

Let’s discuss a basic example.

When the program starts, it first checks whether the current process is the primary using cluster.isPrimary. If it is, it creates worker processes with cluster.fork(), based on the number of available CPU cores. cluster.fork() internally calls child_process.fork(), which creates a worker (child) process along with an IPC channel for communication between the primary (parent) and the worker (child).

When the same file runs inside a worker process, cluster.isPrimary is false, so the else branch executes and the worker creates its server.
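Here is a minimal sketch of a program with that shape. It assumes an HTTP server in each worker; to keep it self-contained and short-lived, it also swaps a few things in: it caps the worker count at 4, listens on port 0 instead of 8080 (in cluster mode every worker is handed the same OS-chosen port), and has the primary send a few requests to itself before disconnecting the workers. `REQUESTS`, `demo` and `sendRequests` are hypothetical names.

```javascript
const cluster = require('node:cluster');
const http = require('node:http');
const os = require('node:os');

// The article's example listens on 8080. Port 0 lets the OS pick a free port;
// in cluster mode every worker is then handed that same port.
const PORT = 0;
const REQUESTS = 6; // hypothetical: how many test requests the primary sends

let demo; // primary only: resolves with the pid that served each request

if (cluster.isPrimary) {
  // One worker per core, capped at 4 to keep the demo light.
  const cores = typeof os.availableParallelism === 'function'
    ? os.availableParallelism()
    : os.cpus().length;
  const numWorkers = Math.min(cores, 4);
  for (let i = 0; i < numWorkers; i++) cluster.fork(); // uses child_process.fork()

  demo = new Promise((resolve, reject) => {
    let listening = 0;
    let done = false;
    cluster.on('listening', (worker, address) => {
      if (++listening < numWorkers) return;
      // Every worker has "listened"; with round robin the real socket lives here.
      sendRequests(address.port, REQUESTS).then((pids) => {
        done = true;
        cluster.disconnect(); // closes worker servers so all processes can exit
        resolve(pids);
      }, reject);
    });
    cluster.on('exit', (worker, code) => {
      if (!done) reject(new Error(`worker ${worker.process.pid} exited early (code ${code})`));
    });
  });
  demo.then((pids) => console.log('requests served by worker pids:', pids));
} else {
  // Worker: looks like an ordinary server, but listen() is routed through the primary.
  http.createServer((req, res) => res.end(String(process.pid))).listen(PORT);
}

// Sends `count` sequential GET requests, each on a fresh connection (agent: false),
// and resolves with the pid reported by whichever worker answered.
function sendRequests(port, count) {
  const once = () => new Promise((resolve, reject) => {
    http.get({ host: '127.0.0.1', port, agent: false }, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(Number(body)));
    }).on('error', reject);
  });
  return (async () => {
    const pids = [];
    for (let i = 0; i < count; i++) pids.push(await once());
    return pids;
  })();
}
```

Using a fresh connection per request matters: distribution happens per connection, so a keep-alive connection would pin every request to one worker.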

Now I have two questions for you to reason about:

Notice that the primary never creates a listening socket in this code, yet I said above that the primary process creates one. How?
Each worker process seems to create its own server listening on port 8080. How can several processes listen on the same port?

Hang on for a moment and think. Ask yourself what happens when server.listen() is called in each worker process.

Alright, enough chit-chat. Do you want answers? Here they are.

By default, two processes cannot both bind a listening socket to the same port. Unless something is shared? That something is the listening socket itself, identified by its file descriptor (FD) or handle: the primary owns it and either shares it with the workers or accepts on it and passes the resulting connection handles to them.
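The "cannot bind twice" rule is easy to check. This sketch (the name `bindTwice` is hypothetical) binds a second listening socket to the exact address and port of a first one; on Linux and macOS the second `listen()` should fail with `EADDRINUSE` because nothing is being shared.

```javascript
const net = require('node:net');

// Bind a second listening socket to the exact address and port of the first.
function bindTwice() {
  return new Promise((resolve) => {
    const first = net.createServer();
    first.listen(0, '127.0.0.1', () => {
      const { port } = first.address();
      const second = net.createServer();
      second.once('error', (err) => {
        first.close();
        resolve(err.code); // typically 'EADDRINUSE'
      });
      second.listen(port, '127.0.0.1', () => {
        second.close();
        first.close();
        resolve('LISTENING'); // would mean the port was shared after all
      });
    });
  });
}

bindTwice().then((result) => console.log(result));
```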

Here is what happens when server.listen() is called in a worker process.

server.listen() is a call to net.Server.prototype.listen. The net module has internal hooks for clustering: listen() goes through the internal function listenInCluster, which checks whether it is running in the primary or in a worker.

When server.listen() is invoked in a worker (child process), listenInCluster does not bind a socket itself; instead it requests a server handle from the primary by calling cluster._getServer.

Note: cluster._getServer is defined in child.js, which is what the cluster module resolves to inside a worker. Check lib/cluster.js for more details.
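The branch inside listenInCluster can be summarized as a tiny model. This is not Node's code (the real function also forwards the address, fd, backlog and flags), and the return strings are made up. The `exclusive` flag is real, though: `server.listen({ port: 8080, exclusive: true })` in a worker skips the primary and binds its own socket, so port sharing is not available for that server.

```javascript
// Simplified model of the decision net's listenInCluster() makes (not Node's code).
function listenPath({ isPrimary, exclusive = false }) {
  if (isPrimary || exclusive) {
    // Primary (or exclusive: true): bind a real listening socket in this process.
    return 'bind locally';
  }
  // Shared worker listen: ask the primary via cluster._getServer ('queryServer').
  return 'query primary';
}

console.log(listenPath({ isPrimary: true }));                   // bind locally
console.log(listenPath({ isPrimary: false }));                  // query primary
console.log(listenPath({ isPrimary: false, exclusive: true })); // bind locally
```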

cluster._getServer (in child.js) then communicates with the primary over IPC to get the handle details. The worker sends its messages through a small send wrapper around sendHelper, which is defined in utils.js. The message asking the primary for a server handle looks like this:

const message = { act: 'queryServer', index, data: null, ...options };

NOTE: For simplicity and brevity, I have not covered the initial setup and communication between the primary and the workers; it is beyond the scope of this post.

When the primary receives this message from the worker, its queryServer function runs.

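With round-robin scheduling, the primary creates a RoundRobinHandle (the server handle) and starts listening for incoming connections itself. It does this only for the first worker that asks for a given address and port; when other workers send the same query, they are simply added to the existing handle. That answers both questions: only one listening socket exists, and it lives in the primary.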
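The "create once, then reuse" behaviour can be modelled in a few lines. This is a simplified model of the primary's queryServer handling, not Node's code: Node builds its key from the address, port, address type, fd and a per-worker listen index, while this model keys on address and port only, and `createPrimaryModel`/`socketsBound` are hypothetical names.

```javascript
// Simplified model of the primary's 'queryServer' handling (not Node's code).
function createPrimaryModel() {
  const handles = new Map(); // key -> { key, workers }
  let socketsBound = 0;      // how many real listening sockets would exist

  function queryServer(workerId, { address = '::', port }) {
    const key = `${address}:${port}`;
    let handle = handles.get(key);
    if (handle === undefined) {
      // First query for this key: stands in for creating and binding the socket.
      socketsBound += 1;
      handle = { key, workers: new Set() };
      handles.set(key, handle);
    }
    // Later queries just join the existing handle (the add method in Node).
    handle.workers.add(workerId);
    return handle;
  }

  return { queryServer, socketsBound: () => socketsBound };
}

const primary = createPrimaryModel();
const h1 = primary.queryServer('worker-1', { port: 8080 });
const h2 = primary.queryServer('worker-2', { port: 8080 });
console.log(h1 === h2, h1.workers.size, primary.socketsBound()); // true 2 1
```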

For more details, look at the distribute and handoff methods in round_robin_handle.js, and don’t forget the add method defined there. When a connection arrives, the primary sends the following message to a worker to delegate it, passing the client handle along:

const message = { act: 'newconn', key: this.key };

sendHelper(worker.process, message, handle, (reply) => {
  ...
});
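sendHelper ultimately sends the handle with the process's `send(message, handle)` call, and the same mechanism is available publicly as `subprocess.send(message, sendHandle)`. This sketch uses it to mimic the round-robin hand-off: the parent accepts a connection and passes the connection socket to a child, which answers the client. The `'newconn'` string here is just an ordinary user message reusing the name for readability, not cluster's internal format, and `handOffOneConnection` is a hypothetical name.

```javascript
const net = require('node:net');
const { spawn } = require('node:child_process');

// Child program (run with `node -e`): receives connection sockets over IPC.
const childSource = `
process.on('message', (msg, socket) => {
  if (msg !== 'newconn' || !socket) return;
  socket.resume(); // keep reading so the client's FIN is seen and the socket can close
  socket.end('handled by ' + process.pid);
  process.disconnect(); // one connection is enough for this demo
});
`;

function handOffOneConnection() {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', childSource], {
      stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
    });
    child.on('error', reject);

    // The parent owns the listening socket and accepts, like the round-robin primary.
    // pauseOnConnect stops the parent from reading before the hand-off.
    const server = net.createServer({ pauseOnConnect: true }, (socket) => {
      child.send('newconn', socket); // the parent's copy is closed after sending
    });

    server.listen(0, '127.0.0.1', () => {
      const client = net.connect(server.address().port, '127.0.0.1');
      let reply = '';
      client.setEncoding('utf8');
      client.on('data', (chunk) => { reply += chunk; });
      client.on('end', () => {
        server.close();
        resolve({ reply, childPid: child.pid });
      });
      client.on('error', reject);
    });
  });
}

handOffOneConnection().then(({ reply }) => console.log(reply));
```

The client only ever talks to port X on the parent, yet the bytes it receives were written by the child: the connection socket moved between processes.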

If the scheduling policy is SCHED_NONE, the primary only creates the server handle (a SharedHandle) and sends it to the workers; it does not accept connections itself.

Now let’s see what the worker does once it gets the reply from the primary (refer to child.js). Have a look at the code for cluster._getServer.

For a shared server handle, the worker sets up a few things, such as handle.close, and then executes the callback named cb. Here cb is listenOnPrimaryHandle, a private function from the net module.

listenOnPrimaryHandle stores the handle on the server and calls setupListenHandle, which, as the name suggests, configures the handle to listen for connections. A few lines in setupListenHandle worth looking at are:

// `this` here is the net.Server object
this._handle.onconnection = onconnection;
this._handle[owner_symbol] = this;

For round-robin scheduling, the worker does a bit more: it creates a fake handle object with methods such as listen, ref, and unref, and then executes the net module’s callback with that fake handle as one of its arguments.

Reading onconnection(message, handle) in child.js together with the callback cb in the net module will help you understand this better.
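Put together, the worker side of round robin looks roughly like the following simplified model, which is not Node's code: the worker binds nothing, `listen()` is satisfied by a fake handle, and connections arrive later as `'newconn'` messages carrying the server's key. The reply mirrors the ack the worker sends back; in Node, a connection that is not accepted is distributed again by the primary. `createWorkerModel`, `onNewConn` and the key strings are hypothetical.

```javascript
// Simplified model of the worker side of round robin (not Node's code).
function createWorkerModel() {
  const handles = new Map(); // key -> fake handle

  function listen(key, onconnection) {
    const fakeHandle = {
      listen: () => 0, // report success; no socket is bound in the worker
      ref() {},
      unref() {},
      onconnection,    // set by the server, as setupListenHandle does
    };
    handles.set(key, fakeHandle);
    return fakeHandle;
  }

  // Handle a 'newconn' message and build the reply sent back to the primary.
  function onNewConn(message, clientHandle) {
    const handle = handles.get(message.key);
    const accepted = handle !== undefined;
    if (accepted) handle.onconnection(0, clientHandle); // (err, clientHandle)
    return { accepted };
  }

  return { listen, onNewConn };
}

const received = [];
const worker = createWorkerModel();
worker.listen('0.0.0.0:8080', (err, clientHandle) => received.push(clientHandle));
console.log(worker.onNewConn({ key: '0.0.0.0:8080' }, 'client-1')); // { accepted: true }
console.log(worker.onNewConn({ key: 'unknown' }, 'client-2'));      // { accepted: false }
console.log(received); // [ 'client-1' ]
```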

The handle before calling the callback listenOnPrimaryHandle

After the call to listenOnPrimaryHandle, the handle is ready to handle connections distributed by the primary. See the screenshot below.

So, upon receiving the handle details from the primary, a worker configures its server object by setting up the handle, as you can see in the screenshots above.
Once the handle is set, the server is ready to serve connections. For an even better understanding, set breakpoints and step through the whole flow in a debugger.

Summary

Clustering in Node.js is a powerful technique that lets you make full use of multi-core processors, improving the scalability and performance of your application. In this post, we explored the two ways the `cluster` module distributes connections:

1. Master Process Distributing Connections (Round-Robin): This approach uses a round-robin algorithm to evenly distribute connections across workers, providing efficient load balancing.

2. Workers Directly Accepting Connections: Workers independently accept and handle connections, with the operating system managing the distribution.

In part 2, I will build basic clustering with the child_process module, which will further deepen your understanding of clustering and port sharing.