Things that make you a far better Node.js developer, part 4 (libUV)
In the first part of this series, I mentioned that one of the features JavaScript needed in order to become a server-side language was a way to deal with time-consuming tasks, and now it is time to talk about it. If you ask me, this is one of the most critical aspects of Node.js. Personally, it is a topic I love, because I believe that if you don’t understand it deeply, you will probably make some rookie mistakes that hurt performance.
Before diving into libUV, I think we need to cover a few fundamental computer science topics; understanding them will help us see how things really work. By the end of this article, you will hopefully understand:
- Asynchronous vs synchronous
- Thread vs Process
- What is System call
- What are event loops and non-blocking asynchronous execution?
- LibUV and event loop
- What is event loop lag?
- Is Node.js single-threaded?
Asynchronous vs synchronous
Synchronous execution simply means reading the code line by line, where each line has to wait for the previous one to finish. No two lines of code ever execute at the same time. In fact, this is how JavaScript engines like V8 work (check my first article if you want to know more about JS engines): the engine reads the code line by line and converts it into machine code. In contrast, asynchronous means we can process more than one thing at the same time: the next operation can start while the previous one is still being processed, so work can happen in parallel. JavaScript also has commands that tell the engine to start a task but not wait for its result; the engine moves on to the next line, and once all the other instructions have been interpreted, it comes back to that task and delivers its result.
Let’s see some code. There is a function named setTimeout in JavaScript, which allows you to run a function later rather than immediately. setTimeout takes two parameters: the first is the function that will be called later (the callback), and the second is the delay in milliseconds after which the callback is supposed to run. For example:
const fn = function () {
console.log('FN')
}
const interval = 2000;
setTimeout(fn, interval)
console.log('MAIN')
If you run this code, you will see MAIN and then, after two seconds (2000 ms), FN in the console. Now, if we change the interval to 0, what will we see? Let’s do it:
const fn = function () {
console.log('FN')
}
const interval = 0; // I changed interval to 0
setTimeout(fn, interval)
console.log('MAIN')
On paper, we should see FN and then MAIN because the interval is zero, but if you run this code, you will see the same result as before. Even though you set the interval to 0, the JS engine doesn’t run the setTimeout callback immediately; instead, it first checks whether there is any other line left to interpret. In this example, the answer is yes: console.log('MAIN'). So it logs MAIN and then FN.
Let’s spice it up a little bit. I want to add a long loop after console.log('MAIN') and see what happens:
const fn = function () {
console.log('FN')
}
const interval = 0;
setTimeout(fn, interval)
console.log('MAIN')
for (let i = 0; i < 99999999; i++) {}
Let’s assume this for loop takes 500 ms to complete. If we run this code, you will see MAIN, then the JS engine needs to count up to 99999999. Once i hits that number, the engine is free and can process the setTimeout callback. This is really important to know: the interval does not guarantee that the callback runs at exactly that time, because JavaScript is single-threaded and cannot process two things at the same time.
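You can measure this drift yourself. Here is a minimal sketch (the loop bound is arbitrary; tune it for your machine) that schedules a zero-delay callback and reports how late it actually fires:
const start = Date.now();
setTimeout(() => {
  // Scheduled with a 0 ms delay, but it can only run once the
  // blocking loop below has finished.
  console.log(`callback fired after ${Date.now() - start} ms`);
}, 0);
// A synchronous loop that keeps the single thread busy.
for (let i = 0; i < 99999999; i++) {}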
Thread vs Process
When we run a program on our computer, we start something called a process, which is an instance of a running program. Each process has at least one thread, called the main thread, which does the main job: the program’s instructions (our code) live there, ready to be executed by the CPU. A single process can have multiple threads inside it. Each process has its own memory address space, and one process cannot corrupt the memory space of another (there are some exceptions, though). In contrast, all the threads in a single process share the process’s memory. That’s why one misbehaving thread can bring down the entire process. For example, assume we have two threads in a process, and thread one deletes something in memory that thread two needs; this might crash the whole process. We also need to manage race conditions and locking, which can sometimes drive you crazy (multi-threading is evil :)
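As a quick illustration in Node.js terms, here is a small sketch using the built-in worker_threads module (the file name is just for this example): the main thread and the worker print the same process ID, because they are two threads living inside one process and sharing its memory space.
// threads-demo.js
const { Worker, isMainThread, threadId } = require('worker_threads');

if (isMainThread) {
  console.log(`main thread: pid=${process.pid}, threadId=${threadId}`);
  new Worker(__filename); // start a second thread running this same file
} else {
  // Same pid as above: one process, two threads sharing its memory.
  console.log(`worker thread: pid=${process.pid}, threadId=${threadId}`);
}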
Chrome allocates one process for each tab, so if one tab has a problem, it doesn’t affect the other tabs. In fact, when we run Chrome, the OS creates a process called the parent process, and when we open a new tab, the parent process creates a new child process. A child process can also create more processes of its own. These processes can exchange and share data with each other through something called interprocess communication (IPC).
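Node.js exposes the same parent/child model through the built-in child_process module. A minimal sketch (the file names are hypothetical): fork() starts a child Node.js process, and the two processes exchange messages over an IPC channel.
// parent.js
const { fork } = require('child_process');

const child = fork('./child.js'); // creates a child Node.js process
child.on('message', (msg) => console.log('parent received:', msg));
child.send({ hello: 'child' }); // sent over the IPC channel

// child.js
process.on('message', (msg) => {
  console.log('child received:', msg);
  process.send({ hello: 'parent' }); // reply to the parent
});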
Each process is represented in the OS by a Process Control Block (PCB) that holds some properties of the process. Let’s take a look at a few of them:
- Process ID: each process has a unique ID
- Process state: represents the current activity of the process, such as WAITING (e.g., when it is waiting for some event to occur, like I/O), RUNNING, or READY (meaning it is ready to be run by the CPU; the OS does its best to minimize the number of ready processes and allocate the CPU as soon as possible)
- Program counter: stores the address of the next instruction to be executed
- Memory management information
- I/O status information: shows the devices assigned to that process
- …
When we close one Chrome tab (a process), the process makes an exit system call (don’t worry if you don’t know what a system call is; I will discuss it later in this article) and asks the OS to delete it. The terminating process may return an integer to its parent process, which the parent retrieves with the wait system call; based on that integer, the parent knows whether the child terminated with an error or not. At this point, all resources of the terminated process are deallocated by the OS.
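We can watch this exit-code handshake from Node.js as well. A small sketch (the inline script is made up purely for illustration): the parent spawns a child Node.js process that exits with code 1, and reads that code in the exit event.
const { spawn } = require('child_process');

// Start a child Node.js process that immediately exits with code 1.
const child = spawn(process.execPath, ['-e', 'process.exit(1)']);

child.on('exit', (code) => {
  // A non-zero code conventionally means the child terminated with an error.
  console.log(`child terminated with exit code ${code}`);
});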
If you go to Activity Monitor on macOS or Task Manager on Windows, you can see the number of processes and threads currently running on your machine.
Since we have limited resources on our computer, the OS, through something called the OS scheduler (which is a really advanced topic), tries to allocate a fair share of CPU time to all of the processes and their threads.
Each thread has a priority that indicates how urgently it needs to run. For example, the thread responsible for managing the mouse cursor should get the CPU immediately; otherwise, the user would feel some lag. So the OS scheduler is switching between all the processes and threads all the time, and this is called a context switch.
Context switching is expensive: every time the CPU needs to work on another process, the OS has to store the state of the current process (in the PCB we saw above) so that the process can be restored and resume execution later. Switching between threads also requires a context switch, but it is less expensive than switching processes because there is less state to track and, more importantly, since threads share the process’s memory space, there is no need to switch between memory pages.
What is a system call?
In an OS, we have two kinds of space: kernel space and user space. (The kernel is the core of a UNIX operating system; it sits between user programs and the hardware, and through it we can ask the OS to do things for us.)
If a process is executing in user mode, it does not have direct access to memory and hardware resources; in contrast, a process executing in kernel mode has direct access to the computer’s resources.
The reason we have these two spaces is that if code executing in kernel mode crashes for some reason, the entire system can go down, while if a program in user space crashes, only that process is affected. When we write a program, our code is loaded into user space. Now let’s talk about system calls.
With the help of system calls, programs living in user space can request a service from the kernel. Once the kernel receives the request, it runs the related system program for that request.
Here are some of the services that can be requested from the kernel (see the small sketch after this list):
- File manipulation (create, delete, or open a file, and get file attributes)
- Process control (create a process and allocate memory, or stop a process and free its memory; get or set process attributes, e.g., a process needs to wait for some amount of time before executing again, or it needs to wait for specific events, or it needs to send an event)
- Device management (request or release a device)
- Information maintenance (get the time and date and other system data you might need)
- Communications (by this I mean communication between processes or devices, whether on the same computer or on different ones; there are system calls to create or delete a communication connection and to send or receive messages, e.g., sending a message to the screen process)
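To make the file-manipulation category concrete, here is a minimal Node.js sketch (the file path is hypothetical). On Linux, these three calls roughly map to the open, read, and close system calls:
const fs = require('fs');

const fd = fs.openSync('./some-file.txt', 'r'); // open: the kernel returns a file descriptor
const buffer = Buffer.alloc(64);
const bytes = fs.readSync(fd, buffer, 0, buffer.length, 0); // read: the kernel fills our buffer
fs.closeSync(fd); // close: the kernel releases the descriptor

console.log(buffer.toString('utf8', 0, bytes));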
Asynchronous non-blocking vs synchronous blocking
Let’s assume we want to create a single-threaded HTTP web server. When a client wants to make a request to our server over the HTTP protocol, the server needs to establish a TCP connection, and when it does, the server allocates some memory for that connection, which is called a TCP socket. At this point, the server can accept requests from that particular client. Let’s assume the client sends a GET request for an index.html file. When the CPU runs this code, it sends a request to the disk controller, saying “hey, please read this file for me,” and then it has to wait for the result. In fact, the thread is now blocked: the next lines of the thread’s instructions cannot run and have to wait. If another client makes a request to the web server at this moment, the blocked thread cannot serve it. There are several strategies to solve this problem. For example, Apache creates a new thread for every request (of course, it cannot create an infinite number of threads, and it manages that somehow), but the downside is having to deal with multi-threading issues, like threads racing each other for access to process memory, context switching, and so on.
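To feel that difference in Node.js terms, here is a small sketch (bigFile.txt is a placeholder for any large file): the synchronous variant blocks the thread until the disk answers, while the asynchronous one lets the next line run immediately.
const fs = require('fs');

// Blocking: nothing else runs on this thread until the whole file is read.
const data = fs.readFileSync('./bigFile.txt');
console.log('sync read finished');

// Non-blocking: the read is handed off, and this thread moves on.
fs.readFile('./bigFile.txt', (err, contents) => {
  if (err) throw err;
  console.log('async read finished');
});
console.log('this line runs before the async read finishes');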
Now let’s talk about another possibility:
LibUV
I think most of Node.js’s beauty comes from this amazing library, guys. It is written in C, and like V8, it can also be used stand-alone.
This library gives us the ability to write JavaScript code that reads like simple synchronous code, which is very easy to manage, while still responding to things that happen asynchronously, like any I/O operation. So as developers, we don’t need to deal with multi-threading anymore; however, if we don’t understand it, we might abuse Node.js. Now let’s talk about the way this library works.
Let’s assume that we want to run this Node.js program:
console.log('Hi from the main');
const callback = () => {
console.log('Hi From call back');
}
setTimeout(callback, 1000)
console.log('Hi Log after the setTimeout');
When we run a Node.js program, the OS creates a process with one thread. The program goes to V8, and V8 hands things over to libUV. libUV checks the first line of code, which is console.log('Hi from the main');. libUV knows that console.log is a synchronous command, so it passes this line to V8 to run. The next line just assigns a variable, so let’s go to the next one, the setTimeout command. When libUV sees this command, it knows it should not pass the callback function to V8 yet, because it has to wait 1000 ms. So what libUV does is put the callback into a queue, and then it checks the next line, which is console.log('Hi Log after the setTimeout');. Since this line is also a synchronous command, libUV passes it to V8 to execute. That was the last line; now libUV knows there is an item in the queue, and it should wait.
libUV then repeatedly checks whether 1000 ms have passed. If not, it pauses for a short while and checks again. Once the time has passed, libUV passes the callback to V8 to execute.
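To build intuition, here is a heavily simplified JavaScript sketch of that checking cycle. This is a conceptual model only; the real loop lives inside libUV, is written in C, and blocks efficiently instead of spinning.
// { dueTime, callback } entries waiting to fire; one sample timer for demo.
const timerQueue = [
  { dueTime: Date.now() + 1000, callback: () => console.log('FN') },
];

function eventLoop() {
  while (timerQueue.length > 0) {
    const now = Date.now();
    // Split the queue into callbacks that are due and those still waiting.
    const due = timerQueue.filter((t) => now >= t.dueTime);
    const waiting = timerQueue.filter((t) => now < t.dueTime);
    timerQueue.length = 0;
    timerQueue.push(...waiting);
    due.forEach((t) => t.callback()); // hand each callback to the engine
    // Real libUV would block here (epoll/kqueue) until the next timer is due.
  }
}

eventLoop();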
This constant checking is called the event loop, and it is managed by the main thread. How can I prove this? Let’s put a long for loop at the end of our Node.js program:
console.log('Hi from the main');
const callback = () => {
console.log('Hi From call back');
}
setTimeout(callback, 1000)
console.log('Hi Log after the setTimeout');
for (var i = 0; i <= 99999999; i++) {} // this line
When libUV wants to run this loop, let’s assume the CPU needs 2000 ms to execute it. During that time, the main thread is busy with the loop. The thing is, we had a setTimeout command that was supposed to run after 1000 ms, but unfortunately, since our application is single-threaded, the main thread is blocked by this loop and cannot run the setTimeout callback. So in this scenario, the callback runs after about 2000 ms instead of 1000 ms. As you can see, we have latency here, which is called event loop lag (one of the most important performance metrics). So as developers, we need to pay attention not to write code that blocks the main thread!
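A common way to watch event loop lag is to schedule a timer for a known delay and measure how late it actually fires. A minimal sketch (the 500 ms sampling interval is an arbitrary choice):
const delay = 500; // the delay we ask for

function sampleLag() {
  const scheduledAt = Date.now();
  setTimeout(() => {
    // Anything beyond `delay` is time the main thread spent busy elsewhere.
    const lag = Date.now() - scheduledAt - delay;
    console.log(`event loop lag: ${lag} ms`);
    sampleLag(); // keep sampling
  }, delay);
}

sampleLag();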
But what if I want to run a CPU-expensive task, like creating a hash? This is where libUV shines. Let’s take an example: there is a built-in method in Node.js that can create a hash for us.
const crypto = require('crypto');
const now = new Date()
const makeHash = (index) => {
const callback = (err, derivedKey) => {
console.log(`${index} call of Hash function get back the result ready after ${new Date() - now} ms`);
}
crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', callback);
}
makeHash(1);
makeHash(2);
makeHash(3);
makeHash(4);
makeHash(5);
In the above example, I imported the crypto module from Node.js and captured the current time. Then I created a function that makes a hash and assigned a callback to it: when the hash has been calculated, the callback is called, and in the callback I put a simple log that shows how long the hashing took. In the last lines, I call the makeHash function five times. If you run this code, depending on the speed of your computer, you should see something like this:
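Something along these lines; the exact numbers and ordering will differ on your machine:
2 call of Hash function get back the result ready after 54 ms
4 call of Hash function get back the result ready after 55 ms
1 call of Hash function get back the result ready after 56 ms
3 call of Hash function get back the result ready after 56 ms
5 call of Hash function get back the result ready after 110 ms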
This log contains some interesting data. Let’s analyze it:
As you can see, the first log comes from makeHash(2):
2 call of Hash function get back the result ready after 54 ms
But based on the order of the calls, I would expect to see the results in the order I made them, i.e., the result of makeHash(1) before makeHash(2).
Another interesting detail is that makeHash(1), makeHash(2), makeHash(3), and makeHash(4) all returned their results in roughly the same amount of time, around 56 ms.
But how is this possible? If we assume calculating one hash takes 56 ms of CPU time, then when we call makeHash(1), the main thread should be busy for 56 ms; only when makeHash(1) is done should libUV move on to makeHash(2), so that call should return its data after 2 × 56 ms, and makeHash(3) after 3 × 56 ms.
Putting these facts together, it seems this code ran in parallel.
It’s true, guys: Node.js is not single-threaded :)
libUV has a thread pool with four threads inside it, called worker threads (we can change the number of threads in the pool, as shown below).
When the event loop wants to run an expensive task (dns.lookup(), all file system APIs except fs.FSWatcher(), crypto, zlib), it assigns that task to a worker thread. For example, when we called crypto, libUV knew that the pbkdf2 function is time-consuming, so instead of running it on the main thread, it assigned a worker thread from the thread pool to handle it, and the event loop continued working; the main thread was never blocked. Once a worker thread is done, it puts the result into a queue in the event loop, and the event loop then passes the callback to V8, running on the main thread, to execute.
It is essential to remember that since we have a limited number of worker threads, we need to make sure we don’t block them for a long time with too many expensive tasks that need a lot of CPU time. Let me give you another example of what I mean.
In the example below, I created a big file named bigFile.txt, and I want to read it. Again, I log the time it takes to get the result:
const fs = require('fs')
var now = new Date()
fs.readFile('./bigFile.txt', (err, data) => {
console.log(`Reading the file get back the result ready after ${new Date() - now} ms`);
})
When we run this code, libUV sees the fs.readFile line, knows this is an I/O task, and assigns a worker thread to it. Behind the scenes, the worker thread wakes up, makes the system calls to open and read the file, and goes back to the thread pool. So basically, what the worker thread does in this case is delegate the task to the OS via system calls. Once the OS is done with the task, the event loop is notified, and it can pass the related callback to V8 to execute.
In the above example, reading this file took around 20 ms on my computer. But what if we block the thread pool? Let’s do it:
const fs = require('fs')
const crypto = require('crypto')
var now = new Date()
const makeHash = (index) => {
const callback = (err, derivedKey) => {
console.log(`${index} call of Hash function get back the result ready after ${new Date() - now} ms`);
}
crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', callback);
}
makeHash(1);
makeHash(2);
makeHash(3);
makeHash(4);
fs.readFile('./bigFile.txt', (err, data) => {
console.log(`Reading the file get back the result ready after ${new Date() - now} ms`);
})
In this example, I first run four hash functions that keep all the worker threads busy for roughly 54 ms. When the event loop wants to run fs.readFile, there is no worker thread available to allocate, so libUV must wait for one to be freed from the pool. Once a thread finishes, the event loop allocates it to fs.readFile. Here is the log that proves what I said:
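Something like this; the exact numbers differ by machine:
1 call of Hash function get back the result ready after 54 ms
2 call of Hash function get back the result ready after 55 ms
3 call of Hash function get back the result ready after 55 ms
4 call of Hash function get back the result ready after 56 ms
Reading the file get back the result ready after 75 ms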
As you can see, reading the file finishes around 20 ms after one thread is released from the thread pool.
There are some other topics left, and I will cover them soon: the different phases of the event loop, IOCP, what we can do to take advantage of all the CPU power in Node.js applications, and more.
That’s it, guys. I hope you enjoyed this topic. If you did, please do me a favor and like it or leave a comment so that others can see my posts. That is a big motivation for me to continue.
So stay tuned. Love you❤️