Things that make you a far better Node.js developer (part 1)

Hossein Derakhshan
Dec 18, 2022


In this series of articles, I want to talk about how Node.js works behind the scenes. We will look at some of the C++ parts of the Node.js code. I will also cover some basic concepts that are not specific to Node.js but are general ideas that even many senior Node.js developers might not know.

By the end of this article, you will know:

  • What a JavaScript engine is, why we need one, how it works, and how the very first JavaScript engines differ from the current generation of JS engines (at a high level).
  • How we can make JavaScript code run a C++ function that might do a low-level operation.
  • What exactly Node.js does. We will see some of the C++ parts of the Node.js code.
  • and many more …

Low-level and high-level languages

Let’s talk about microprocessors. Simply put, a microprocessor is a small machine used in any equipment that requires processing, such as traffic lights, remote controls, or computers. A microprocessor is an integrated circuit made up of millions of transistors. The CPU is a microprocessor; however, not all microprocessors are CPUs. There are also NPUs, GPUs, and APUs, which take network, graphics, or audio processing off the CPU.

Microprocessors have three main parts: the ALU, the registers, and the control unit. We don’t need to dive into them for our purpose. Depending on how these parts are designed, each family of microprocessors speaks a different language (its instruction set). E.g., some of them speak ARM, and others speak x86-64, MIPS, or IA-32.

We give the microprocessor instructions to run our program, and those instructions (or simply code) must be in the language the microprocessor speaks. This language is called machine code or machine language. In fact, every program must end up as machine code so that the microprocessor can understand it. Yeah, we could write our programs directly in machine code, but once you take a look at the syntax, you will see why we don’t :)

Here is an example of machine code
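As a small illustration, the following sketch stores six bytes of x86-64 machine code in a JavaScript typed array and prints them in hexadecimal and binary. The bytes encode the two instructions "mov eax, 42" and "ret". The variable names are only for illustration, and the snippet just displays the bytes; it does not execute them.

```javascript
// Six bytes of x86-64 machine code:
//   b8 2a 00 00 00  -> mov eax, 42  (load the number 42 into register eax)
//   c3              -> ret          (return to the caller)
const machineCode = Uint8Array.of(0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3);

// Format every byte as two hex digits, then as eight binary digits.
const asHex = Array.from(machineCode, (byte) =>
  byte.toString(16).padStart(2, "0")
).join(" ");
const asBinary = Array.from(machineCode, (byte) =>
  byte.toString(2).padStart(8, "0")
).join(" ");

console.log(asHex);
console.log(asBinary);
```

The binary line is what the microprocessor actually receives; the hex line is just a shorter way for humans to write the same bytes.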

As you can see, this is a very, very low-level language. Over time, higher-level languages emerged, like assembly, which is still very close to machine language but has a more readable syntax. Then came C and C++, which are still heavily used and whose syntax is a bit like JavaScript or Java. In fact, the syntax of Java and JavaScript was inspired by C and C++. However, C and C++ still give you control over low-level things like memory management or …

JavaScript engine

Now let’s move on to JavaScript and Node.js. As you might have guessed, microprocessors cannot understand JavaScript. So we need something to convert JS into the microprocessor’s language.

That something is called a JavaScript engine. A JavaScript engine is a program that converts JS into machine language (modern JS engines can also do more thanks to JIT, which we will talk about later), and most of them are developed by web browser vendors. Browsers like Internet Explorer, Chrome, and Firefox each have their own JS engine. Most of these engines are written in C and C++. They all essentially do the same thing at a high level, but they differ in their approach.

To avoid incompatibilities between these engines, a standard named ECMAScript was created, and every JS engine must implement it. Each year, new features are added to JS, and each edition of the standard has a name, like ES5 or the ECMAScript 2015 specification.

Every year, these engines try to support the new JS features described in the ECMAScript specification.

One of these JS engines was created by Google for use in Google Chrome. It is called V8. It is open source, meaning you can easily clone the V8 project and read or even change the code. It is one of the fastest JavaScript engines. One of the good things about V8 is that it can be used stand-alone in C++ applications.

There are a lot of other JS engines out there. Firefox uses SpiderMonkey, which was the very first JS engine and was written by the creator of JavaScript himself, Brendan Eich. It was used in the Netscape browser; later on, it became open source, and it is currently maintained by Mozilla.

Chakra was used in Microsoft Edge before Edge moved to Chromium. One of the core features of Chakra was that it could compile scripts on a separate CPU core, in parallel with the web browser.

Safari uses an engine called JavaScriptCore.

This is the V8 source code, and as you can see, it supports different types of microprocessors:

More on JavaScript engines and JIT (good to know, but you can skip this part)

Today, these engines use a technique named just-in-time compilation to convert JS into machine code. (When JS was invented, the JS engine was mostly responsible for interpreting and executing the code, not for optimizing it.)

This is the definition of JIT on Wikipedia:

just-in-time (JIT) compilation is a way of executing computer code that involves compilation during execution of a program (at run time) rather than before execution.

We have all heard that compiled languages translate the code into machine code all at once, before the program runs (this is called ahead-of-time compilation, or AOT), while in interpreted languages the code is interpreted line by line. But I think the most important difference is that a compiler never executes the code at all; it only translates it.

You might ask: hey Hossein, so what is the difference between interpretation and JIT?

Interpretation executes the code line by line, without compiling it to machine code. Some JavaScript engines, like Hermes (the engine that React Native uses), don’t use a JIT compiler.

The most interesting part of JIT is that it can optimize the code once the code has started running. (And of course, the downside is that the JIT must warm up when an application starts.) Once it compiles a piece of code, it keeps an eye on that code, and if the code is used again, it tries to optimize it. Let’s dive into it a little more:

As I said, every JavaScript engine has its own implementation, but in V8, the source code first goes into a module named the Tokenizer (also known as the lexer, the scanner, or lexical analysis). The tokenizer is responsible for dividing the input stream into individual tokens. In other words, the token stream is another representation of the source code, in a format that is easier to process.
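To make the idea of tokens concrete, here is a minimal sketch of a tokenizer for a tiny subset of JavaScript. It is a toy written for this article, not V8's scanner (which is written in C++ and handles the whole language); the token type names and the tokenize function are made up for illustration.

```javascript
// Toy tokenizer: each entry is [token type, pattern anchored at the start].
// Order matters: keywords must be tried before identifiers.
const TOKEN_PATTERNS = [
  ["whitespace", /^\s+/],
  ["keyword", /^(?:var|let|const)\b/],
  ["identifier", /^[A-Za-z_$][A-Za-z0-9_$]*/],
  ["number", /^\d+(?:\.\d+)?/],
  ["punctuator", /^[=;+\-*/()]/],
];

function tokenize(source) {
  const tokens = [];
  let rest = source;
  while (rest.length > 0) {
    let matched = false;
    for (const [type, pattern] of TOKEN_PATTERNS) {
      const match = pattern.exec(rest);
      if (!match) continue;
      // Whitespace separates tokens but is not a token itself.
      if (type !== "whitespace") tokens.push({ type, value: match[0] });
      rest = rest.slice(match[0].length);
      matched = true;
      break;
    }
    if (!matched) throw new SyntaxError(`Unexpected character: ${rest[0]}`);
  }
  return tokens;
}

const tokens = tokenize("var x = 100;");
console.log(tokens);
```

For the input above, the result is five tokens: the keyword var, the identifier x, the punctuator =, the number 100, and the punctuator ;.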

Then those tokens go to the Parser (syntactic analysis). The parser is responsible for checking the syntax. If it finds an error, it reports it; if the code is valid, it generates something called an AST (abstract syntax tree), which is a tree of nodes that represents the code. You can think of it like the DOM, but for JS code. There is a site named AST Explorer that shows you what an actual AST looks like. For example, this is the AST representation for this code:

var x = 100;
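Written out as a plain JavaScript object, the tree looks roughly like this. The shape follows the ESTree format that many JavaScript parsers use, with position fields left out for brevity; V8's internal AST is a C++ data structure and differs in its details. The collectTypes helper is only there to walk the tree for illustration.

```javascript
// ESTree-style AST for the statement: var x = 100;
const ast = {
  type: "Program",
  sourceType: "script",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "x" },
          init: { type: "Literal", value: 100, raw: "100" },
        },
      ],
    },
  ],
};

// Walk the tree depth-first and collect the type of every node we meet.
function collectTypes(node, out = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectTypes(child, out));
    return out;
  }
  if (node && typeof node === "object") {
    if (typeof node.type === "string") out.push(node.type);
    for (const key of Object.keys(node)) collectTypes(node[key], out);
  }
  return out;
}

console.log(collectTypes(ast).join(" > "));
// prints: Program > VariableDeclaration > VariableDeclarator > Identifier > Literal
```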

We should also say that all of this happens at static time, or compile time. This means that, at this point, we haven’t actually started executing our program. All we have done is translate it into several intermediate representations. There is no actual execution yet. Then this AST goes to another module named the Bytecode Emitter, which creates the next intermediate representation, known as bytecode. We can think of it as an abstraction of machine code.

Now this bytecode goes to another part named the Interpreter (Ignition in V8), which executes it at run time. The machine code itself is generated later by the JIT compiler, for the parts of the code that are worth optimizing.

The reason the Bytecode Emitter does not create machine code directly is that machine code depends on the architecture of the microprocessor, as I mentioned before, whereas bytecode is universal.

Another reason is that we can do a lot of optimization on the bytecode. This is where JIT does a lot, guys.
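To get a feeling for what bytecode is, here is a toy stack-based interpreter. The opcode names (PUSH, STORE, LOAD, ADD) and the run function are invented for this sketch, and V8's real bytecode looks different. The point is only that the same list of instructions can run on any machine that has an interpreter for it.

```javascript
// Hypothetical bytecode for:  var x = 100; var y = x + 1;
const bytecode = [
  ["PUSH", 100], // push the constant 100 onto the stack
  ["STORE", "x"], // pop the top of the stack into variable x
  ["LOAD", "x"], // push the value of x
  ["PUSH", 1], // push the constant 1
  ["ADD"], // pop two values, push their sum
  ["STORE", "y"], // pop the sum into variable y
];

// A tiny interpreter: it executes one instruction after another.
function run(program) {
  const stack = [];
  const variables = {};
  for (const [op, arg] of program) {
    switch (op) {
      case "PUSH":
        stack.push(arg);
        break;
      case "STORE":
        variables[arg] = stack.pop();
        break;
      case "LOAD":
        stack.push(variables[arg]);
        break;
      case "ADD": {
        const right = stack.pop();
        const left = stack.pop();
        stack.push(left + right);
        break;
      }
      default:
        throw new Error(`Unknown opcode: ${op}`);
    }
  }
  return variables;
}

const result = run(bytecode);
console.log(result); // { x: 100, y: 101 }
```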

Imagine we have a heavy-weight function that does some complex calculation, and this function is called many times during the program’s execution. Eventually, calling this function might become a bottleneck. What the JIT really does here is generate machine code for that part of the program. Having the machine code means it now knows, e.g., where all the variables related to that code are located in memory, and many more … So when you call this function again, as long as the arguments look the same as before (for example, they have the same types), it reuses that machine code.

The module that does this optimization in V8 is called TurboFan. (In Firefox’s engine, it is called IonMonkey.)
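The warm-up and reuse behaviour described above can be modelled with a few lines of plain JavaScript. This is only a toy model, not how V8 or TurboFan are implemented: the makeTiered helper, the threshold of 3 calls, and the stats object are all assumptions made for the sketch. It counts calls, "compiles" a specialized version once the function is hot, and falls back to the general version when the arguments no longer match what the fast version expects.

```javascript
// Toy model of tiered execution with a guard and a fallback.
function makeTiered(genericFn, specialize, hotThreshold = 3) {
  let calls = 0;
  let optimized = null; // becomes { guard, run } once the function is hot
  return function (...args) {
    calls += 1;
    if (optimized && optimized.guard(...args)) {
      return optimized.run(...args); // fast path
    }
    if (!optimized && calls >= hotThreshold) {
      optimized = specialize(); // "compile" once
    }
    return genericFn(...args); // general path, also used as the fallback
  };
}

const stats = { generic: 0, fast: 0, compiled: 0 };

// General version: works for numbers and strings.
function genericAdd(a, b) {
  stats.generic += 1;
  return a + b;
}

// Specialized version: only valid while both arguments are numbers.
function specializeForNumbers() {
  stats.compiled += 1;
  return {
    guard: (a, b) => typeof a === "number" && typeof b === "number",
    run: (a, b) => {
      stats.fast += 1;
      return a + b;
    },
  };
}

const add = makeTiered(genericAdd, specializeForNumbers, 3);
add(1, 2);
add(3, 4);
add(5, 6); // third call: the function is now hot and gets specialized
const fast = add(7, 8); // guard passes, so the fast path is used
const fallback = add("a", "b"); // guard fails, so we fall back

console.log(fast, fallback, stats);
```

In a real engine, the fast path is machine code and the guard is a type check inserted by the compiler, but the shape of the decision is similar.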

Let’s add a feature to JS with V8

As I said, we can use V8 stand-alone, meaning we can embed V8 in our own C++ program. Others can then write JavaScript, and our C++ program can take that JavaScript and run it through V8.
This means you can essentially add features to JavaScript by embedding V8 into your C++ program.

Let’s take a look at this C++ program, which is part of the V8 source code.

As you can see, this is a C++ program that imports V8.

Also, in this file, you can see some functions like Print, Read, Load, Quit, and …

These are functions that are written in this tiny C++ program. Let’s scroll down to see the following section.

In this section, we are binding C++ functions to names that V8 will recognize. We are saying: hey V8, if you see print being called in the JavaScript code, run my C++ Print function, because we know that JS itself doesn’t have a built-in print function, right?

This is a simple example. You can change this file and add your own function. For example, you can add a C++ function that turns off your computer, or does any other low-level operation, and bind it to JS. Then you can run that function from JavaScript! It is interesting, no? :)
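The same idea can be tried without writing any C++, using the vm module that ships with Node.js. This is only an analogy, not the V8 embedding API used in the C++ file: here the "host" is our own Node.js script, and hostPrint is a hypothetical function playing the role of the C++ Print function. The embedded script has no print of its own; it can call it only because the host put it on the script's global object.

```javascript
const vm = require("node:vm");

const printed = [];

// Host-side function, playing the role of the C++ Print function.
function hostPrint(...args) {
  const line = args.join(" ");
  printed.push(line);
  console.log(line);
}

// The object passed to createContext becomes the global object of the
// embedded script, so the script sees a global function named print.
const context = vm.createContext({ print: hostPrint });

// The embedded script calls print as if it were part of the language.
vm.runInContext('print("Hello from the embedded script", 1 + 2);', context);
```

Running this prints "Hello from the embedded script 3". Anything the host decides to expose this way becomes a new "feature" for the embedded script.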

Finally, what Node.js is!

Node.js is like this program. In fact, Node.js is a C++ program that, with the help of V8, accepts JavaScript and adds some features (functions) on top of it (like working with files on the hard drive, connecting to a database, or …), because JS itself cannot deal with these low-level operations, but C++ can. We know that JS was designed to run just in the browser; in contrast, Node.js is designed to run on a server. For example, we can create a web server with Node.js.

You probably know what I mean by server, but if you don’t: the terms client and server come from web development and describe where an application runs. When we say server-side code, we mean there is a service, like an API, that does something and runs on a machine (computer) that is accessible to other machines through some kind of network, like the Internet, a LAN, or …
Another pair of terms for this is backend/front-end.

We should also know that there are some unofficial Node.js versions (like https://github.com/nodejs/node-chakracore) that work with other JavaScript engines. This might be useful when we want to run Node.js on IoT devices, for example, because, generally speaking, these devices have weaker CPUs and less memory than a server. So V8 can be replaced with another JavaScript engine that is not as heavy as V8 (in executable size), like QuickJS. The V8 executable is around 28 MB, while QuickJS is around 620 KB, which is much lighter. You might ask: why not just use QuickJS all the time? Because in most cases, the 28 MB size of V8 is not a big deal, and in exchange, we get amazing performance from V8.

Here is the benchmark of QuickJS vs. other JS engines:

Heart of Node.js

In order to build a web server, we need to add some features to JS to handle the kinds of things that a web server needs to do, like:

1. Organize code better (remember, when Node.js was created, JS didn’t support modules).

2. Communicate through a network (the Internet, a LAN, or …), like accepting a request and sending the response.

3. Work with files (open, read, write).

4. Work with databases.

5. Deal with tasks that might take time to handle.
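A few of the items in this list can be sketched from the JavaScript side in a handful of lines. The snippet below is only a small illustration using modules that ship with Node.js; the file name, the handleRequest function, and port 3000 are arbitrary choices made for the example.

```javascript
// 1. Modules: require() is something Node.js adds on top of plain JS.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");
const http = require("node:http");

// 3. Files: write a file to the temp directory, read it back, delete it.
const filePath = path.join(os.tmpdir(), `node-demo-${process.pid}.txt`);
fs.writeFileSync(filePath, "hello from Node.js");
const content = fs.readFileSync(filePath, "utf8");
fs.unlinkSync(filePath);

// 2. Network: a request handler that answers every request with the text.
function handleRequest(req, res) {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end(content);
}
const server = http.createServer(handleRequest);
// server.listen(3000); // uncomment to actually start accepting requests

console.log(content);
```

The listen call is left commented out so that the script ends immediately; with it enabled, opening http://localhost:3000 in a browser would show the text that was read from the file.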

Node.js solved all of these problems. In the following articles, I will show you how Node.js handles them under the hood. We will talk about fundamental concepts like libuv, the event loop, buffers, streams (we will see the different types of streams and the reason behind them), events, the event emitter, and many more topics. We will even see some C++ code for these parts.

But before that, let’s look at some of the code in the Node.js repo together (you don’t have to clone it; I already did that for you):

deps:
In this folder, we see the C++ dependencies of Node.js, meaning the modules and everything else that Node.js needs to run. These are some of the most important dependencies:

deps > v8:
I hope by now you know what V8 is :)

deps > uv:
This is libuv, which I will talk about later (it is one of the most important parts of Node.js).

deps > llhttp:
This module is responsible for parsing HTTP requests.
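You can see these bundled dependencies from JavaScript too, without opening the repo: Node.js exposes their versions on process.versions. A minimal sketch (the exact version numbers depend on the Node.js build you run it with):

```javascript
// Versions of some of the C++ dependencies bundled with this Node.js build.
const { v8, uv, llhttp } = process.versions;

console.log(`V8:     ${v8}`);
console.log(`libuv:  ${uv}`);
console.log(`llhttp: ${llhttp}`);
```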

src:
In this folder, you will see a lot of C++ files. If you open the node.h file, you will see V8 there.

lib:
In this folder, you see the JavaScript side of Node.js. There are a lot of JavaScript files here that make those C++ features easier to use, as well as some other utilities that we often need in JavaScript. Most of these files are wrappers around the C++ features that we saw in the src folder; however, you might also see some pure JavaScript code there.

Let’s take a look at the zlib.js file, for example:

As you can see on line 65, there is a function named internalBinding.

This function returns an internal C++ module so that the JS code in lib can call into it.
Previous Node.js versions used process.binding for this, but that method is now deprecated, and Node.js’s own code uses the internalBinding function instead. Note that internalBinding is only available to Node.js’s internal modules, not to application code.
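From application code, we only ever touch the public JavaScript wrapper, and the wrapper talks to the C++ side for us. As a small illustration, here is a round trip through the public zlib module: gzipSync and gunzipSync are the JavaScript functions we call, while the actual compression runs in native code underneath.

```javascript
const zlib = require("node:zlib");

const input = Buffer.from("hello hello hello hello hello");

// Compress with gzip, then decompress again.
const compressed = zlib.gzipSync(input);
const restored = zlib.gunzipSync(compressed).toString("utf8");

console.log(restored);
console.log(`${input.length} bytes in, ${compressed.length} bytes compressed`);
```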

That’s it for now. In the next post, I will talk about how Node.js handles modules under the hood, and we will see some of the code related to that.

Stay tuned, and love you.

Hey, I'm Hossein. I'm a JavaScript developer who wants to know what is going on behind the scenes of these tools :)