How does NodeJS work?

Eugene Obrezkov
Aug 23, 2015

Hi everyone! My name is Eugene Obrezkov and today I want to talk about one of the “scariest” platforms — NodeJS. I’m going to answer one of the most complicated questions about NodeJS — “How does NodeJS work?”.

I’m going to present this article as if NodeJS didn’t exist at all. This way, it should be easier for you to understand what’s going on under the hood.

Code found in this post is taken from existing NodeJS sources, so after reading this article, you should be more comfortable with NodeJS.

What do we need this for?

The first question that may come to your mind — “What do we need this for?”.

Here, I’d like to quote Vyacheslav Egorov: “The more people stop seeing JS VM as a mysterious black box that converts JavaScript source into some zeros-and-ones the better”. The same idea applies to NodeJS: “The more people stop seeing NodeJS as a mysterious black box that runs JavaScript with low-level API the better”.

Just Do It!

Let’s go back to 2009, when NodeJS was just getting started.

We’d like to run JavaScript on the backend and get access to low-level APIs. We also want to run our JavaScript from the CLI and a REPL. Basically, we want JavaScript to do everything!

How would we do this? The first thing that comes to my mind is…

Browser

A browser can execute JavaScript. So we can take a browser, integrate it into our application, and that’s it.

Not really! Here are the questions that need to be answered.

Does a browser expose low-level APIs to JavaScript? No!

Does it allow running JavaScript from somewhere else? Both yes and no, it’s complicated!

Do we need all the DOM stuff that a browser gives us? No! It’s overhead.

Do we need a browser at all? No!

We don’t need any of that. JavaScript can be executed without a browser.

If a browser is not a requirement for executing JavaScript, what executes JavaScript then?

Virtual Machine (VM)

A virtual machine executes JavaScript!

A VM provides a high-level abstraction: that of a high-level programming language (compared to the low-level ISA abstraction of the system).

A VM is designed to execute a single computer program by providing an abstracted and platform-independent program execution environment.

There are lots of virtual machines that can execute JavaScript including V8 from Google, Chakra from Microsoft, SpiderMonkey from Mozilla, JavaScriptCore from Apple and more. Choose wisely, because it may be a decision you may regret for the rest of your life :)

I suggest that we choose Google’s V8. Why? Because at the time it was one of the fastest JavaScript VMs available. I think you’ll agree that execution speed is important for the backend.

Let’s take a look at V8 and how it can help us build NodeJS.

V8 VM

V8 can be integrated into any C++ project. Just take the V8 sources and include them as a library. You are then able to use the V8 API, which allows you to compile and run JavaScript code.

V8 can expose C++ to JavaScript. This is very important, as we want to make low-level APIs available within JavaScript.

Those two points are enough to imagine a rough implementation of our idea: how we can run JavaScript with access to low-level APIs.

Let’s sum up everything above, because in the next chapter we will start with C++ code. You take a virtual machine, in our case V8 -> integrate it into our C++ project -> expose C++ to JavaScript with V8’s help.

But how can we write C++ code and make it available within JavaScript?

V8 Templates

Via V8 Templates!

A template is a blueprint for JavaScript functions and objects. You can use a template to wrap C++ functions and data structures within JavaScript objects.

For example, Google Chrome uses templates to wrap C++ DOM nodes as JavaScript objects and to install functions in the global scope.

You can create a set of templates and then use them. You can have as many templates as you need.

V8 has two types of templates: function templates and object templates.

A function template is the blueprint for a single function. You create a JavaScript instance of the template by calling the template’s GetFunction method from within the context in which you wish to instantiate the JavaScript function. You can also associate a C++ callback with a function template; it is called when the JavaScript function instance is invoked.

An object template is used to configure objects created with a function template as their constructor. You can associate two types of C++ callbacks with object templates: accessor callbacks and interceptor callbacks. An accessor callback is invoked when a specific object property is accessed by a script. An interceptor callback is invoked when any object property is accessed by a script. In a nutshell, you can wrap C++ objects/structures within JavaScript objects.

Take a look at this simple example. All it does is expose the C++ method LogCallback to the global JavaScript context.

V8 Function\Object Templates

At line #2 we create a new ObjectTemplate. Then at line #3 we create a new FunctionTemplate and associate the C++ method LogCallback with it. Then we set this FunctionTemplate instance on the ObjectTemplate instance. At line #9 we pass our ObjectTemplate instance to a new JavaScript context, so that when you run JavaScript in this context, you’ll be able to call the method log from the global scope. As a result, the C++ method associated with our FunctionTemplate instance, LogCallback, will be triggered.

As you see, it’s similar to defining objects in JavaScript, only in C++.
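To make the difference between accessor and interceptor callbacks more tangible, here is a rough analogy in plain JavaScript. It is not the V8 C++ API: a property getter stands in for an accessor callback, a Proxy stands in for an interceptor callback, and nativePoint is a made-up object that plays the role of a wrapped C++ structure.

```javascript
'use strict';

// Plain-JavaScript analogy only; none of this is the V8 template API.
// "nativePoint" is a hypothetical stand-in for a wrapped C++ structure.
const nativePoint = { x: 1, y: 2 };
const calls = [];

// Accessor-style: a callback bound to ONE named property ("x").
const withAccessor = {};
Object.defineProperty(withAccessor, 'x', {
  get() {
    calls.push('accessor:x');
    return nativePoint.x;
  },
  enumerable: true,
});

// Interceptor-style: a callback consulted for ANY property access.
const withInterceptor = new Proxy({}, {
  get(_target, name) {
    calls.push(`interceptor:${String(name)}`);
    return nativePoint[name];
  },
});

console.log(withAccessor.x);    // served by the accessor
console.log(withAccessor.y);    // no accessor registered for "y"
console.log(withInterceptor.x); // served by the interceptor
console.log(withInterceptor.y); // served by the interceptor as well
console.log(calls);
```

The accessor only ever fires for the one property it was registered on, while the interceptor sees every lookup. That is the same trade-off you make when choosing between the two callback types on an object template.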

By now, we have learned how to expose C++ methods/structures to JavaScript. We will now learn how to run JavaScript code in those modified contexts. It’s simple: just compile and run.

V8 Compile && Run JavaScript

If you want to run your JavaScript in the created context, you only need two simple V8 API calls: Compile and Run.

Let’s take a look at this example, where we create a new Context and run JavaScript inside it.

V8 Compile && Run JavaScript

At line #2 we create a JavaScript context (we can modify it with the templates described above). At line #5 we make this context active for compiling and running JavaScript code. At line #8 we create a new string from the JavaScript source. It can be hardcoded, read from a file, or obtained any other way. At line #11 we compile our JavaScript source. At line #14 we run it and get the result. That’s all.
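If you don’t have a C++ toolchain at hand, you can try the same idea from JavaScript itself. The sketch below uses the vm module that ships with today’s NodeJS, which is a thin layer over the same V8 concepts: a context with a customized global object, a compile step and a run step. It is an analogy to the C++ code, not a copy of it, and logCallback is just a JavaScript function standing in for the C++ LogCallback.

```javascript
'use strict';
const vm = require('vm');

// Host-side function, playing the role the article gives to LogCallback.
const logged = [];
function logCallback(...args) {
  logged.push(args.join(' '));
}

// 1. Create a context whose global object exposes `log`.
const context = vm.createContext({ log: logCallback });

// 2. The source could be hardcoded, read from a file, and so on.
const source = 'log("hello from the sandbox"); 6 * 7;';

// 3. Compile the source.
const script = new vm.Script(source, { filename: 'example.js' });

// 4. Run it inside the context; the result is the last expression's value.
const result = script.runInContext(context);

console.log(logged, result);
```

Note that the sandboxed code sees log, but nothing else from the host: no require, no process, no console. Whatever the global object exposes is all the script gets.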

Finally, we can create a simple NodeJS by combining all the techniques described above :)

C++ -> V8 Templates -> Run JavaScript -> ?

You can create a VM instance (also known as an Isolate in V8) -> create as many FunctionTemplate instances, with assigned C++ callbacks, as you want -> create an ObjectTemplate instance and assign all the created FunctionTemplate instances to it -> create a JavaScript context with our ObjectTemplate instance as its global object -> run JavaScript in this context and voila -> NodeJS. Sweet!

But what is the question mark after “Run JavaScript” in the chapter’s title? There is a little problem with the implementation above. We missed one very important thing.

Imagine that you wrote a lot of C++ methods (around 10k SLOC) which can work with fs, http, crypto, etc… We have assigned them [C++ callbacks] to FunctionTemplate instances and imported them [FunctionTemplate instances] into an ObjectTemplate. After getting a JavaScript instance of this ObjectTemplate, we have access to all of the FunctionTemplate instances from JavaScript via the global scope. Looks like everything works great, but…

What if we don’t need fs right now? What if we don’t need crypto features at all? What about not getting modules from the global scope, but requiring them on demand? What about not writing C++ code in one big file with all the C++ callbacks in there? So the question mark means…

Modularity!

All those C++ methods should be split into modules and located in different files (it simplifies development), so that each C++ module corresponds to one feature: fs, http, and so on. The same logic applies to the JavaScript context. JavaScript modules must not be accessible from the global scope, but loaded on demand.

Based on these best practices, we need to implement our own module loader. That module loader should handle loading both C++ modules and JavaScript modules, so that we can grab a C++ module on demand from C++ code and, likewise, grab a JavaScript module on demand from JavaScript code.

Let’s start with C++ Module Loader first.

C++ Module Loader

There will be a lot of C++ code here, so try not to lose your mind :)

Let’s start with the basics of all module loaders. Each module loader must have a variable that contains all modules (or information on how to get them). Let’s declare a C++ structure to store information about C++ modules and name it node_module.

node_module structure in NodeJS

We can store information about existing modules in this structure. As a result we have a simple dictionary of all available C++ modules.

I’m not going to explain all the fields from the structure above, but I want you to pay attention to a few of them. In nm_filename we can store the filename of our module, so we know where to load it from. In nm_register_func and nm_context_register_func we can store the functions that we need to call when the module is required. These functions are responsible for instantiating the Template instance. And nm_modname stores the module name (not the filename).

Next, we need to implement helper methods that work with this structure. We can write a simple method that can save information to our node_module structure and then use this method in our module definitions. Let’s call it node_module_register.

NodeJS Native Module register method

As you can see, all we are doing here is saving new information about a module into our node_module structure.

Now we can simplify registering process using a macro. Let’s declare a macro that you can use in your C++ module. This macro is just a wrapper for node_module_register method.

NodeJS Native Module register macros

The first macro is a wrapper for the node_module_register method. The other one is just a wrapper for the previous macro with some predefined arguments. As a result we have a macro that accepts two arguments: modname and regfunc. When it’s called, we save the new module information in our node_module structure. What do modname and regfunc mean? Well… modname is just our module name, like fs, for instance. regfunc is the module’s initialization function that we talked about earlier. It is responsible for initializing the V8 Template and assigning it to the ObjectTemplate.

As you can see, each C++ module can be declared with a macro that accepts a module name (modname) and an initialization function (regfunc) that will be called when the module is required. All we need to do is create C++ methods that can read that information from the node_module structure and call the regfunc method.

Let’s write a simple method that searches for a module in the node_module structure by its name. We’ll call it get_builtin_module.

Lookup for registered native module

This returns the previously registered module whose nm_modname in the node_module structure matches the given name.
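The register-then-look-up idea is easier to see without the C++ noise, so here is a small sketch of it in JavaScript. It assumes the registry is a singly linked list with the newest registration at the head. The names registerModule, getBuiltinModule and modlistBuiltin are made up for this illustration; they mirror node_module_register and get_builtin_module in spirit only.

```javascript
'use strict';

// Head of a singly linked list of registered modules.
// Assumption of this sketch: the newest registration goes first.
let modlistBuiltin = null;

// Counterpart of node_module_register: remember a module description.
function registerModule(modname, regfunc) {
  modlistBuiltin = { modname, regfunc, link: modlistBuiltin };
}

// Counterpart of get_builtin_module: walk the list and compare names.
function getBuiltinModule(name) {
  for (let mp = modlistBuiltin; mp !== null; mp = mp.link) {
    if (mp.modname === name) return mp;
  }
  return null;
}

// Each "module" registers itself with a name and an init function.
registerModule('fs', (exports) => { exports.kind = 'fs binding'; });
registerModule('crypto', (exports) => { exports.kind = 'crypto binding'; });

console.log(getBuiltinModule('fs').modname);
console.log(getBuiltinModule('http'));
```

Nothing is initialized at registration time. The registry only remembers how to initialize a module, which is what makes loading on demand possible.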

Based on the information from the node_module structure, we can write a simple method that loads the C++ module and assigns the V8 Template instance to our ObjectTemplate. As a result, this ObjectTemplate will be handed to the JavaScript context as a JavaScript instance.

Load bindings and send it to JavaScript context

A few notes regarding the code above. Binding takes a module name as an argument. This is the name you gave the module via the macro. We look up information about this module with the get_builtin_module method. If we find it, we call the initialization function of this module, passing some useful arguments like exports. exports is an ObjectTemplate instance, so we can use the V8 Template API on it. After all these operations, the Binding method returns the exports object as its result. As you remember, an ObjectTemplate instance can produce a JavaScript instance, and that’s what Binding returns.

The last thing we should do is make this method available from the JavaScript context. This is done in the last line by wrapping the Binding method in a FunctionTemplate and assigning it to the global variable process.

At this stage, you are able to call process.binding('fs'), for instance, and get the native bindings for it.

Here is an example of a built-in module with omitted logic for simplicity.

NodeJS V8 Native Module example

The code above creates a binding named “v8” that exports a JavaScript object, so that calling process.binding('v8') from the JavaScript context returns this object.
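To tie the pieces of this chapter together, here is a sketch of the look-up-and-initialize flow behind process.binding(), again in JavaScript rather than C++. Everything in it is hypothetical: builtinModules stands in for the node_module registry, processSketch stands in for the real process object, and caching the result per module name is an assumption of this sketch that the text above does not spell out.

```javascript
'use strict';

// Hypothetical registry standing in for the node_module list:
// module name -> initialization function (regfunc).
const builtinModules = new Map([
  ['fs', (exports) => { exports.exists = (path) => path === '/'; }],
  ['v8', (exports) => { exports.version = 'sketch'; }],
]);

// Assumption of this sketch: each binding is initialized at most once.
const bindingCache = new Map();
let initCalls = 0;

function binding(name) {
  if (bindingCache.has(name)) return bindingCache.get(name);

  // Look the module up by the name it was registered with.
  const regfunc = builtinModules.get(name);
  if (regfunc === undefined) throw new Error(`No such module: ${name}`);

  // Let the module fill in its exports object, then hand it back.
  const exports = {};
  initCalls += 1;
  regfunc(exports);
  bindingCache.set(name, exports);
  return exports;
}

// Stand-in for attaching Binding to the global `process` object.
const processSketch = { binding };

console.log(Object.keys(processSketch.binding('v8')));
```

The fs entry is never touched until somebody asks for it, which is exactly the on-demand behavior we were after.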

Hopefully you are still following along.

Now we should make a JavaScript Module Loader that will help us do all the neat stuff like require('fs').

JavaScript Module Loader

Great, thanks to our latest improvements, we can call process.binding() and get access to C++ bindings from JavaScript context. But this still does not resolve the issue with JavaScript modules. How can we write JavaScript modules and require them on demand?

First of all, we need to understand that there are two different types of modules. The first type is JavaScript modules that we write alongside the C++ callbacks. In a nutshell, these are NodeJS built-in modules, like fs, http, etc… Let’s call these modules NativeModule. The other type is modules in your working directory. Let’s call them just Module.

We need to be able to require both types. That means we need to know how to grab NativeModule from NodeJS and Module from your working directory.

Let’s start with NativeModule first.

All JavaScript native modules are located within our C++ project in a separate folder. That means that all of the JavaScript sources are accessible at compile time. This allows us to wrap the JavaScript sources into a C++ header file that we can use later.

There’s a Python tool called js2c.py for this (located under tools/js2c.py). It generates the node_natives.h header file with the wrapped JavaScript code. node_natives.h can be included in any C++ code to get the JavaScript sources within C++.

Now that we can use JavaScript sources in the C++ context, let’s try it out. We can implement a simple method, DefineJavaScript, that gets the JavaScript sources from node_natives.h and assigns them to an ObjectTemplate instance.

NodeJS DefineJavaScript for Native Module

In the code above, we iterate through each native JavaScript module and set it on the ObjectTemplate instance, with the module name as the key and the module source as the value. The last thing we need to do is call DefineJavaScript with the ObjectTemplate instance as the target.

The Binding method comes in handy here. If you look at our Binding C++ implementation (C++ Module Loader section), you’ll see that we hardcoded two bindings: constants and natives. Thus, if the binding’s name is natives, the DefineJavaScript method is called with the environment and exports objects. As a result, the JavaScript native modules are returned when calling process.binding('natives').

So, that’s cool. But another improvement can be made here by defining a GYP task in the node.gyp file and calling the js2c.py tool from it. This way, whenever NodeJS is compiled, the JavaScript sources are also wrapped into the node_natives.h header file.

By now, we have the JavaScript sources of our native modules available as process.binding('natives'). Let’s write a simple JavaScript wrapper for NativeModule now.

NodeJS JavaScript Native Module

Now, to load a module, you call the NativeModule.require() method with the name of the module that you want to load. It first checks whether the module already exists in the cache. If so, it is taken from the cache; otherwise the module is compiled, cached and returned as an exports object.

Let’s take a closer look at cache and compile methods now.

All cache does is store the NativeModule instance in the static _cache object located on NativeModule.

More interesting is the compile method. First, we get the sources of the required module from _source (we set this static property with process.binding('natives')). We then wrap them in a function with the wrap method. As you can see, the resulting function accepts exports, require, module, __filename and __dirname arguments. Afterwards, we call this function with the required arguments. As a result, our JavaScript module is wrapped in a scope that has exports as a reference to NativeModule.exports, require as a reference to NativeModule.require, module as a reference to the NativeModule instance itself and __filename as a string with the current file name. Now you know where all the stuff like module and require comes from in your JavaScript code. They are just references to the NativeModule instance :)
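If you want to play with the wrap, cache and compile steps yourself, here is a condensed sketch that runs on a stock NodeJS. It follows the description above, but it is not the code from the NodeJS sources: the sources object is a hypothetical stand-in for process.binding('natives'), and the two tiny modules in it (greeter and app) are invented for the example.

```javascript
'use strict';
const vm = require('vm');

// Hypothetical stand-in for process.binding('natives'):
// module name -> JavaScript source text.
const sources = {
  greeter: 'exports.greet = (name) => "Hello, " + name;',
  app: 'module.exports = require("greeter").greet("Node");',
};

class NativeModule {
  constructor(id) {
    this.filename = `${id}.js`;
    this.id = id;
    this.exports = {};
    this.loaded = false;
  }

  static require(id) {
    const cached = NativeModule._cache[id];
    if (cached) return cached.exports;
    if (!Object.prototype.hasOwnProperty.call(NativeModule._source, id)) {
      throw new Error(`No such native module ${id}`);
    }
    const nativeModule = new NativeModule(id);
    nativeModule.cache();
    nativeModule.compile();
    return nativeModule.exports;
  }

  // Turn the module source into the text of a function expression.
  static wrap(script) {
    return NativeModule.wrapper[0] + script + NativeModule.wrapper[1];
  }

  compile() {
    const source = NativeModule.wrap(NativeModule._source[this.id]);
    // Evaluating the wrapped source yields the wrapper function itself.
    const fn = vm.runInThisContext(source, { filename: this.filename });
    fn(this.exports, NativeModule.require, this, this.filename);
    this.loaded = true;
  }

  cache() {
    NativeModule._cache[this.id] = this;
  }
}

NativeModule._source = sources;
NativeModule._cache = {};
NativeModule.wrapper = [
  '(function (exports, require, module, __filename, __dirname) { ',
  '\n});',
];

console.log(NativeModule.require('app'));
```

Notice that the module is put into the cache before it is compiled. With that order, a module that is required again while it is still loading gets the partially filled exports object instead of triggering a second compile.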

The other piece is the Module loader implementation.

The Module loader implementation is basically the same as NativeModule’s. The difference is that the sources are not taken from the node_natives.h header file, but from files that we read with the fs native module. So we do all the same things (wrap, cache and compile), only with sources read from a file.
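Here is roughly what that looks like when the sources come from disk. This is a deliberately stripped-down sketch, not the real Module implementation: loadModule is a hypothetical helper, it only understands file paths (no node_modules lookup, no extension guessing), and the two files it loads are written to a temporary directory just for the demonstration.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');
const vm = require('vm');

// Cache keyed by absolute filename.
const moduleCache = Object.create(null);

// Hypothetical helper: load a module from a file path.
function loadModule(filename) {
  const resolved = path.resolve(filename);
  if (moduleCache[resolved]) return moduleCache[resolved].exports;

  const mod = { id: resolved, filename: resolved, exports: {} };
  moduleCache[resolved] = mod;

  // Same wrap-and-compile steps as before, but the source is read with fs.
  const source = fs.readFileSync(resolved, 'utf8');
  const wrapped =
    '(function (exports, require, module, __filename, __dirname) { ' +
    source + '\n});';
  const fn = vm.runInThisContext(wrapped, { filename: resolved });

  // Relative requests are resolved against the requiring file's directory.
  const dirname = path.dirname(resolved);
  const localRequire = (request) => loadModule(path.resolve(dirname, request));
  fn(mod.exports, localRequire, mod, resolved, dirname);
  return mod.exports;
}

// Usage: write two tiny modules to a temporary directory and load one.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'module-sketch-'));
fs.writeFileSync(path.join(dir, 'answer.js'), 'module.exports = 42;');
fs.writeFileSync(
  path.join(dir, 'main.js'),
  'exports.value = require("./answer.js") + 1; exports.dir = __dirname;'
);

const main = loadModule(path.join(dir, 'main.js'));
console.log(main.value);
```

Each module gets its own require function, bound to its own directory. That is why relative paths in require() are resolved against the file doing the requiring and not against the current working directory.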

Great, now we know how to require native modules or modules from your working directory.

Finally, we can write a simple JavaScript module that will run each time NodeJS is run and prepare the NodeJS environment using all of the stuff above.

NodeJS Runtime Library?

What is a runtime library? It’s a library that prepares the environment, setting global variables such as process, console, Buffer, etc., and runs the main script that you pass to the NodeJS CLI as an argument. This can be achieved with a simple JavaScript file that is executed at NodeJS startup, before all other JavaScript code.

We can start by proxying all our native modules to the global scope and setting up other global variables. It’s just a lot of JavaScript code that does something like global.Buffer = NativeModule.require('buffer') or global.process = process.

The second step is running the main script that you pass to the NodeJS CLI as an argument. The logic is simple here as well. It just reads process.argv[1] and creates a Module instance, passing that value to the constructor. Module is then able to read the sources from the file -> cache and compile them, as NativeModule does with the precompiled JavaScript sources.

There’s not much I can add here; it’s really very simple. If you want more details, though, you can take a look at the src/node.js file in the node repository. This file is executed at NodeJS startup and uses all the techniques described in this article.

This is how NodeJS is able to run your JavaScript code with access to low-level API. Cool, isn’t it?

But all of the above can’t do any asynchronous stuff yet. All the operations like fs.readFile() are fully synchronous at this point.

What do we need for asynchronous operations? An event loop…

Event Loop

An event loop is a message dispatcher that waits for and dispatches events or messages in a program. It works by making a request to some internal or external event provider (which generally blocks the request until an event has arrived), and then it calls the relevant event handler (dispatches the event). The event loop may be used in conjunction with a reactor if the event provider follows the file interface which can be selected or polled. The event loop almost always operates asynchronously with the message originator.
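Stripped of everything else, the dispatching part of that definition fits in a few lines. The sketch below is a toy, not libuv: TinyEventLoop is a made-up class, the event provider is just an in-memory queue, and instead of blocking while it waits for new events the loop simply returns once the queue is empty.

```javascript
'use strict';

// Toy event loop: an in-memory queue plus a table of handlers.
class TinyEventLoop {
  constructor() {
    this.queue = [];           // pending events: { name, payload }
    this.handlers = new Map(); // event name -> handler function
  }

  on(name, handler) {
    this.handlers.set(name, handler);
  }

  emit(name, payload) {
    // Emitting never calls the handler directly; it only enqueues.
    this.queue.push({ name, payload });
  }

  run() {
    // Dispatch events one at a time until nothing is left to do.
    while (this.queue.length > 0) {
      const { name, payload } = this.queue.shift();
      const handler = this.handlers.get(name);
      if (handler) handler(payload);
    }
  }
}

const loop = new TinyEventLoop();
const seen = [];

loop.on('data', (chunk) => {
  seen.push(chunk);
  // A handler may schedule more work; it runs on a later iteration.
  if (chunk === 'first') loop.emit('data', 'second');
});

loop.emit('data', 'first');
loop.run();
console.log(seen);
```

The important property is that the code producing an event and the code handling it never call each other directly. They only meet through the queue, which is what lets the originator carry on without waiting.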

The NodeJS Environment, the object that ties a V8 context to the rest of the runtime, can accept an event loop as an argument when it is created. But before handing over an event loop, we need to implement it first…

Fortunately, we already have that implementation, and it is called libuv. It’s responsible for all the asynchronous operations, like reading a file. Without libuv, NodeJS is just synchronous JavaScript/C++ execution.

So, basically, we can include the libuv sources into NodeJS and create the Environment with libuv’s default event loop in there. Here is an implementation of it.

NodeJS Create Environment

The CreateEnvironment method accepts a libuv event loop as its loop argument. It calls Environment::New (a NodeJS class, not part of the V8 API), passes the libuv event loop to it, and then configures the resulting Environment around the V8 context. That’s how NodeJS became asynchronous.

I’d like to talk about libuv more and tell you how it works, but that’s another story for another time :)

Thanks!

Thanks to everyone who has read this post to the end. I hope you enjoyed it and learned something new. If you found any issues, feel free to comment and I’ll reply as soon as possible.

Eugene Obrezkov aka ghaiklor, Technical Leader at Onix-Systems, Kirovohrad, Ukraine.
