From Source Code to Executable Code: How the JavaScript Engine Works

Arayik Yervandyan
Published in SFL Newsroom
Feb 4, 2020

It may be surprising, depending on your level of interaction with various languages, but JavaScript in fact falls under the category of compiled languages. It is not compiled well in advance, as many traditionally compiled languages are, nor are the results of compilation portable among various distributed systems.

The JavaScript engine performs many of the same steps as any traditional language compiler, albeit in more sophisticated ways than we may commonly be aware of. Roughly speaking, we can call these processes “compilation.”

Let’s dive deeper into the three steps the JavaScript engine performs before code execution starts: Tokenizing, Parsing and Code-Generation.

1. Tokenizing/Lexing

The compilation process starts with Tokenizing. The Tokenizer (or Lexical analyzer) reads the source code as a stream of characters and breaks it up into chunks that are meaningful to the language, called tokens.

For instance, consider the program:

Fig. 1. Variable assignment in JavaScript (var a = 2;)

In JavaScript, keywords, constants, identifiers, strings, numbers, operators and symbols can all be tokens. So this program would likely be broken up into the following tokens: var, a, =, 2, and ;.

Whitespace may or may not be persisted as a token, depending on whether it’s meaningful or not.
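To make this concrete, here is a minimal tokenizer sketch. It handles only the tiny subset of JavaScript needed for var a = 2; (keywords, identifiers, integer literals, = and ;). The function name tokenize and the token type names (Keyword, Identifier, Numeric, Punctuator) are our own labels for illustration, not those of any particular engine. Whitespace is skipped because here it only separates tokens.

```javascript
// Minimal tokenizer sketch for a tiny subset of JavaScript.
// Token type names are illustrative labels, not any engine's real ones.
const KEYWORDS = new Set(["var", "let", "const"]);

function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/\s/.test(ch)) {
      // Whitespace only separates tokens in this subset, so it is dropped.
      i++;
      continue;
    }
    if (/[A-Za-z_$]/.test(ch)) {
      // Identifier or keyword: consume letters, digits, _ and $.
      const start = i;
      while (i < source.length && /[A-Za-z0-9_$]/.test(source[i])) i++;
      const word = source.slice(start, i);
      tokens.push({ type: KEYWORDS.has(word) ? "Keyword" : "Identifier", value: word });
      continue;
    }
    if (/[0-9]/.test(ch)) {
      // Integer literal: consume consecutive digits.
      const start = i;
      while (i < source.length && /[0-9]/.test(source[i])) i++;
      tokens.push({ type: "Numeric", value: source.slice(start, i) });
      continue;
    }
    if (ch === "=" || ch === ";") {
      tokens.push({ type: "Punctuator", value: ch });
      i++;
      continue;
    }
    // Anything else cannot start a token in this subset.
    throw new SyntaxError(`Unexpected character '${ch}' at position ${i}`);
  }
  return tokens;
}

console.log(tokenize("var a = 2;").map((t) => t.value).join(" ")); // var a = 2 ;
```

Each token keeps both its type and its text, which is what the parsing step needs next.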

Tokenizing works closely with the Syntax analyzer, and the flow of this phase goes like this (see Fig. 2):

1. The Lexical analyzer reads character streams from the source code and generates the tokens.

2. The Syntax analyzer checks the validity of the tokens; if it finds an invalid token, it generates an error.

3. If the syntax is correct, control returns to the Lexical analyzer, and the cycle repeats until tokenizing is finished.

Fig 2. Lexical Analyzer and Syntax Analyzer flow in the compilation process
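The three-step loop above can be sketched as a parser that pulls tokens from the lexer one at a time. In this sketch the lexer is a generator, so it only produces the next token when the parser asks for it; the parser validates each token and stops with a SyntaxError at the first invalid one. The names lex and parseDeclaration, and the single statement shape it accepts (var <identifier> = <number>;), are assumptions made for illustration.

```javascript
// Sketch of the lexer/parser loop: the parser pulls one token at a time,
// validates it, and only then asks the lexer for the next one.
// `lex` and `parseDeclaration` are hypothetical names for illustration.
function* lex(source) {
  // Sticky regex: each alternative matches one kind of lexeme at the current position.
  const pattern = /\s+|([A-Za-z_$][\w$]*)|(\d+)|([=;])|(.)/y;
  let match;
  while ((match = pattern.exec(source)) !== null) {
    const [, word, number, punct, other] = match;
    if (word) yield { type: word === "var" ? "Keyword" : "Identifier", value: word };
    else if (number) yield { type: "Numeric", value: number };
    else if (punct) yield { type: "Punctuator", value: punct };
    else if (other) yield { type: "Invalid", value: other };
    // Whitespace matches none of the groups, so no token is produced for it.
  }
}

function parseDeclaration(source) {
  const tokens = lex(source);
  // Expected sequence `var <identifier> = <number> ;` as [type, value] pairs;
  // a null value means any value of that type is accepted.
  const expected = [
    ["Keyword", "var"],
    ["Identifier", null],
    ["Punctuator", "="],
    ["Numeric", null],
    ["Punctuator", ";"],
  ];
  const accepted = [];
  for (const [type, value] of expected) {
    const { value: token, done } = tokens.next(); // ask the lexer for one token
    if (done || token.type !== type || (value !== null && token.value !== value)) {
      const found = done ? "end of input" : `'${token.value}'`;
      throw new SyntaxError(`Expected ${value ?? type} but found ${found}`); // stop here
    }
    accepted.push(token.value); // valid: hand control back to the lexer
  }
  if (!tokens.next().done) throw new SyntaxError("Unexpected token after ';'");
  return accepted;
}

console.log(parseDeclaration("var a = 2;").join(" ")); // var a = 2 ;
try {
  parseDeclaration("var = 2;");
} catch (err) {
  console.log(`${err.name}: ${err.message}`); // SyntaxError: Expected Identifier but found '='
}
```

Because the lexer is lazy, nothing after the first invalid token is ever tokenized.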

For instance, let’s take a look at the following code (Fig. 3).

Fig. 3. The Syntax analyzer throws an error, and compilation does not continue

Here the compiler throws an error because of an invalid token which was passed to the syntax analyzer.
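The code in Fig. 3 is not reproduced here, but the same behavior can be observed in any JavaScript runtime. In the sketch below, runScript is a hypothetical helper built on direct eval. When the evaluated source contains a syntax error, none of it runs, not even the statement before the error, because the engine parses the whole script before executing any of it.

```javascript
// Observing that the engine parses a whole script before running any of it.
// `runScript` is a hypothetical helper; it uses direct eval so the evaluated
// code can see the `executed` flag below.
let executed = false;

function runScript(source) {
  try {
    eval(source);
    return "ran";
  } catch (err) {
    return err.name; // "SyntaxError" if parsing failed
  }
}

console.log(runScript("executed = true; var = 2;")); // SyntaxError
console.log(executed); // false: the first statement never ran
console.log(runScript("executed = true; var b = 2;")); // ran
console.log(executed); // true
```

This is a good hint that JavaScript really is compiled first: a purely line-by-line interpreter would have run executed = true before reaching the error.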

Note: The difference between tokenizing and lexing is subtle and academic, but it centers on whether these tokens are identified in a stateless or stateful way. Put simply, if the tokenizer were to invoke stateful parsing rules to figure out whether a piece of text should be considered a distinct token or just part of another token, that would be lexing.
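A classic case of this in JavaScript is the / character: it can be the division operator (a token on its own) or the start of a regular-expression literal such as /ab+c/ (part of a longer token), and the lexer can only tell which by remembering what came before. The sketch below encodes a simplified version of that rule; slashMeaning and the token shape are hypothetical, and real engines decide this from the full grammar context, which covers more cases than this heuristic.

```javascript
// Simplified sketch of a stateful lexing decision: is "/" division or the
// start of a regular-expression literal? It depends on the previous token.
// `slashMeaning` is hypothetical; this heuristic ignores cases such as `if (x) /re/`.
function slashMeaning(previousToken) {
  if (previousToken === undefined) return "RegExp"; // nothing before: a value is expected
  const { type, value } = previousToken;
  // After something that ends a value, "/" divides.
  const endsValue =
    type === "Identifier" ||
    type === "Numeric" ||
    type === "String" ||
    value === "this" ||
    value === ")" ||
    value === "]";
  // After an operator, "(", ",", "=", or a keyword like `return`, a value is
  // expected, so "/" opens a regular expression.
  return endsValue ? "Division" : "RegExp";
}

console.log(slashMeaning({ type: "Identifier", value: "a" })); // Division, as in a / 2
console.log(slashMeaning({ type: "Punctuator", value: "=" })); // RegExp, as in x = /ab+c/
console.log(slashMeaning({ type: "Keyword", value: "return" })); // RegExp, as in return /x/
```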

2. Parsing

Parsing is taking a stream (array) of tokens and turning it into a tree of nested elements, which collectively represent the grammatical structure of the program. This tree is called an “AST” (Abstract Syntax Tree) (Fig. 4).

Fig. 4. AST diagram for the var a = 2; statement

The tree (Fig. 4) for var a = 2; might start with a top-level node called VariableDeclaration, with a child node called Identifier (with the value “a”) and an AssignmentExpression, which itself has a nested child called NumericLiteral (with the value “2”).
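As a rough sketch, a recursive-descent parser for this one statement shape could look like the following. It assumes tokens in a simple type/value shape (our own convention, the kind a small tokenizer would produce). Real JavaScript parsers such as @babel/parser, one of the parsers available in AST Explorer, represent the = 2 part not as an AssignmentExpression but as the init of a VariableDeclarator node, so this sketch follows that real shape; parseVarStatement is a hypothetical name.

```javascript
// Recursive-descent parser sketch for `var <name> = <number>;`.
// Node names follow the shape @babel/parser produces for this statement.
function parseVarStatement(tokens) {
  let pos = 0;

  // Consume the next token, checking its type (and value, if given).
  function expect(type, value) {
    const token = tokens[pos];
    if (!token || token.type !== type || (value !== undefined && token.value !== value)) {
      throw new SyntaxError(`Unexpected token at index ${pos}`);
    }
    pos++;
    return token;
  }

  const kind = expect("Keyword", "var").value;
  const id = { type: "Identifier", name: expect("Identifier").value };
  expect("Punctuator", "=");
  const init = { type: "NumericLiteral", value: Number(expect("Numeric").value) };
  expect("Punctuator", ";");

  // The nested structure is the tree: the declaration owns a declarator,
  // which owns the identifier and its initial value.
  return {
    type: "VariableDeclaration",
    kind,
    declarations: [{ type: "VariableDeclarator", id, init }],
  };
}

const sampleTokens = [
  { type: "Keyword", value: "var" },
  { type: "Identifier", value: "a" },
  { type: "Punctuator", value: "=" },
  { type: "Numeric", value: "2" },
  { type: "Punctuator", value: ";" },
];
console.log(JSON.stringify(parseVarStatement(sampleTokens), null, 2));
```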

Now that we have an overview of how the AST is constructed, let’s look at it in more detail to better understand the parsing phase of our JavaScript compiler. To transform JavaScript code into an AST, we can use the AST Explorer tool.

The AST representation of our earlier code (var a = 2;) has the following structure (Fig. 5).

Fig. 5. AST representation for the var a = 2; statement
  • The tree is composed of a series of nodes, each with a type property. A simplified view of the tree can strip everything except these type properties.
  • Notice that each node contains location data that refers to the position of the associated expression in the source code.
  • The key node in the tree is VariableDeclaration, whose kind property records the kind of variable: in this case, var.
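The two ideas in these bullets, location data and a types-only view of the tree, can be sketched with a hand-written AST for var a = 2;. The start/end character offsets follow the convention used by parsers such as @babel/parser and Acorn, and the node names follow @babel/parser's output (real output also carries extra fields, such as line/column loc objects, omitted here). typesOnly and sliceOf are hypothetical helpers.

```javascript
// Hand-written AST for `var a = 2;` with start/end offsets into the source.
const source = "var a = 2;";
const locatedAst = {
  type: "VariableDeclaration",
  start: 0,
  end: 10,
  kind: "var",
  declarations: [
    {
      type: "VariableDeclarator",
      start: 4,
      end: 9,
      id: { type: "Identifier", start: 4, end: 5, name: "a" },
      init: { type: "NumericLiteral", start: 8, end: 9, value: 2 },
    },
  ],
};

// Keep only `type` and child nodes, dropping location data and values.
function typesOnly(node) {
  if (Array.isArray(node)) return node.map(typesOnly);
  const result = { type: node.type };
  for (const [key, value] of Object.entries(node)) {
    if (value !== null && typeof value === "object") result[key] = typesOnly(value);
  }
  return result;
}

// Location data maps each node back to the exact source text it came from.
const sliceOf = (node) => source.slice(node.start, node.end);

console.log(sliceOf(locatedAst.declarations[0])); // a = 2
console.log(JSON.stringify(typesOnly(locatedAst)));
```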

3. Code-Generation

Code generation can be considered the final step of the compilation process. It takes the AST and turns it into executable code. The executable code generated by the compiler is low-level code (in the case of JavaScript engines, bytecode for the engine's interpreter and, for frequently run code, machine code).

We have seen that source code written in JavaScript is transformed into low-level code. This part varies greatly depending on the language and the platform it targets, but the result should always meet the following minimum criteria:

  • It should carry the exact meaning of the source code.
  • It should be efficient in terms of CPU usage and memory management.

In short, code generation takes the AST described above for var a = 2; and turns it into a set of machine instructions that actually create a variable called “a” (including reserving memory, etc.) and then store a value into “a.”
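Real engines emit their own bytecode or machine code, which is far more involved, but the idea can be sketched with a tiny stack machine. The opcodes DECLARE, PUSH and STORE and the helpers generate and run are invented for illustration: generate walks the AST for var a = 2; and emits instructions that first create the variable and then store 2 into it, and run executes those instructions.

```javascript
// Code-generation sketch targeting a tiny hypothetical stack machine.
function generate(node, code = []) {
  switch (node.type) {
    case "VariableDeclaration":
      for (const decl of node.declarations) generate(decl, code);
      break;
    case "VariableDeclarator":
      code.push({ op: "DECLARE", name: node.id.name }); // reserve a slot for the variable
      if (node.init) {
        generate(node.init, code); // push the initial value
        code.push({ op: "STORE", name: node.id.name }); // pop it into the variable
      }
      break;
    case "NumericLiteral":
      code.push({ op: "PUSH", value: node.value });
      break;
    default:
      throw new Error(`No code generation rule for ${node.type}`);
  }
  return code;
}

// Executes the generated instructions against a variable environment.
function run(code) {
  const stack = [];
  const env = new Map();
  for (const ins of code) {
    switch (ins.op) {
      case "DECLARE": // like `var`, redeclaring keeps any existing value
        if (!env.has(ins.name)) env.set(ins.name, undefined);
        break;
      case "PUSH":
        stack.push(ins.value);
        break;
      case "STORE":
        env.set(ins.name, stack.pop());
        break;
      default:
        throw new Error(`Unknown opcode ${ins.op}`);
    }
  }
  return env;
}

const ast = {
  type: "VariableDeclaration",
  kind: "var",
  declarations: [
    {
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "a" },
      init: { type: "NumericLiteral", value: 2 },
    },
  ],
};

const code = generate(ast);
console.log(code.map((ins) => `${ins.op} ${ins.name ?? ins.value}`).join("\n"));
// DECLARE a
// PUSH 2
// STORE a
console.log(run(code).get("a")); // 2
```

After running the generated instructions, a holds 2, just as it would after running the original source, which is the first of the criteria listed above.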

For more details on the compilation process, see “Compiler Basics” by James Alan Farrell.

Conclusion

These are the three steps of JavaScript compilation. Because JavaScript is executed mostly on the client side, the engine does not have the luxury of plenty of time for compilation and optimization, whereas in other languages such as Java, compilation happens ahead of time in a build step.

For JavaScript, compilation occurs mere milliseconds before the code is executed. To ensure the fastest performance, JavaScript engines use all kinds of tricks, such as JIT compilation, lazy compilation and even hot recompilation. For simplicity, let’s say that any snippet of JavaScript has to be compiled before it is executed.
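Lazy compilation, one of the tricks mentioned above, means deferring the compilation of a piece of code until it is actually needed. V8, for example, only pre-parses the bodies of inner functions up front and fully compiles a function when it is first called. The following is a toy sketch of that idea, not how any engine implements it: lazyFunction (a hypothetical helper) keeps a function's source as text, compiles it with the Function constructor on the first call, and reuses the compiled result afterwards.

```javascript
// Toy lazy-compilation sketch: compile on first call, then reuse.
// `lazyFunction` is a hypothetical helper for illustration only.
function lazyFunction(params, body) {
  let compiled = null;
  let compileCount = 0;
  function call(...args) {
    if (compiled === null) {
      compiled = new Function(...params, body); // compile on first use
      compileCount++;
    }
    return compiled(...args);
  }
  call.compileCount = () => compileCount;
  return call;
}

const add = lazyFunction(["x", "y"], "return x + y;");
console.log(add.compileCount()); // 0: nothing compiled yet
console.log(add(2, 3)); // 5: compiled now, then executed
console.log(add(4, 5)); // 9: reuses the compiled function
console.log(add.compileCount()); // 1
```

The payoff is that code which is never called is never compiled, which matters when startup time is measured in milliseconds.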

Like most other language compilers, the JavaScript compiler is much more complex than just the three steps described above. For instance, during the process of parsing and code-generation, there is an additional step to optimize the performance of the execution, about which I will write more in my next article. So stay tuned.
