Javascript — How the Engine Compiles?

Osman Akar
9 min read · Feb 10, 2019


Photo by Oskar Yildiz on Unsplash

JavaScript is usually described as an interpreted language, but its engine actually compiles the code before running it. We will discuss why JavaScript can be considered a compiled language and explain each step at a basic level.

It is always good to have at least a basic understanding of what goes on behind the scenes when your code is compiled. This article gives a basic overview of what the JavaScript engine does while compiling your code.

What is Scope?

One thing nearly all programming languages have in common is that they store values in variables and later retrieve or modify those values. Without this ability, programs would be severely limited.

So where do those variables live? In other words, where are they stored, and how does our program find them when it needs them?

Scope is the set of rules for storing variables in some location and for finding those variables at a later time.

Compiler Theory

You may or may not know this, but although JavaScript is generally described as a dynamically typed, interpreted language, it is in fact a compiled language. It is not compiled well in advance, as many traditionally compiled languages are, but it does compile code before executing it. Although JavaScript compilation works differently from that of other compiled languages, it still follows steps that reflect the traditional compilation process. We will discuss these steps.

What is an Interpreted Language?

To understand this, you should first know what static and dynamic typing are and what static and dynamic binding are. These topics rely on the terms run time and compile time, so we need to define those as well.

Static indicates that resolution takes place at the time a program is constructed — compile time. Dynamic indicates that resolution takes place at the time a program is run — run time.

Run time vs Compile time:

Run time and compile time are programming terms that refer to different stages in the life of a program. After a developer writes source code, that code must be compiled into machine code to become an executable program. This stage is called compile time.

A compiled program can be opened and run by a user. When the application is running, it is called run time.

What are static and dynamic typing?

Static typing means that the executable form of a program generated at compile time will vary depending upon the types of data values found in the program. Dynamic typing means that the generated code will always be the same irrespective of the type — any differences in execution will be determined at run time.

With dynamic typing, a variable can change from the type of its first value to another type at any time, for example from a number to a string. With static typing, a variable cannot simply switch from one type to another; you have to convert the value explicitly.
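As a small illustration of dynamic typing in JavaScript itself, the sketch below reuses one variable for a number and then a string. The variable names are made up for this example.

```javascript
// A variable is not tied to the type of its first value.
let dynamicValue = 42;
const typeBefore = typeof dynamicValue; // "number"

dynamicValue = "forty-two";             // same variable, now holding a string
const typeAfter = typeof dynamicValue;  // "string"

console.log(typeBefore, typeAfter); // number string
```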

What are static and dynamic binding?

Binding refers to the association of names in program text to the storage locations to which they refer. In static binding, this association is predetermined at compile time. With dynamic binding, this association is not determined until run time.

Example:

If someone attempts to invoke a method like MyClass.foo(), a static binding system will verify at build time that there is a class called MyClass and that class has a method called foo. A dynamic binding system will wait until run time to see whether either exists.
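JavaScript falls on the dynamic side of this contrast. The sketch below reuses the hypothetical names MyClass and foo from the example: nothing checks whether foo exists until the call actually runs.

```javascript
// MyClass and foo are the hypothetical names from the example above.
const MyClass = {}; // no foo method yet

let firstAttempt;
try {
  MyClass.foo(); // foo is only looked up when this line runs
} catch (err) {
  firstAttempt = err.name; // foo is undefined, so calling it fails
}

// Adding the method at run time makes the same call succeed.
MyClass.foo = function () {
  return "foo was found at run time";
};
const secondAttempt = MyClass.foo();

console.log(firstAttempt);  // TypeError
console.log(secondAttempt); // foo was found at run time
```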

Contrasts:

The main strength of static strategies is that the program translator is much more aware of the programmer’s intent. This makes it easier to:

  • Catch many common errors early, during the build phase,
  • Build refactoring tools.

The main strength of dynamic strategies is that they are much easier to implement, meaning that:

  • A working dynamic environment can be created at a fraction of the cost of a static one,
  • It is easier to add language features that might be very challenging to check statically,
  • It is easier to handle situations that require self-modifying code, that is, software that achieves its goal by rewriting itself as it goes along (for example, in some AI applications).

Finally, I can answer the question: what is an interpreted language? With a compiled language, the code you write is reduced to a set of machine-specific instructions at compile time, before being saved as an executable file. With an interpreted language, most implementations execute instructions directly, without first compiling the program into machine-language instructions; the code is still reduced to machine instructions, but at run time. An interpreted language can be dynamically typed. There are also cases where a statically typed language is interpreted, such as with CINT, an interpreter for C and C++.

In the traditional compiled-language process, a program typically undergoes three steps before it is executed, roughly called ‘compilation’:

  • Tokenizing/Lexing,
  • Parsing,
  • Code-Generation.

Tokenizing/Lexing:

This means breaking up a string of characters into chunks that are meaningful to the language, called tokens. For example, consider this line of code: var foo = 5; This program would be broken up into the following tokens: var, foo, =, 5, and ;. White space might be ignored if it has no meaning. The difference between tokenizing and lexing is subtle: a lexer is basically a tokenizer, but it usually attaches extra context to the tokens (this token is a number, that token is a string literal, etc.).
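To make the idea concrete, here is a toy tokenizer for this one statement. It is only a sketch: the tokenize function and the token type names are assumptions made for illustration, and a real engine follows the full ECMAScript lexical grammar.

```javascript
// Hypothetical toy lexer: handles identifiers, integers, "=" and ";" only.
// Anything else (including white space) is silently skipped.
function tokenize(source) {
  const pattern = /([A-Za-z_$][\w$]*)|(\d+)|([=;])/g;
  const tokens = [];
  for (const match of source.matchAll(pattern)) {
    const [text, word, number] = match;
    if (word !== undefined) {
      // Attach context: is this word a keyword or an identifier?
      tokens.push({ type: word === "var" ? "Keyword" : "Identifier", value: text });
    } else if (number !== undefined) {
      tokens.push({ type: "NumericLiteral", value: text });
    } else {
      tokens.push({ type: "Punctuator", value: text });
    }
  }
  return tokens;
}

const fooTokens = tokenize("var foo = 5;");
console.log(fooTokens.map((token) => token.value).join(" ")); // var foo = 5 ;
```

Attaching a type to each token is what the text above calls the extra context added by a lexer.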

Parsing:

Parsing means taking a stream (array) of tokens and turning it into a tree of nested elements that collectively represent the grammatical structure of the program. This tree is called an ‘AST’ (Abstract Syntax Tree). For example, the tree for var foo = 5; might start with a top-level node called VariableDeclaration, with a child node called Identifier (whose value is foo) and another child called AssignmentExpression, which itself has a child called NumericLiteral (whose value is 5).
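The tree described above could be written down as a plain object, as in the sketch below. The node names come from the description; the children and value properties and the listNodeTypes helper are assumptions for illustration, and real parsers generally use a different, more detailed node layout.

```javascript
// Illustrative AST for: var foo = 5;
const fooAst = {
  type: "VariableDeclaration",
  children: [
    { type: "Identifier", value: "foo" },
    {
      type: "AssignmentExpression",
      children: [{ type: "NumericLiteral", value: 5 }],
    },
  ],
};

// Walk the tree depth-first and record each node type, indented by depth.
function listNodeTypes(node, depth = 0, lines = []) {
  lines.push("  ".repeat(depth) + node.type);
  for (const child of node.children || []) {
    listNodeTypes(child, depth + 1, lines);
  }
  return lines;
}

const astOutline = listNodeTypes(fooAst);
console.log(astOutline.join("\n"));
```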

Code-Generation:

This is the process of taking an AST and turning it into executable code. This part varies greatly depending on the language, the platform it is targeting, etc.

Let’s consider the AST that we described in the Parsing section above. Code-Generation will turn this into a set of machine instructions to actually create a variable called foo (including reserving memory, etc.), and then store a value into foo. In other words, the engine is able to create and store variables as needed.

Note: the JavaScript engine does not get the luxury (as other language compilers do) of having plenty of time to optimize, because JavaScript compilation does not happen in a build step ahead of time, as it does with other languages.

It is also worth mentioning that any snippet of JavaScript has to be compiled before (usually right before!) it is executed. So the JavaScript compiler will take the program var foo = 5; and compile it first, and then be ready to execute it, usually right away.
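One way to observe that a snippet is processed as a whole before any of it runs is to hand the engine code whose first statement is valid and whose second is a syntax error. The sketch below uses the standard Function constructor for this (an assumption about the environment: some browser security policies disable it); the variable names are made up.

```javascript
const executionLog = [];
let compileErrorName = null;

try {
  // The first statement is valid; the second one is a syntax error.
  const broken = new Function(
    "log",
    "log.push('first statement ran'); var = 5;"
  );
  broken(executionLog); // never reached: the error is raised before any call
} catch (err) {
  compileErrorName = err.name;
}

console.log(compileErrorName);    // SyntaxError
console.log(executionLog.length); // 0, the valid first statement never ran
```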

The Cast

  • Engine: responsible for start-to-finish compilation and execution of our Javascript program.
  • Compiler: a helper of the Engine that handles all the dirty work of parsing and code-generation.
  • Scope: another helper of the Engine that collects and maintains a look-up list of all declared identifiers (variables), and enforces a strict set of rules as to how these are accessible to currently executing code.

When you write var foo = 5; it looks like one statement, but the JavaScript engine does not see it that way. Engine sees two statements here: one that Compiler handles during compilation, and one that Engine handles during execution.

Let’s look at how Engine and its helpers act when they see var foo = 5;

The first thing Compiler does is lex the statement to break it down into tokens, which it then parses into a tree.

When Compiler gets to the code-generation part, it proceeds as follows:

  1. Encountering var foo, Compiler asks Scope whether a variable foo already exists in that particular scope collection. If so, Compiler ignores this declaration and moves on. Otherwise, Compiler asks Scope to declare a new variable called foo for that scope collection.
  2. Compiler then produces code for Engine to execute later, to handle the foo = 5 assignment. When Engine runs that code, it first asks Scope whether a variable called foo is accessible in the current scope collection. If so, Engine uses that variable. If not, Engine looks elsewhere (covered later, under Nested Scope).

If the Engine eventually finds a variable, it assigns the value 5 to it. If not, it will give an error.
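A visible consequence of this split, commonly called hoisting, is that the variable already exists before the assignment line runs. The sketch below wraps the statement in a function named demoDeclarationFirst, a name made up for this example.

```javascript
function demoDeclarationFirst() {
  // Compiler already declared foo for this scope, so this RHS look-up
  // succeeds even though the assignment has not executed yet.
  const before = foo;

  var foo = 5; // the declaration was handled earlier; only foo = 5 runs here

  const after = foo;
  return { before, after };
}

const hoistResult = demoDeclarationFirst();
console.log(hoistResult.before); // undefined
console.log(hoistResult.after);  // 5
```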

Conversation between Compiler and Helpers

When Engine executes the code that Compiler produced, it has to look up each variable the code refers to. Engine performs one of two kinds of look-up, depending on which side of an assignment the variable appears on:

  1. LHS: left-hand side.
  2. RHS: right-hand side.

Differences between LHS and RHS:

When we say, console.log( foo ); the reference to foo is an RHS reference, because nothing is being assigned to foo here. Instead, we are looking up to retrieve the value of foo, so that the value can be passed to console.log.

When we say, foo = 5; the reference to foo here is an LHS reference, because we do not actually care what the current value is, we simply want to find the variable as a target for the = 5 assignment operation.

An LHS or RHS reference does not require a literal = assignment operator in the code. For example, consider this code:

function foo ( a ) {
    console.log( a ); // 5
}
foo( 5 );

Here, foo( 5 ) requires an RHS reference to foo, meaning ‘go look up the value of foo and give it to me’.

Also note that there is no a = 5; in the code, but behind the scenes, passing 5 as an argument to foo implicitly assigns 5 to the parameter a. From the look-up perspective, this is an LHS reference.
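As an exercise, the sketch below annotates every look-up in a slightly larger snippet. The function and variable names (add, b, total) are made up for this example.

```javascript
function add(a) {     // implicit LHS: a receives 5 when add( 5 ) is called
  var b = a;          // LHS: b (assignment target), RHS: a (value needed)
  return a + b;       // RHS: a and b
}

var total = add(5);   // LHS: total, RHS: add (look up the function to call)

console.log(total);   // RHS: console and total; prints 10
```

That makes three LHS look-ups (a, b, total) and five RHS look-ups in only a few lines.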

Let’s imagine the conversation between Engine and Scope for the snippet above:

  1. Engine: Hey Scope, I have an RHS reference for foo. Ever heard of it?
  2. Scope: Yes, I have. The compiler declared it just a second ago. It is a function. Here you go.
  3. Engine: Great, thanks! I’m executing foo.
  4. Engine: Hey Scope, I have got an LHS reference for a, ever heard of it?
  5. Scope: Yes. I have. Compiler declared it as a formal parameter to foo just recently. Here you go.
  6. Engine: Thanks. Now time to assign 5 to a.
  7. Engine: Hey Scope, I need an RHS look-up for console. Ever heard of it?
  8. Scope: Yes, I have got console. It is built-in. Here you go.
  9. Engine: Looking up log(…). OK, great, it is a function.
  10. Engine: Hey Scope. Can you help me with the RHS reference to a? I think I remember it, but just want to double-check.
  11. Scope: You are right, Engine. Same guy, has not changed. Here you go.
  12. Engine: Thanks. Passing the value of a, which is 5, into log(…).

Nested Scope

There is usually more than one scope to consider. Scopes are nested inside other scopes, so if a variable cannot be found in the immediate scope, Engine consults the next outer scope, continuing until the variable is found or the global scope has been reached.
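A short sketch of this outward search, using three nested scopes. All names here are made up for illustration.

```javascript
var greeting = "hello"; // outermost scope in this sketch

function outerScope() {
  var subject = "world";

  function innerScope() {
    var punctuation = "!";
    // punctuation is found immediately in innerScope,
    // subject is found one level out, in outerScope,
    // greeting is found only in the outermost scope.
    return greeting + " " + subject + punctuation;
  }

  return innerScope();
}

const nestedMessage = outerScope();
console.log(nestedMessage); // hello world!
```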

The conversation between Engine and Scope works the same way when scopes are nested. If Scope does not know the variable, Engine asks the next outer scope, and keeps going until it reaches the global scope. If the variable is not found there either, the result depends on the kind of look-up: a failed RHS look-up always results in a ReferenceError, while a failed LHS look-up (an assignment) creates the variable in the global scope. If strict mode is enabled, the variable is not created and a ReferenceError is thrown instead.

What is Strict Mode?

Strict mode was added in ES5 and behaves differently from normal/relaxed/lazy mode in a number of ways. One such behavior is that it disallows automatic/implicit global variable creation. If this mode is enabled and Engine cannot find a variable, for either an LHS or an RHS look-up, Engine throws a ReferenceError. If a variable is found but you try to do something impossible with its value, for example calling it as a function when it holds a number, Engine throws a TypeError.
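The difference for a failed LHS look-up can be observed side by side. The sketch below builds both functions with the Function constructor, because such function bodies are non-strict unless they opt in with a "use strict" directive, so the comparison works no matter how the surrounding file is loaded. The variable names are made up.

```javascript
// Non-strict body: assigning to an undeclared name creates a global.
const sloppyAssign = new Function(
  "implicitGlobalDemo = 5; return typeof implicitGlobalDemo;"
);

// Strict body: the same kind of assignment is refused.
const strictAssign = new Function(
  "'use strict'; anotherUndeclaredDemo = 5;"
);

const sloppyResult = sloppyAssign();
delete globalThis.implicitGlobalDemo; // clean up the accidental global

let strictErrorName = null;
try {
  strictAssign();
} catch (err) {
  strictErrorName = err.name;
}

console.log(sloppyResult);    // number
console.log(strictErrorName); // ReferenceError
```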

ReferenceError is Scope resolution-failure related, whereas TypeError implies that Scope resolution was successful, but that there was an illegal/impossible action attempted against the result.
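The two error types can be triggered deliberately, as in the sketch below. The errorNameOf helper and the variable names are assumptions made for this example.

```javascript
// Run an action and report the name of the error it throws, if any.
function errorNameOf(action) {
  try {
    action();
    return null;
  } catch (err) {
    return err.name;
  }
}

// Scope resolution fails: no scope has ever heard of this name.
const lookupFailure = errorNameOf(() => notDeclaredAnywhere);

// Scope resolution succeeds, but a number cannot be called.
const illegalAction = errorNameOf(() => {
  var count = 5;
  count();
});

console.log(lookupFailure); // ReferenceError
console.log(illegalAction); // TypeError
```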
