Asim Zaidi
May 18 · 6 min read

Compiler Theory

Despite the common misconception that JavaScript is a “dynamic” or “interpreted” language, it is in fact a compiled language. One of the reasons for this misconception could be because JavaScript is not compiled well in advance, as you would see with many traditionally-compiled languages. But at the end of the day, the JavaScript engine performs many of the same steps as these other languages, perhaps even in more sophisticated ways. (Side note: The engine is responsible for start-to-finish compilation and execution of our JavaScript program.)

In traditional compile-language processes, a chunk of source code, your program, will undergo typically three steps *before* it is executed, roughly called “compilation.”

~ Eric Elliot

These two phases, lexical analysis and syntax analysis, put together are known as the analysis phase, or the front-end of the compiler. This is an intermediate code representation that the compiler uses to represent a source text.

The steps involved (in this order) are as follows:

Step 1: Lexical Analysis Phase

The is a joint effort between the scanner and the tokenizer.

Part A: Scanning The Text

The very first thing the complier does is is scan the text. This job is performed by the scanner. The text breaks up into its smallest possible parts, known as lexemes.

Part B: Tokenizing/Lexing

The second thing is creating a stream of tokens to be parsed for code generation. So basically, this breaks up a string of characters into meaningful (to the language) chunks, called tokens. This line of code, var a= 2; going through the the tokenization phase would break down and looks likevar, a , =, 2, ;.

Step 2: Syntax Analysis Phase

So I’m going to explain 2 things here, but they may differ depending on the language. Once the lexical analysis is complete, we begin the syntax analysis phase. Depending on the language at hand, some languages may develop a Parse Tree, Concrete Syntax Tree (CST) and then use that to derive their Abstract Syntax Tree (AST). However, some languages like JavaScript will go straight to developing an AST without any CST. Let’s walk through the CST first and then the AST to help you understand.

Here you can think of a CST in plain English terms.

In the case of the English language, the smallest “part” of every sentence is a word; words can be combined into phrases, like noun phrases or verb phrases, which can, in turn, be joined with other phrases to create a sentence expression.

~ Vaidehi

So parsing an English sentence into a CST seems easy enough. Let’s look at how it would apply to something like math.

Cool, right? Based on the rules of math, we create a CST accordingly! But wait, depending on how we traverse that tree, it could potentially be interpreted the wrong way. Let’s look at a few different ways one might traverse it.

The key thing to note here is that there is ambiguity, and ambiguity is problematic for a compiler. So in return, the compiler lookers for clarity, directions to follow. In this example, one direction we might give our compiler is to simply traverse left to right. Later, we might add some parenthesis, exponents, multiplication, division, addition, and subtraction (PEMDAS) directions, and so forth rules.

CST rules will rely on the language at hand, but they will always break tokens down into a tree of its smallest form of meaningful nodes.

So what the heck is an AST and why does JavaScript jump straight to creating this instead?

Before we dive into it, you need to know this: The parser will always generate an AST as its output, whether or not it creates a parse tree in between.

Got it? Cool, so here’s a quick example of a node in an AST:

Our simple expression of 5 + (1 x 12) from before could be constructed into a visualized illustration of an AST, like this:

Notice how we lost the parentheses there? We can see that the AST doesn’t care to insert all the tokens into nodes, but rather it reads through each token and determines its importance. This is dependent on the rules of the language we discussed. So the end result of an AST’s may seem much easier to read. But don’t get it twisted. Even though an abstract syntax tree may be more compact and care about a lot less than than a CST would in terms of its outputted nodes, it is still far more complex to derive. The AST has to determine the importance of a token, and this gets complicated for the compiler 😬.

Classic JavaScript, fandangling things together 🤣

Building a CST is actually pretty easy once the parser knows the grammar of the language it’s trying to parse. It doesn’t need to do any complicated work to figure out if a token is “significant” or not. Instead, it just takes exactly what it sees in the specific order that it sees it, and spits it all out into a tree.

Step 3: Code-Generation

The final step is code generation. This is the process of taking an AST and turning it into executable code. This part varies greatly depending on the language, the platform it’s targeting, etc. It’s also not as important for us to know as developers, so we won’t be diving into detail on this portion. Just know that there is a mechanism in place which takes the AST for say, `var a = 2;` and turns it into a set of machine instructions to actually create a variable called a (including reserving memory, etc.), and then stores a value into a. How the engine manages system resources is deeper than we need to dig for now, so we’ll just take it for granted that the engine is able to create and store variables as needed 🙂.

Final Note

The JavaScript engine is vastly more complex than just those three steps, as are most other language compilers. For instance, in the process of parsing and code-generation, there are certainly steps to optimize the performance of the execution, including collapsing redundant elements, etc. But knowing these 3 steps are the most important for now. Once you dive deeper into the data flow and scopes of a language like JavaScript, you will understand why it’s important to know what the compiler is doing for you as a developer.

Thank you to to everyone who contributed to writing the 6 books on You Don’t Know JavaScript and to Vaidehi Joshi, whose images I borrowed for this.

Better Programming

Advice for programmers.

Asim Zaidi

Written by

Founder @Bondfire and @TechMade // Manager Sad Boy KJ. Gymnast, Spoken Word Rapper, Hackathon Hacker, University of Illinois Dropout. Currently, in the Bay.

Better Programming

Advice for programmers.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade