Making Your Own JavaScript Linter (part 2)

A comprehensive tutorial

Joana Borges Late
CodeX

--

A linter running

This is the second part of a comprehensive tutorial on constructing a JavaScript linter. You can read the first part here.

And here is the source code of dirtyrat in GitHub.

In the first part of the tutorial, we learned about the tokenizer, the scanner, and the displaying of errors. One file for each of these three entities is enough, and the code inside each file is short and easy to follow. Piece of cake!

Now we have to examine the relations among the token objects that represent the source code we want to lint. The part of the linter responsible for this task is called the parser. The parser must do a lot more than just check each token against its immediate neighbor. We just got to the point where things get hot!

There is so much code and such diverse functionality that we have to split the parser into a bunch of modules, among them parser.js itself. We will look at all of them.

Statements and expressions

Theoretically, the source code has two kinds of components: statements and expressions. An expression is something that is data (value) or can be evaluated to data. A statement is an element of the structure of the code and generally is bound to a keyword (function, return, if, etc.).

This statement-versus-expression distinction is somewhat arbitrary and confusing, and may change from one programming language to another. I believe in the following concept:

The computer is a machine made to follow instructions. Thus, any source code is a set of statements (explicit or implicit) that may contain expressions, which may contain statements, which may contain expressions, and so on, recursively.
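A single line of JavaScript can show this recursion in action:

```javascript
// A declaration statement whose initializer expression contains a
// function (a statement) whose body contains a return statement
// that contains yet another expression.
const double = function (n) { return n * 2; };

console.log(double(21)); // prints 42
```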

This notion of statements and expressions (including operators) is fundamental for writing the parser.

The module Parser

Basically, the module parser handles first-level statements: “use strict”, imports, and declarations of global variables and functions. For global variable initialization, it relies on the module expression. In the case of functions, it passes almost the whole job to the module function.

This parser doesn’t accept common code outside functions. It also calls the module register, which is specialized in controlling names.
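As a rough sketch, the first-level routing could look like this. The token shape and handler names here are assumptions, not dirtyrat’s actual code, and the stub handlers eat a single token just to keep the example short:

```javascript
// Stub handlers (assumed names): each receives the token list and the
// current index, and returns the index of the next unconsumed token.
const eatImport = (tokens, i) => i + 1;
const eatGlobals = (tokens, i) => i + 1;
const eatFunction = (tokens, i) => i + 1;

// Routes each first-level statement; anything else is an error,
// because this parser doesn't accept common code outside functions.
function parseTopLevel(tokens) {
    let i = 0;
    while (i < tokens.length) {
        const value = tokens[i].value;
        if (value === '"use strict"') { i += 1; }
        else if (value === "import") { i = eatImport(tokens, i); }
        else if (value === "const" || value === "let" || value === "var") {
            i = eatGlobals(tokens, i);
        }
        else if (value === "function") { i = eatFunction(tokens, i); }
        else { throw new Error("unexpected code outside functions: " + value); }
    }
    return true;
}
```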

Parsing the function head

As we saw above, parsing the few kinds of first-level statements is easy. Parsing a function is somewhat complicated because we need to be able to handle functions inside expressions inside loops inside if clauses inside functions inside whatever…

In JavaScript, there are three kinds of functions that begin with the keyword function: global (first level) functions, inner functions, and anonymous functions. As they differ only at the start (inner or not, anonymous or not), the parser (module function) can use the same code for all of them after parsing the start.
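In code, the three forms look like this:

```javascript
function globalFn() { return 1; }            // global (first-level) function

function outer() {
    function innerFn() { return 2; }         // inner function
    const anon = function () { return 3; };  // anonymous function
    return innerFn() + anon();
}

console.log(globalFn() + outer()); // prints 6
```

After the start is parsed, all three continue the same way: a parameter list, then a body between curly braces.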

Parsing the function body — the router

We must understand the function body as just a list of statements. Even when there is no keyword (an assignment or a function call). Even when all that is written on a line is a closing curly brace: imagine that symbol saying “The current block/scope ends here.”

Therefore, our task is writing a loop that will route each first token of each line to the proper function to handle it.
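For example, nothing stops a programmer from packing several statements into a single line:

```javascript
// The last source line holds three statements:
// an if, an assignment, and a function call.
let count = 0;
const log = [];
if (true) { count += 1; log.push(count); }
```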

Condensed source code, with several statements packed into a single line, tells us that the principle of only one statement per line (which was fine for parsing first-level statements) is not going to work here.

The router:
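A simplified sketch of the idea, with handler names assumed from the rest of the article:

```javascript
// Maps the first token of a statement to the name of its handler.
// "}" is routed to the code that closes the current block/scope.
function route(token) {
    switch (token.value) {
        case "if":     return "eatIf";
        case "else":   return "eatElse";
        case "return": return "eatReturn";
        case "}":      return "closeBlock";
        default:       return "eatExpression"; // assignment or function call
    }
}

console.log(route({ value: "return" })); // prints "eatReturn"
```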

We will talk more about the router.

Parsing the function body — the handlers (workers)

The three handler functions eatIf, eatElse, and eatReturn are enough to help explain some concepts. I call them handlers or workers because they effectively do the work: they process the tokens, they eat them; they handle the job delivered by the router (the manager).

These three functions start by eating the first token available, which contains the respective keyword.

eatElse checks if the context allows the else statement to be at that position (more on this subject soon).

eatIf and eatReturn don’t need to check the context, because any possible context (inside the body of the function) is valid for them, considering that some errors are caught somewhere else. For example, in an invalid construction like if return, the out-of-place token return is caught as an error by the function eatIf, and the linter exits before eatReturn is ever called. Thus, eatReturn never has to worry about the token that comes BEFORE the token return.

As each statement has its own rules, the best place to put the check on the next token is inside the corresponding “eat statement” function.

After parsing (eating and analyzing) its tokens, eatReturn checks the next token without eating it (when there is no error), because the next token must remain available for the router, which is the function in charge of parsing end-of-block and end-of-line tokens.
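A sketch of that behavior, using an explicit index instead of whatever internal state dirtyrat keeps (all names here are assumptions):

```javascript
function isEndOfStatement(token) {
    return token.value === ";" || token.value === "}";
}

// Eats the keyword "return" and the expression tokens that follow it,
// then stops WITHOUT eating the next token: the end-of-line or
// end-of-block token must stay available for the router.
function eatReturn(tokens, i) {
    i += 1; // eat the keyword "return" itself
    while (i < tokens.length && !isEndOfStatement(tokens[i])) {
        i += 1; // eat one token of the returned expression
    }
    return i; // index of the token left for the router
}
```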

After parsing its tokens, eatIf and eatElse notify the parser that a new block and a new branch were created (more on this subject soon).

As we can see, the pattern is independent parsing for each statement. There is only one exception to this rule: the statement catch. It is the only statement that is not supposed to be caught by the router.

Now, what makes catch different from break or else, to the point that it must have special parsing? Easy answer: you can have if without else. You can have for without break or continue. But if you have try, you must have catch. finally is optional, so it is treated like any other statement.

Each time dirtyrat closes a try block, it calls forceEatCatch. This is the reason why catch is not supposed to be caught by the router: it is caught by forceEatCatch.
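A sketch of the rule (token shape and names assumed):

```javascript
// Called every time a "try" block closes: the next token MUST be the
// keyword "catch"; otherwise the linter reports an error.
function forceEatCatch(tokens, i) {
    if (i >= tokens.length || tokens[i].value !== "catch") {
        throw new Error("expected 'catch' after the end of the 'try' block");
    }
    return i + 1; // eat the keyword; the catch clause is parsed as usual
}
```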

To be continued

You can read the third part of the tutorial here.
