Published in

CodeX

Making Your Own JavaScript Linter (part 3)

A comprehensive tutorial

A linter running

This is the third part of a comprehensive tutorial on constructing a JavaScript linter. You can read the second part here.

And here is the source code of dirtyrat on GitHub.

Parsing the function body — blocks

Statements like if, else, for and while create a block of code, also known as a scope. When the parser finds, for example, a break statement, it needs a way to know whether that statement is inside a loop. Another example: when the parser finds a closing curly brace, it must know which block is being closed.

Code blocks are tracked with two global variables and two small functions.
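The original listing is not reproduced in this text. What follows is only a minimal sketch of how such bookkeeping could look. Only lastClosedBlock is named in the article; openBlocks, openBlock, closeBlock and isInsideLoop are names assumed for illustration, and this is not dirtyrat's actual code.

```javascript
// Hypothetical sketch: two globals and two small functions track the blocks.
let openBlocks = [];        // kinds of the blocks still open, innermost last
let lastClosedBlock = "";   // kind of the most recently closed block

// Called when a "{" opens the body of an if, else, for, while, function...
function openBlock(kind) {
    openBlocks.push(kind);
}

// Called on "}": the innermost open block is the one being closed.
function closeBlock() {
    lastClosedBlock = openBlocks.pop() || "";
    return lastClosedBlock;
}

// Assumed helper: a break is accepted when any open block is a loop.
function isInsideLoop() {
    return openBlocks.some(kind => kind == "for" || kind == "while");
}
```

With this shape, the question "which block does this closing brace close?" is answered by the value returned from closeBlock, and the loop check is a scan over the open blocks.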

As shown in the code above, it is very easy to know when a break statement is valid: just ask whether any open block is a for or a while.

Now, the check for the statement else must be different:
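The listing for this check is not reproduced in this text. A minimal sketch, assuming the kind of the most recently closed block is available as lastClosedBlock (the function name and the message are hypothetical):

```javascript
// Hypothetical sketch, not dirtyrat's actual code.
// Unlike break, else does not look at the OPEN blocks:
// it looks at the block that has just been CLOSED.
function validateElse(lastClosedBlock) {
    if (lastClosedBlock == "if") { return ""; }   // empty string: no error
    return "unexpected else: the last closed block was not an if";
}
```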

We validate the else statement by checking whether lastClosedBlock == “if”. An else never comes inside the if block; it comes AFTER the if block.

Why do we need a JavaScript linter?

Browsers, Deno (JavaScript) and Node.js don’t run code that has a syntax error. But they don’t care about a call to a function that doesn’t exist until the moment execution actually reaches that call. Maybe the runtimes think “That name is not defined, but let’s give it time to be defined; after all, we are running a dynamic language!”.
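A small illustration of this behavior (the function names are invented for the example): the code below is accepted and starts running without complaint, and the missing function is reported only when the call is actually executed.

```javascript
function save() {
    // doesNotExist is declared nowhere, yet the engine accepts this code
    return doesNotExist();
}

let error = null;
try {
    save();   // the ReferenceError is thrown only here, at run time
} catch (e) {
    error = e;
}
console.log(error instanceof ReferenceError);   // true
```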

Borrowing the terms from compiled programming languages, we could say that in JavaScript a syntax error is a compile time error. And all others are run time errors.

My personal convention: the expression “runtime” (without space) means the engine, like Chrome’s V8. And the expression “run time” (with space) means the moment the code is running.

Compile time errors are honest. You immediately know about all of them. And they are easy to fix. No one ships software with compile time errors. Hmm… right now I am thinking of a possible exception to this rule.

Run time errors are insidious. They are also known as bugs. Sometimes they happen 30 minutes after the code started running. Sometimes they happen on a very rare combination of mouse movement, button pressing and something else. They are the programmer’s hell.

JavaScript is far too permissive. The language permits, for example, declaring any number of functions with the same name in the same scope. The last function is the only one that will run, and all the other homonymous functions exist only to mislead you. Or, much worse, maybe there is some case in which one of them runs for some time (I don’t know if it is possible).
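A tiny example of the problem (names invented for illustration; the declarations are wrapped in a function only to keep the example self-contained):

```javascript
function demo() {
    function greet() { return "first"; }
    function greet() { return "second"; }   // silently replaces the first one
    return greet();
}

console.log(demo());   // "second"
```

No error and no warning is produced; the first greet is simply dead code.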

I am not against overriding a function as long as it is really necessary and done in a very clear way.

Does anyone write an assignment inside the condition of an if statement?
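Here is what that mistake looks like in practice (an invented example): a single = where == was intended is perfectly legal JavaScript, and the condition becomes always truthy.

```javascript
function isAdmin(role) {
    if (role = "admin") {   // assignment, not comparison
        return true;
    }
    return false;
}

console.log(isAdmin("guest"));   // true
```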

The two main reasons to use a JavaScript linter are 1) turning run time errors into compile time errors and 2) catching as errors things that JavaScript considers OK.

Controlling names

About names, a good linter must

  1. ensure there are no duplicated declarations,
  2. ensure there is no use of undefined names, and
  3. warn about unused names (an unused local name is often a typo and a bug).

Note: dirtyrat knows that it may be linting code that is still under construction. Therefore, instead of reporting unused local variables as errors, it produces strong warnings.

While parsing token by token, each time a name is declared or consumed, the linter registers the respective token. Later, the linter tries to match consumed names (assignments, function calls) with declared names.

The matching must be done later because it is normal for a function to be called before being declared. Also, for imports and exports it is necessary to register all names of all files before matching starts.
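The two-phase idea (register first, match later) can be sketched as follows. This is a simplified illustration and not dirtyrat's code: the arrays, the function names and the token shape ({ name, row }) are assumptions, and scopes are ignored entirely.

```javascript
// Hypothetical sketch: phase 1 registers tokens, phase 2 matches them.
const declared = [];   // tokens of declared names
const consumed = [];   // tokens of used names

function registerDeclaration(token) { declared.push(token); }
function registerUse(token) { consumed.push(token); }

// Runs only after every token has been registered.
function matchNames() {
    const problems = [];
    const seen = new Map();   // name -> first declaration token
    for (const token of declared) {
        if (seen.has(token.name)) {
            problems.push("error: duplicated declaration of " + token.name + " at row " + token.row);
        } else {
            seen.set(token.name, token);
        }
    }
    const used = new Set();
    for (const token of consumed) {
        if (seen.has(token.name)) {
            used.add(token.name);
        } else {
            problems.push("error: undefined name " + token.name + " at row " + token.row);
        }
    }
    for (const [name, token] of seen) {
        if (!used.has(name)) {
            problems.push("warning: unused name " + name + " at row " + token.row);
        }
    }
    return problems;
}

// A call at row 1 to a function declared at row 5 is not a problem,
// because matching happens after both tokens were registered.
registerUse({ name: "helper", row: 1 });
registerDeclaration({ name: "helper", row: 5 });
registerDeclaration({ name: "unusedVar", row: 7 });
registerUse({ name: "missing", row: 9 });
const problems = matchNames();
console.log(problems.join("\n"));
```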

The golden rule for names is: different things must have different names.

We cannot rely only on a simple name as an identifier because different elements may have the same name as long as they are in different scopes.
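The original sample is not reproduced in this text. A reconstruction consistent with the identifiers discussed next might look like this; only the names x, f1 and f2 come from the article, the bodies are invented.

```javascript
let x = 1;

function f1() {
    let x = 2;
    return x;
}

function f2() {
    if (x == 1) {       // this x is the outer one
        let x = 3;
        return x;
    } else {
        let x = 4;
        return x;
    }
}
```

There are four different variables named x here, and JavaScript accepts all of them because each lives in its own scope.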

For the code above, we must include the scope in the identifiers: “x”, “f1”, “f1.x”, “f2”, “f2.if.x” and “f2.else.x”.

Looks good but it is not a solution for all cases:
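The original sample is not reproduced in this text. A reconstruction of the problematic case, assuming a function with two sibling if blocks that each declare x (the conditions and bodies are invented):

```javascript
function f() {
    if (true) {
        let x = 1;
        console.log(x);
    }
    if (true) {
        let x = 2;   // a different variable, in a different block
        console.log(x);
    }
}
```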

In the case above, the identifiers would be “f”, “f.if.x” and “f.if.x”, which breaks the golden rule!

We need another criterion to create unique identifiers. Relying on code blocks does not work.

Parsing the function body — branches

The solution is using the concept of branches. Branches are very similar to blocks but are more detailed. They only exist to help create unique identifiers.

Let’s see it working:
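The original listing is not reproduced in this text. The sketch below shows one way branch numbering could work; the numbering scheme is inferred from the full names quoted in this article (such as “f.1.a” and “f.1.2.a”), and every function and variable name is an assumption, not dirtyrat's actual code.

```javascript
// Hypothetical sketch of branch numbering.
let branchPath = [];      // e.g. ["f", 1, 2]
let childCounters = [];   // sub-branches created so far by each open branch

function enterFunction(name) {   // the function body is branch 1
    branchPath = [name, 1];
    childCounters = [0];
}

function enterBranch() {         // called on every "{" inside the body
    const n = ++childCounters[childCounters.length - 1];
    branchPath.push(n);
    childCounters.push(0);
}

function leaveBranch() {         // called on the matching "}"
    branchPath.pop();
    childCounters.pop();
}

function fullName(name) {
    return branchPath.join(".") + "." + name;
}

// Two sibling if blocks inside function f, each declaring x:
enterFunction("f");
enterBranch();
console.log(fullName("x"));   // "f.1.1.x"
leaveBranch();
enterBranch();
console.log(fullName("x"));   // "f.1.2.x"
leaveBranch();
```

Because sibling blocks receive different numbers, the two declarations of x no longer collide, which the block kind alone (“if”) could not guarantee.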

Excellent! Now the full name of each name records its scope unambiguously.

Different things have different names. OK. The problem is that the same thing (parameter “a”) got different names (“f.1.a”, “f.1.1.a” and “f.1.2.a”).

Don’t worry. We will handle this later.

Rat — the source code file object

This is the object that stores the data of a source code file:
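The original listing is not reproduced in this text. As a rough sketch only: the one property confirmed by the article is that the source is kept as right-trimmed lines; the function name createRat and the fields path and tokens are assumptions made for illustration.

```javascript
// Hypothetical sketch; the real object in dirtyrat has its own fields.
function createRat(path, text) {
    // trimEnd also removes a trailing "\r", so "\r\n" and "\n" end up equal
    const lines = text.split("\n").map(line => line.trimEnd());
    return {
        path: path,     // assumed field: where the source came from
        lines: lines,   // source code as right-trimmed lines
        tokens: []      // assumed field: to be filled by the tokenizer
    };
}

const rat = createRat("demo.js", "let a = 1;  \r\nlet b = 2;\n");
console.log(JSON.stringify(rat.lines));   // ["let a = 1;","let b = 2;",""]
```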

The source code is stored as right-trimmed lines. This solves the problem of different line endings (“\n” versus “\r\n”), which often appears when we import text from different environments, say, Microsoft Windows or web pages (copy and paste).

The real token object

The token object that we saw before was a minimalistic one. This is the real one:
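The original listing is not reproduced in this text, so the exact fields are not known here. Purely as an illustration, a token could carry general fields plus a few that matter only for names; every field name below is an assumption and should not be read as dirtyrat's actual token.

```javascript
// Hypothetical token shape; all field names are illustrative assumptions.
function createToken(kind, value, row, col) {
    return {
        kind: kind,     // e.g. "name", "number", "string"
        value: value,   // the text of the token
        row: row,       // position in the source file
        col: col,
        // the fields below would matter only when kind is "name"
        fullName: "",   // scope-qualified identifier, e.g. "f.1.1.x"
        used: false     // set when a matching use is found
    };
}

const token = createToken("name", "x", 3, 9);
token.fullName = "f.1.1.x";
```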

Some fields are only useful for tokens whose kind is name.

To be continued

Here is the link to the fourth and last part of the tutorial.
