Static Analysis in JavaScript: A Technical Introduction

Published in Codecademy Engineering, Mar 4, 2019 (11 min read)

…or as we affectionately call it: Nitpicking at Enterprise Scale!

**Chaos** by George Frederic Watts [source]. Your code without static analysis.

Remember when you could first get small programs to work while you were still learning to code? That first time you wrote a dozen lines of code and understood how they worked together?

If you’re still learning to code and haven’t gotten there yet, hang in there! You’re in for a treat and it’s so worth it!

Those wonderful moments of complete understanding become exponentially rarer as the amount of code you write increases. You might be able to fully understand how 10 files work together, but what about 100 files? 1,000? As our programs increase in size, we lose the ability to understand their intricacies. There’s no way a single person can fully understand every line of code in a large project, especially when they didn’t write most of it.

Ensuring code quality and program stability in larger projects requires use of programmatic tools to validate correctness. Those can come in several forms, such as automated testing, production crash reporting, or what this blog post covers: static analysis.

The following sections will cover, in order:

What is “static analysis” anyway?
Why use static analysis?
How does static analysis work?
Writing a simple ESLint rule
Next steps

What is “static analysis” anyway?

Great question! The shortest description we could come up with is:

Predicting defects in code without running it.

In a little more detail, static analysis is the art of trying to spot problems in your source code before it’s run. These can be small things, such as stylistic preferences or finding unused variables, or complex analyses such as detecting overly complex functions.

Many of us use static analysis without even knowing it! If you’ve ever written code in a statically typed language such as C++, C#, or Java (any language where you declare the types of objects), the language has some static analysis built-in. The type annotations written in code of those languages are used by the compilers to let you know if you’ve done something clearly wrong.

JavaScript code is dynamically typed (you don’t declare the types of objects), so instead of the compiler running significant static analysis, its most common forms of static analysis are formatters and linters.

Formatters

Formatters quickly scan and re-format your code files. They ensure your source code is consistent in how it’s formatted, such as which type of quotes are used, whether to use tabs or a specific number of spaces, and the maximum preferred line width.

The most popular JavaScript formatter is Prettier. Like any good formatter, it will auto-fix any inconsistencies it finds (so useful!), can be disabled as needed per-line or per-file, and has a few options for how to format your code.
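To make "a few options" a little more concrete, here is a rough sketch of a hypothetical prettier.config.js. The option names are Prettier's own; the values are arbitrary examples, not recommendations.

```javascript
// prettier.config.js (example values only; pick whatever your team agrees on)
const config = {
  singleQuote: false, // use double quotes for strings
  useTabs: false,     // indent with spaces rather than tabs...
  tabWidth: 2,        // ...two of them per indentation level
  printWidth: 80,     // preferred maximum line width
  semi: true,         // print semicolons at the ends of statements
};

module.exports = config;
```

For the per-line escape hatch, a `// prettier-ignore` comment placed directly above a statement asks Prettier to leave that statement as written.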

Prettier’s website has a pretty cute design. It even animates!

Linters

While formatters only work on formatting issues, linters can work on those as well as more complex issues. Linters scan source code with a set of “rules”, or descriptions of behaviors to watch out for, and let you know of any violations they find.

Linters allow you to customize and extend the list of rules they scan with. Some rules are pretty generic to most projects, while other rules can be framework-specific. React, for example, has a great set of ESLint rules you can add to your project.

Linter rules can always be disabled per line, file, or project; however, only some can be auto-fixed. They scan for complex behaviors that can’t always be solved with a simple solution. These complex behaviors could be problems such as uses of deprecated or unsafe APIs that can’t be easily swapped out.
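As an illustration, here is roughly what disabling rules looks like with ESLint's comment directives. The rule names (no-unused-vars, no-console) are real ESLint rules; the code around them is made up for the example.

```javascript
/* eslint-disable no-unused-vars */
// The block comment above turns the rule off from that point to the end of the file.
const neverRead = 42;

// eslint-disable-next-line no-console
console.log("only the next line is exempt from no-console");

console.log("this line exempts itself"); // eslint-disable-line no-console
```

Disabling a rule for a whole project happens in the ESLint configuration file instead, by setting that rule to "off".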

JavaScript’s most popular linter by far is ESLint. It’s highly configurable, integrates well with other tools, and many of its rules come with auto-fixers.

ESLint’s website logo isn’t quite as animated, but it still looks snazzy!

Why use static analysis?

Why Formatters?

When you live on your own or have your own room, it’s great: you can arrange things exactly how you like them. Adding roommates, though, means you have to deal with other people’s opinions on how to arrange things. You can either passive-aggressively wrestle over how to arrange the chairs… or settle on some standard for how to arrange the space.

The same holds true for formatting code. Who wants to take time constantly re-formatting everything? Consistently formatted code is easier to scan through and understand. Poorly formatted code takes longer to read, which is particularly irksome given how much time we as coders spend reading other people’s code while working as a team (a lot!). Both when on your own or when working in a group, it’s inconvenient not to have predefined standards for how code should look.

In either situation, a formatter benefits you by arranging your stuff for you. It might not get everything exactly how you like it, but it saves you the time of doing it yourself.

For a few more supporting points, see Prettier’s Why Prettier? page.

Why Linters?

One of the first lessons we learn in coding is: just because something compiles, doesn’t mean it works correctly. There are plenty of ways to write technically valid code that compiles but results in subtle errors upon execution.

Linters can’t catch all your bugs, but they can certainly find the small ones. They can let you know when you’re assigning a variable instead of checking its value, or forgetting to use something after creating it, or using the wrong type of loop, or a ton of other useful little checks.
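Here is a small, deliberately buggy sketch of the kind of code this paragraph describes. The function is invented for illustration; ESLint's no-cond-assign and no-unused-vars rules are the ones that would typically flag it.

```javascript
// Deliberately buggy: every line is valid JavaScript, yet the function is wrong.
function canVoteBuggy(age) {
  const minimumAge = 18; // created, then never used

  // `=` assigns 18 to `age` (a truthy value) instead of comparing against it,
  // so this branch is always taken.
  if (age = 18) {
    return true;
  }
  return false;
}

console.log(canVoteBuggy(5)); // true, even for a five-year-old
```

Nothing here stops the program from running, which is exactly why a tool that reads the code without executing it is handy.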

ESLint’s community rulesets in particular are powerful. These are packages that add additional community-contributed linter checks for select use cases. For example, at Codecademy we use eslint-plugin-react to help enforce good React usage and eslint-plugin-jsx-a11y to help keep our pages accessible.
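For a sense of how such plugins get wired in, here is a minimal sketch of a hypothetical .eslintrc.js. It is not Codecademy's actual configuration, and it assumes both plugins are installed alongside ESLint. Newer ESLint versions use a different "flat config" file format, so treat this as an illustration of the idea rather than a copy-paste setup.

```javascript
// .eslintrc.js (a hypothetical project config in ESLint's "eslintrc" format)
module.exports = {
  extends: [
    "eslint:recommended",          // ESLint's own recommended rules
    "plugin:react/recommended",    // recommended rules from eslint-plugin-react
    "plugin:jsx-a11y/recommended", // recommended rules from eslint-plugin-jsx-a11y
  ],
  plugins: ["react", "jsx-a11y"],
  rules: {
    // Individual rules can be turned off or tuned project-wide here.
    "no-console": "off",
  },
};
```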

You can read more about ESLint’s philosophy on ESLint’s About page.

How does static analysis work?

At the core of most popular static analysis libraries is the concept of an Abstract Syntax Tree, or AST for short. An AST is a representation of source code as a tree structure: each source file is a root node, and root-level constructs declared in the file are child nodes of that node. Those child nodes can each have child nodes within them.

For example, in this file…

console.log("Hello, world!");
function doesNothing() { }
class AlsoDoesNothing {
    constructor() {
        this.count = 0;
    }
}

…there are three root-level nodes. The first two each take up one line and the last takes up five lines.
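As a rough, hand-written sketch (not real parser output), the top of that tree looks something like the object below in the ESTree format that ESLint's default parser produces. Each root-level node's own children are left out to keep it short.

```javascript
// Simplified outline of the file's AST: one root node, three children.
const program = {
  type: "Program",
  body: [
    { type: "ExpressionStatement" }, // console.log("Hello, world!");
    { type: "FunctionDeclaration" }, // function doesNothing() { }
    { type: "ClassDeclaration" },    // class AlsoDoesNothing { ... }
  ],
};

console.log(program.body.length); // 3
console.log(program.body.map((node) => node.type).join(", "));
// ExpressionStatement, FunctionDeclaration, ClassDeclaration
```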

Let’s explore the first node’s structure using the wonderful astexplorer.net. It’s an open source website that creates interactive visualizations of ASTs for a whole slew of languages, including JavaScript. Pop that open in another browser tab, enter console.log("Hello, world!") in the code input, and you’ll be able to follow along with this post using the AST output on the right.

The output on the right can be a little intimidating at first, so here’s a friendlier visualization of what it all means:

Tree representation of nodes corresponding to the AST generated for the first line of code.

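The root console.log("Hello, world!") is a node of type CallExpression. It consists of a “callee” (console.log) and an “arguments” array (of size one, containing "Hello, world!"). In the full AST, this call sits inside an ExpressionStatement node, which in turn sits inside the file’s root Program node.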
The console.log is a node of type MemberExpression; it consists of an “object” (console) and a “property” (log).
The console and log nodes are both of type Identifier.
The "Hello, world!" is a node of type Literal.
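Written out as plain data, the nodes described above look roughly like this. It's a hand-written approximation of ESTree-style output: real parsers also attach position information (such as start, end, and loc) to every node, which is left out here.

```javascript
// Approximate AST node for: console.log("Hello, world!")
const helloWorldCall = {
  type: "CallExpression",
  callee: {
    type: "MemberExpression",
    object: { type: "Identifier", name: "console" },
    property: { type: "Identifier", name: "log" },
    computed: false, // console.log rather than console["log"]
  },
  arguments: [
    { type: "Literal", value: "Hello, world!", raw: '"Hello, world!"' },
  ],
};

console.log(helloWorldCall.callee.property.name); // log
console.log(helloWorldCall.arguments.length);     // 1
```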

Ok, now we have this structured data representing the source file… but how do we use it? How do static analysis tools perform useful tasks with this information?

Formatters and ASTs

The earlier description of formatters was that they quickly scan and re-format your code files. A more precise description would be that they parse your file into an AST, ignore most information about character positions of nodes, and print the AST back into the file.

Formatters ignore most of your original character positioning in order to standardize how the code ends up looking. The printing of the AST back into the file will work roughly the same regardless of how exactly the original code was formatted, so you end up with a standardized result.

For example, given the code we just looked at, Prettier would know to put the console.log( on one line, immediately place the "Hello, world!" argument after it, and the ) immediately after that. It would add a ; only if configured to add semicolons.
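To illustrate the "print the AST back" idea, here is a toy printer for just the four node types we've seen. It is a sketch of the general technique, not how Prettier is actually implemented (Prettier also deals with line widths, comments, and much more); the function names and the semi option handling are made up for this example.

```javascript
// Toy printer: turns a tiny subset of AST nodes back into source text.
// It never consults the original whitespace, so any input layout prints the same way.
function print(node) {
  switch (node.type) {
    case "Identifier":
      return node.name;
    case "Literal":
      return JSON.stringify(node.value); // strings always come out double-quoted
    case "MemberExpression":
      return `${print(node.object)}.${print(node.property)}`;
    case "CallExpression":
      return `${print(node.callee)}(${node.arguments.map(print).join(", ")})`;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// Adds the trailing semicolon only if configured to.
function printStatement(expression, options = { semi: true }) {
  return print(expression) + (options.semi ? ";" : "");
}

// Hand-written AST for: console.log("Hello, world!")
const ast = {
  type: "CallExpression",
  callee: {
    type: "MemberExpression",
    object: { type: "Identifier", name: "console" },
    property: { type: "Identifier", name: "log" },
  },
  arguments: [{ type: "Literal", value: "Hello, world!" }],
};

console.log(printStatement(ast));                  // console.log("Hello, world!");
console.log(printStatement(ast, { semi: false })); // console.log("Hello, world!")
```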

Linters and ASTs

To recap, linters run a set of “rules”, each of which looks for a particular bad behavior in code, and logs a complaint whenever it finds one. Complaints include position, message, and (optionally) a code change to fix the issue.

Each rule will receive the AST of a file in some way, and run its analysis on that file. For example, a lint rule might want to make sure no strings contain offensive words. That rule would look at every node of type Literal in a file, and if its value is a string containing some offensive word, report a complaint. We’ll go over an example of a lint rule later on.

ESLint’s Runtime

I’m not going to describe quite everything ESLint does, but this is the gist of it.

In order to run, ESLint roughly:

Loads user settings, such as settings for rules and rule plugins
Determines the list of files to visit
Visits each file to scan for complaints
Reports any complaints from visiting the files

In order to visit (scan) a file, ESLint:

Parses the file into an AST
Visits each node in the AST, giving any rule that cares about the node’s type an opportunity to check the node
Applies any fixes suggested by rules
Describes unfixed complaints back to the main runner
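The visiting steps above can be sketched as a toy runner. Everything here (lintAst, the two rules, the hand-written AST) is invented for illustration. Real ESLint parses actual source text, traverses with known child keys rather than Object.values, and also handles fixes, which this sketch skips.

```javascript
// Toy "visit a file" step: walk an AST and let each rule check the node types it cares about.
function lintAst(ast, rules) {
  const complaints = [];

  // Ask each rule for its visitor object (node type -> handler function).
  const visitors = Object.entries(rules).map(([ruleId, rule]) =>
    rule.create({
      report: (descriptor) => complaints.push({ ruleId, ...descriptor }),
    })
  );

  (function visit(node) {
    if (node === null || typeof node !== "object") return;
    if (typeof node.type === "string") {
      for (const visitor of visitors) {
        if (typeof visitor[node.type] === "function") visitor[node.type](node);
      }
    }
    // Recurse into every property; arrays of child nodes work here too.
    Object.values(node).forEach((child) => visit(child));
  })(ast);

  return complaints;
}

// Two made-up rules in the same general shape ESLint rules use.
const rules = {
  "no-console-log": {
    create(context) {
      return {
        MemberExpression(node) {
          if (node.object.name === "console" && node.property.name === "log") {
            context.report({ node, message: "Unexpected console.log." });
          }
        },
      };
    },
  },
  "no-empty-string": {
    create(context) {
      return {
        Literal(node) {
          if (node.value === "") {
            context.report({ node, message: "Unexpected empty string." });
          }
        },
      };
    },
  },
};

// Hand-written AST for: console.log("")
const ast = {
  type: "CallExpression",
  callee: {
    type: "MemberExpression",
    object: { type: "Identifier", name: "console" },
    property: { type: "Identifier", name: "log" },
  },
  arguments: [{ type: "Literal", value: "" }],
};

const results = lintAst(ast, rules);
results.forEach((c) => console.log(`${c.ruleId}: ${c.message}`));
// no-console-log: Unexpected console.log.
// no-empty-string: Unexpected empty string.
```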

For more on ESLint’s architecture, see ESLint’s architecture guide.

ESLint rules

Each ESLint rule contains roughly two parts:

A meta object containing documentation on the rule’s name, behavior, settings, and other metadata.
A create method that returns an object with methods ESLint will call when visiting nodes.

The object returned by create maps node types to methods to call when visiting those nodes. For example, a rule checking for offensive words would return an object mapping "Literal" to a method that checks a Literal node for offensive words.
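Here is a minimal sketch of what that offensive-words rule could look like. The rule, its message, and its (very mild) word list are hypothetical; the overall shape (meta, create, and context.report with a messageId and data) follows ESLint's rule format. The last few lines call the rule by hand with a fake context, purely to show the flow.

```javascript
// Hypothetical word list, kept G-rated.
const FORBIDDEN_WORDS = ["darn", "heck"];

const noOffensiveWords = {
  meta: {
    type: "suggestion",
    docs: { description: "disallow offensive words in string literals" },
    schema: [], // this rule takes no options
    messages: {
      offensive: "Unexpected offensive word '{{word}}'.",
    },
  },
  create(context) {
    return {
      // Called once for every Literal node in the file.
      Literal(node) {
        if (typeof node.value !== "string") return;
        const word = FORBIDDEN_WORDS.find((w) =>
          node.value.toLowerCase().includes(w)
        );
        if (word) {
          context.report({ node, messageId: "offensive", data: { word } });
        }
      },
    };
  },
};

// In a real plugin this object would be exported from its own file.
// Here we drive it by hand with a stand-in for ESLint's context object.
const reports = [];
const visitor = noOffensiveWords.create({ report: (d) => reports.push(d) });
visitor.Literal({ type: "Literal", value: "Oh heck!" });
visitor.Literal({ type: "Literal", value: 42 });

console.log(reports.length);       // 1
console.log(reports[0].data.word); // heck
```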

Fun fact: the ESLint rule runtime is an example of the Visitor Pattern. Its idea is to allow another process to manage visiting each portion of some set structure, such as nodes in a tree or items in a list; all you provide is the strategy for dealing with some or all of those items. In this case, ESLint manages visiting nodes in an AST; all you need to provide is functions that are called on particular node types.

Array.forEach is an example of the visitor pattern too! To use it, you provide a callback, which native code then applies to each member of the list for you.

For more on ESLint rules, see ESLint’s rules guide.

Writing a simple ESLint rule

Ok! We’ve talked about why you would want static analysis, and somewhat described how it works, but we haven’t looked at a real-world example of something useful you could do with it.

Checking string literals for offensive content was a common example earlier, but because this is a G-rated post, we’ll stick to a friendlier example. Let’s look at the use-isnan ESLint rule.

To recap a bit of JavaScript weirdness:

In JavaScript, NaN is a special value of the Number type[…] Because NaN is unique in JavaScript by not being equal to anything, including itself, the results of comparisons to NaN are confusing[…]
Therefore, use Number.isNaN() or global isNaN() functions to test whether a value is NaN.
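A few lines in a console show the weirdness in action (the parsed variable is just an example value):

```javascript
const parsed = Number("not a number"); // NaN

console.log(NaN === NaN);           // false
console.log(parsed === NaN);        // false, even though parsed is NaN
console.log(parsed !== parsed);     // true: NaN is the only value not equal to itself

console.log(Number.isNaN(parsed));  // true
console.log(Number.isNaN("hello")); // false: a string is not the NaN value
console.log(isNaN("hello"));        // true: global isNaN converts to a number first
```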

You can peek at the rule’s documentation page or its source code if you’d like. This section will go over how the source code works.

Part One: Meta

Here’s the meta object exported by the rule (slightly shrunk for brevity):

meta: {
  type: "problem",
  docs: {
    description: "require calls to isNaN() when checking for NaN",
    category: "Possible Errors",
    recommended: true,
    url: "https://eslint.org/docs/rules/use-isnan"
  },
  schema: [],
  messages: {
    useIsNaN: "Use the isNaN function to compare with NaN."
  }
}

In order, the lines here explain:

The rule is meant to prevent an issue of type problem (rather than style)
Its description: it will “require calls to isNaN() when checking for NaN”
Violations of this rule fall under the category of “Possible Errors”
It is recommended that you enable this rule
Its documentation url is https://eslint.org/docs/rules/use-isnan
It takes no options (its schema is an empty array)
It may complain with the useIsNaN message if it finds a failure

Part Two: Create

Part one showed how to describe a rule. Now let’s implement it!

The implementation in the rule’s source file is a little dense at first. Before we dive in, here’s a reformatted version of create's contents all at once:

return {
    BinaryExpression(node) {
        if (
            /^(?:[<>]|[!=]=)=?$/u.test(node.operator) &&
            (node.left.name === "NaN" || node.right.name === "NaN")
        ) {
            context.report({ node, messageId: "useIsNaN" });
        }
    }
};

Going over it section by section…

return {
  BinaryExpression(node) {

This rule cares about nodes of type BinaryExpression. For example:

4 === "fish"

Per examining that code in astexplorer.net, a BinaryExpression node contains:

left: 4
operator: “===”
right: “fish”

    if (
      /^(?:[<>]|[!=]=)=?$/u.test(node.operator) &&

Who doesn’t love complicated regular expressions?

This tests node.operator for "==", "===", "!=", "!==", "<", "<=", ">", or ">=".

Meaning: is the binary expression an equality or relational comparison?
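If you'd rather not decode the pattern by eye, a quick experiment shows which operators it accepts. The operator list below is just a sample chosen for this check.

```javascript
const comparisonPattern = /^(?:[<>]|[!=]=)=?$/u;

const operators = [
  "==", "===", "!=", "!==", "<", "<=", ">", ">=", // comparisons
  "=", "+", "<<", ">>", "in", "instanceof",       // everything else
];

const matched = operators.filter((op) => comparisonPattern.test(op));
console.log(matched.join(" ")); // == === != !== < <= > >=
```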

      (node.left.name === "NaN" || node.right.name === "NaN")

Knowing this is a comparison, is either side the identifier NaN?

    ) {
      context.report({ node, messageId: "useIsNaN" });
    }
  }
};

Oh no, we’re comparing against NaN! Report an error on the node. Recall that useIsNaN is listed under meta.messages.

In case it’s useful, here are some code snippets that would trigger a complaint:

NaN === NaN
age < NaN
NaN != 3

…and here are some that wouldn’t:

setAge(NaN): since this isn’t a BinaryExpression
age = NaN: since this is an assignment (an AssignmentExpression node), not a comparison
age === value: since neither side is known to be NaN
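The snippets above can be checked with a small stand-alone harness. The create function mirrors the rule logic shown earlier; the helper functions and hand-built AST nodes are assumptions made for this sketch. For real rules, ESLint ships a RuleTester utility that runs a rule against actual source code.

```javascript
// The rule logic from above, wrapped so it can run outside ESLint.
function create(context) {
  return {
    BinaryExpression(node) {
      if (
        /^(?:[<>]|[!=]=)=?$/u.test(node.operator) &&
        (node.left.name === "NaN" || node.right.name === "NaN")
      ) {
        context.report({ node, messageId: "useIsNaN" });
      }
    },
  };
}

// Hypothetical helpers for building tiny AST nodes by hand.
const id = (name) => ({ type: "Identifier", name });
const num = (value) => ({ type: "Literal", value });
const binary = (left, operator, right) => ({
  type: "BinaryExpression", left, operator, right,
});

// Counts reports for a single node, dispatching on node type like a visitor would.
function countReports(node) {
  let count = 0;
  const visitor = create({ report: () => { count += 1; } });
  if (typeof visitor[node.type] === "function") visitor[node.type](node);
  return count;
}

console.log(countReports(binary(id("NaN"), "===", id("NaN"))));   // 1
console.log(countReports(binary(id("age"), "<", id("NaN"))));     // 1
console.log(countReports(binary(id("NaN"), "!=", num(3))));       // 1
console.log(countReports(binary(id("age"), "===", id("value")))); // 0

// setAge(NaN) is a CallExpression, so the BinaryExpression handler never runs.
const call = { type: "CallExpression", callee: id("setAge"), arguments: [id("NaN")] };
console.log(countReports(call)); // 0

// age = NaN is an AssignmentExpression, so it is skipped as well.
const assignment = { type: "AssignmentExpression", operator: "=", left: id("age"), right: id("NaN") };
console.log(countReports(assignment)); // 0
```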

Next Steps

I hope this article was helpful for learning about static analysis! We at Codecademy use it a lot to help write scalable, stable code, and hope it’ll be useful for you as well.

If you haven’t yet set up ESLint or Prettier, now is a great time to get started. Both of their websites have great getting started guides to help you set them up on existing or new projects.

If this is all old news for you, stay tuned for posts from us on more advanced forms of static analysis, such as more advanced lint rules and static typing!

By the way, did you know there’s a Codecademy podcast? It’s great! Download it on Google Podcasts, iTunes, or Spotify, and be sure to listen to the Static Analysis episode!

Acknowledgements

Many thanks to the brave proofreaders who sacrificed their time and health to read over this blog post! Jim Boulter, Anya Hargil, Emily Giurleo, and Jon Samp, you are all excellent individuals. 🙌