Automated JavaScript static analysis

Mark Ablovatskii
Motorway Engineering
Sep 10, 2021

Static analysis is a powerful tool that can help with many different tasks. In particular, it lets you:

  • Understand your code better
  • Build dependency graphs between modules
  • Gather statistics about function usage
  • Compute relevant code metrics, such as average function size, or any other metric that you think could be useful

The JavaScript ecosystem contains all the libraries you need for this. You don't need to write an ECMAScript parser on your own; you can grab the parser of your choice. So to perform static analysis of code, you just need to describe rules over the abstract syntax tree (AST), such as "if we see an Identifier, increase the identifierUsed counter".

In this article I will show how we automated one kind of refactoring: changing the signature of a function from an internal library.

At Motorway we have a fleet of microservices. Many of them share an infrastructure layer, which we've encapsulated into common libraries. What if we want to update the signature of a function from such a common library? Then we have a few options:

  1. Semi-automatic find & replace. This works well if you can express the operation as a clean pattern, e.g. replacing foo(param) with foo({ bar: param }). But not every case fits that.
  2. Manual find & replace. This works for a small number of function calls, but what if we have 1000+ references?
  3. Dynamic analysis. Just add a log statement inside the function that records the caller name and params. After some time we can see how the function is used and which projects need to be adjusted to the new signature. Again, this works only when a small number of calls match the criteria for the change (note the difference: we would have to check only the matched calls, rather than all calls as with manual find). Another problem is that some functions may be called very rarely, and we don't want to wait a month or a year (annual reports?) to gather enough data.
  4. And finally the preferable option (IMHO): automated analysis of source code. We specify the pattern we need, and it can be very flexible, because it's written in a programming language rather than, at best, as a regular expression in the semi-automatic approach.
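For comparison, option 3 could be sketched like this. The wrapper name withUsageLogging is hypothetical, and reading the caller from new Error().stack assumes V8's stack trace format (Node.js), where the third line is the caller's frame.

```javascript
// Wrap a library function so that every call records its arguments and the
// calling stack frame. `log` can be any function, e.g. a real logger.
function withUsageLogging(fn, name, log = console.log) {
  return function wrapped(...args) {
    // V8 stack: line 0 is "Error", line 1 is this wrapper, line 2 is the caller.
    const caller = (new Error().stack || '').split('\n')[2]?.trim() ?? 'unknown';
    log({ fn: name, caller, argCount: args.length, args });
    return fn.apply(this, args);
  };
}

module.exports = { withUsageLogging };
```

This only ever sees calls that actually happen at runtime, which is exactly the limitation described above for rarely called functions.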

To be even more concrete, the task is to find every call with more than one argument to any method imported from the library mw-lib-foo.

The first step is to fetch all projects for analysis. We don't use a monorepo; instead we follow a repository-per-project approach. That's why the easiest way to get the whole codebase for analysis is to use the GitHub/GitLab/… API.
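A minimal sketch of the listing step, assuming GitHub's REST endpoint GET /orgs/{org}/repos and Node.js 18+ (for the global fetch). The fetchJson parameter is a hypothetical injection point so the pagination logic can be exercised without network access.

```javascript
// Default transport: global fetch against the GitHub REST API.
async function githubJson(url, token) {
  const res = await fetch(url, {
    headers: {
      Accept: 'application/vnd.github+json',
      ...(token ? { Authorization: `Bearer ${token}` } : {}),
    },
  });
  if (!res.ok) throw new Error(`GitHub API ${res.status} for ${url}`);
  return res.json();
}

// List every repository of an organization, following page-based pagination.
async function listOrgRepos(org, { token, fetchJson = githubJson } = {}) {
  const perPage = 100; // largest page size this endpoint accepts
  const repos = [];
  for (let page = 1; ; page += 1) {
    const url = `https://api.github.com/orgs/${encodeURIComponent(org)}/repos?per_page=${perPage}&page=${page}`;
    const batch = await fetchJson(url, token);
    repos.push(...batch);
    if (batch.length < perPage) return repos; // a short page is the last one
  }
}

module.exports = { listOrgRepos };
```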

The structure of the script is pretty simple: grab the list of the organization's repositories, filter them by some criteria, and run a few asynchronous workers in parallel (not Web Workers, even though those could increase performance drastically). Each worker downloads package.json to determine whether the project should be analyzed (i.e. whether it imports the library), then downloads the whole source tree and runs the ECMAScript analyzer.

In terms of code:
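A minimal sketch of that structure, assuming repository objects shaped like GitHub's API response (archived and fork are real fields there; filtering on them is only an example of "some criteria"). runPool, shouldAnalyze and analyzeOrganization are hypothetical names: a fixed number of async lanes pull from a shared index, so at most `concurrency` repositories are processed at once.

```javascript
// Run `worker` over `items` with at most `concurrency` tasks in flight.
// These are plain async tasks on one event loop, not worker threads.
async function runPool(items, concurrency, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const index = next++; // claim an item; no lock needed on a single thread
      results[index] = await worker(items[index], index);
    }
  }
  const lanes = Array.from({ length: Math.min(concurrency, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}

// Example criteria: skip archived repositories and forks.
const shouldAnalyze = (repo) => !repo.archived && !repo.fork;

async function analyzeOrganization(repos, analyzeRepo, concurrency = 5) {
  const selected = repos.filter(shouldAnalyze);
  return runPool(selected, concurrency, async (repo) => {
    try {
      return await analyzeRepo(repo);
    } catch (error) {
      // Record the failure and keep going with the other repositories.
      return { repo: repo.name, error: error.message };
    }
  });
}

module.exports = { runPool, analyzeOrganization };
```

Results keep the order of the filtered input, regardless of which lane finished first.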

The worker is also very straightforward:
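A sketch of such a worker, under stated assumptions: fetchPackageJson, fetchSourceFiles and analyzeFile are hypothetical helpers passed in through `io` (they could be backed by your Git host's contents or archive endpoints), and only .js/.jsx/.cjs/.mjs files outside node_modules are analyzed.

```javascript
const path = require('path');

// True if package.json declares the library in any dependency section.
function usesLibrary(pkg, libName) {
  return ['dependencies', 'devDependencies', 'peerDependencies'].some(
    (section) => pkg[section] && Object.prototype.hasOwnProperty.call(pkg[section], libName),
  );
}

const JS_EXTENSIONS = new Set(['.js', '.jsx', '.cjs', '.mjs']);

function isSourceFile(filePath) {
  return JS_EXTENSIONS.has(path.extname(filePath)) && !filePath.split('/').includes('node_modules');
}

// `io` bundles hypothetical helpers:
//   fetchPackageJson(repo) -> object | null
//   fetchSourceFiles(repo) -> [{ path, code }]
//   analyzeFile(code, filePath) -> findings[]
async function analyzeRepo(repo, io, libName = 'mw-lib-foo') {
  const pkg = await io.fetchPackageJson(repo);
  if (!pkg || !usesLibrary(pkg, libName)) return { repo: repo.name, skipped: true, findings: [] };

  const files = await io.fetchSourceFiles(repo);
  const findings = [];
  for (const file of files.filter((f) => isSourceFile(f.path))) {
    for (const finding of io.analyzeFile(file.code, file.path)) {
      findings.push({ repo: repo.name, file: file.path, ...finding });
    }
  }
  return { repo: repo.name, skipped: false, findings };
}

module.exports = { usesLibrary, analyzeRepo };
```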

Now we can check JavaScript files one by one to gather the necessary information. I chose @babel/parser as the ECMAScript parser: it's powerful enough and supports modern editions of ECMAScript (and all the proposals we use).

The parser's API is very simple: you call parse() and get back an AST that closely matches the ESTree specification, with a few nuances (https://babeljs.io/docs/en/babel-parser#output).

Based on that, the high-level code to generate the syntax tree would be:

We want to be sure that all calls are recognized correctly and that we haven't missed any usage of a method from mw-lib-foo. That's why findAllUsageForModuleMethods() returns all mentions of variables associated with the target module. We can then either extend our AST processor or check the remaining usages manually.
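For the manual path, it helps to print each remaining mention with its location. A minimal, hypothetical helper, assuming locations in Babel's loc.start format (1-based line, 0-based column):

```javascript
// Turn usage locations into "file:line:col  source" lines for manual review.
// Columns are shifted to 1-based so editors can jump straight to them.
function formatForReview(filename, code, locations) {
  const lines = code.split(/\r?\n/);
  return locations.map(
    ({ line, column }) => `${filename}:${line}:${column + 1}  ${(lines[line - 1] || '').trim()}`,
  );
}

module.exports = { formatForReview };
```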

Once the source code is loaded and parsed, the next step is to process the AST. It's a regular tree structure, so traversing it isn't a big problem, but to simplify life we can use @babel/traverse, which provides more context for each node and lets you set up hooks for specific node types.

Returning to our task: first we need to detect when the library is loaded, and build a list of variables/methods associated with it.

Let's assume that we are using CJS modules and that all imports are done through require('mw-lib-foo') calls. To find these imports, we need to analyze every CallExpression whose callee is the identifier require.

As the outcome of this part of the analysis, we want the list of variables associated with the module. An import can be either const lib = require('mw-lib-foo') or direct destructuring: const { methodFoo, methodBar } = require('mw-lib-foo'). To track usage of library methods, we have to support both forms.

OK, now we have the list of variables that can be used to call a method from the library; time to track these calls:

Now the task is almost solved; we just need to verify that we haven't missed anything:

Static analysis of source code is a powerful and flexible tool that can be used heavily for reliable refactoring. Even good test coverage doesn't guarantee that you won't miss some very rare call.

On the other hand, with @babel/parser, writing code for AST analysis is pretty simple and straightforward.

Hopefully this article has given you some pointers and tips for adding static analysis to your projects!
