ASTs & Codemods: Effective and Automated Codebase Refactoring

Published in Bluecore Engineering, Oct 1, 2020 (9 min read)

Code refactoring is an important part of evolving any software application, but large-scale refactoring can be very time-consuming. Commonly used techniques such as find and replace or fancy regex have their own limits, depending on the complexity of the refactor. That’s where “codemods” come in. Codemods are scripts that modify code by manipulating the source code’s abstract syntax tree (AST). This post discusses what ASTs are and how they’re used in everyday tools such as compilers, formatters, and linters. We’ll also detail the process of writing a codemod that manipulates an AST, and how we used ASTs and codemods at Bluecore to automate the long, tedious, and error-prone process of code refactoring with confidence.

What problem were we trying to solve?

Over the last five years, hundreds of thousands of lines of JavaScript have been committed to our main repo. With the frontend web development landscape evolving and new features landing in the ECMAScript specification every year, it can be difficult to keep a codebase up to modern standards while also adding new features and fixing bugs. As we dug deeper into the codebase, we discovered more chunks of legacy code that needed attention.

It’s easy to adhere to new best practices with the team and ensure that the new code is consistent with our style guide or code standards. But at a certain point, legacy code can get out of control and be difficult to maintain. Fixing and cleaning up this old code manually is tedious and isn’t work that most engineers like to do. It became apparent to us that we needed to fix this soon in order to reduce bugs, speed up development, and accelerate the ramp-up of new engineers. Thankfully, we discovered Facebook’s jscodeshift, which uses ASTs and codemods to help solve this problem.

What is an Abstract Syntax Tree (AST)?

If you write code, chances are you are already leveraging ASTs in one way or another. An AST, or abstract syntax tree, is a tree representation of code. Its nodes are built from the tokens that make up a program’s statements and expressions. The source code that engineers write is typically piped through a parser and transformed into an AST. A compiler or interpreter can then use the AST to generate lower-level code or to evaluate the program’s instructions.

The typical flow of how an AST is created and used. — Diagram made with draw.io

When source code is converted to its AST representation, only the structure and content of the source code are kept; purely syntactic details such as punctuation and delimiters (braces, semicolons, parentheses, etc.) are thrown away. The information that an AST preserves includes:

Variable types and location of each variable declaration
The order in which executable statements are defined
Left and right operands of operations
Identifiers and their assigned values

So given these requirements, what does an AST look like?

An abstract representation of an AST (diagram made with draw.io)

The above tree is an abstract representation of what a JavaScript AST would look like.

From the top level, we first start off with a variable declaration.
This variable declaration has a property recording that it was declared with var (as opposed to let or const).
This variable declaration also consists of an “identifier” and its corresponding assignment, in this case, a binary expression.
The binary expression is formed with a “left” and “right” literal and an operator.

The name of each “node” is defined by the parser’s specification, but the idea of an AST is consistent across parsers and programming languages.

A more human-readable form of the tree is typically JSON.

AST JSON Representation of var example = 5 * 9; — Image made with https://carbon.now.sh/

That’s a lot of JSON just for one line of code!
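As a rough, hand-written sketch, an ESTree-style parser produces something like the object below for `var example = 5 * 9;`. Position fields such as `start`, `end`, and `loc`, and the `raw` text of literals, are left out for brevity, and the exact fields vary from parser to parser.

```javascript
// Hand-written approximation of the ESTree-style AST for: var example = 5 * 9;
// Position data (start, end, loc) and literal `raw` text are omitted.
const ast = {
  type: "Program",
  sourceType: "script",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var", // "let" or "const" would appear here instead
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "example" },
          init: {
            type: "BinaryExpression",
            operator: "*",
            left: { type: "Literal", value: 5 },
            right: { type: "Literal", value: 9 },
          },
        },
      ],
    },
  ],
};

// Serialize the tree as JSON for inspection.
console.log(JSON.stringify(ast, null, 2));
```

Note that the `=` sign and the semicolon do not appear anywhere in the tree: they are implied by the node types.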

How are abstract syntax trees used in everyday tools like formatters and linters?

Now that the theory is covered, let’s examine how ASTs are used in our everyday tools. Since ASTs can be represented as JSON, we can traverse them like any other JSON object. Being able to traverse an AST and visit its nodes without executing the code lets us gain insights or perform operations on it. We can write code that parses source into a tree, modifies the tree, and prints the modified tree back out as source code.
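To illustrate the traversal idea, here is a minimal sketch of a depth-first walk over an ESTree-style object. The `walk` helper is hypothetical and written for this example; real tools ship their own, more careful traversal utilities.

```javascript
// Visit every node in an ESTree-style tree, depth first.
// `visit` is called with each object that has a string `type` property.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
    return;
  }
  if (node === null || typeof node !== "object") return;
  if (typeof node.type === "string") visit(node);
  Object.values(node).forEach((child) => walk(child, visit));
}

// Hand-written tree for: var example = 5 * 9; (position data omitted)
const tree = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "example" },
          init: {
            type: "BinaryExpression",
            operator: "*",
            left: { type: "Literal", value: 5 },
            right: { type: "Literal", value: 9 },
          },
        },
      ],
    },
  ],
};

// Collect the type of every node without executing any of the code it describes.
const seen = [];
walk(tree, (node) => seen.push(node.type));
console.log(seen);
// → Program, VariableDeclaration, VariableDeclarator, Identifier,
//   BinaryExpression, Literal, Literal
```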

ESLint, a commonly used JavaScript linter, statically analyzes code by converting it to an AST and identifying and reporting on patterns by simply traversing the JSON. If needed, ESLint can be configured to modify the JSON to fix or change the pattern.
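The sketch below shows the underlying idea of a lint check, not ESLint’s actual rule API. The `findVarDeclarations` and `fixVarDeclarations` helpers are hypothetical: one reports every `var` declaration, the other "fixes" them by editing the nodes in place. A real fixer would first verify that the change is safe.

```javascript
// Toy "no-var" check. This is NOT ESLint's rule API, only the idea behind it.
function findVarDeclarations(node, found = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => findVarDeclarations(child, found));
  } else if (node !== null && typeof node === "object") {
    if (node.type === "VariableDeclaration" && node.kind === "var") {
      found.push(node);
    }
    Object.values(node).forEach((child) => findVarDeclarations(child, found));
  }
  return found;
}

// "Fix" each finding by rewriting the node in place; returns the number of fixes.
function fixVarDeclarations(ast) {
  const found = findVarDeclarations(ast);
  found.forEach((node) => {
    node.kind = "let";
  });
  return found.length;
}

// Hand-written tree for: var a = 1; const b = 2;
const declaration = (kind, name, value) => ({
  type: "VariableDeclaration",
  kind,
  declarations: [
    {
      type: "VariableDeclarator",
      id: { type: "Identifier", name },
      init: { type: "Literal", value },
    },
  ],
});
const ast = {
  type: "Program",
  body: [declaration("var", "a", 1), declaration("const", "b", 2)],
};

const fixedCount = fixVarDeclarations(ast);
console.log(`fixed ${fixedCount} declaration(s)`);
```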

Babel, a JavaScript compiler, performs code modifications on ASTs that will convert ECMAScript 2015 and onwards code into a backward-compatible version of JavaScript for older browsers or runtime environments. This can mean turning React JSX into function calls or stripping out TypeScript or Flow type annotations before source code is bundled and reaches the browser.
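As a small illustration of this kind of tree-to-tree rewrite, the sketch below turns the ES2016 expression `x ** 2` into the older `Math.pow(x, 2)` form. It operates on a hand-written ESTree-style object and does not use Babel’s plugin API; `lowerExponent` is a hypothetical helper.

```javascript
// Rewrite `a ** b` into `Math.pow(a, b)` on an ESTree-style tree.
// Returns a new tree and leaves the input untouched.
function lowerExponent(node) {
  if (Array.isArray(node)) return node.map(lowerExponent);
  if (node === null || typeof node !== "object") return node;

  // Transform children first so nested `**` expressions are handled too.
  const copy = {};
  for (const [key, value] of Object.entries(node)) {
    copy[key] = lowerExponent(value);
  }

  if (copy.type === "BinaryExpression" && copy.operator === "**") {
    return {
      type: "CallExpression",
      callee: {
        type: "MemberExpression",
        object: { type: "Identifier", name: "Math" },
        property: { type: "Identifier", name: "pow" },
        computed: false,
      },
      arguments: [copy.left, copy.right],
    };
  }
  return copy;
}

// Hand-written tree for the expression: x ** 2
const expression = {
  type: "BinaryExpression",
  operator: "**",
  left: { type: "Identifier", name: "x" },
  right: { type: "Literal", value: 2 },
};

const lowered = lowerExponent(expression);
console.log(JSON.stringify(lowered, null, 2));
```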

Prettier, a popular JavaScript formatter, rewrites the layout of code according to its configuration without changing the content or meaning of the code. This is possible because the code is converted into an AST, which is formatting-agnostic, and then printed back out according to a set of rules.
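The toy printer below sketches why an AST is formatting-agnostic: however the original line was spaced, the same tree always prints the same way. It is a hypothetical `print` function that supports only a handful of node types and ignores operator precedence, so it is nowhere near what Prettier actually does.

```javascript
// Toy printer: turns a tiny subset of ESTree nodes back into source text.
// It does not add parentheses, so nested binary expressions are not handled safely.
function print(node) {
  switch (node.type) {
    case "Program":
      return node.body.map(print).join("\n");
    case "VariableDeclaration":
      return `${node.kind} ${node.declarations.map(print).join(", ")};`;
    case "VariableDeclarator":
      return node.init ? `${print(node.id)} = ${print(node.init)}` : print(node.id);
    case "BinaryExpression":
      return `${print(node.left)} ${node.operator} ${print(node.right)}`;
    case "Identifier":
      return node.name;
    case "Literal":
      return JSON.stringify(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// The same tree results from "var example=5*9" and "var   example = 5 * 9 ;".
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "example" },
          init: {
            type: "BinaryExpression",
            operator: "*",
            left: { type: "Literal", value: 5 },
            right: { type: "Literal", value: 9 },
          },
        },
      ],
    },
  ],
};

const formatted = print(ast);
console.log(formatted); // → var example = 5 * 9;
```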

What is a codemod?

Now that we understand how frequently ASTs are used behind the scenes in our development environments, how else can they be leveraged?

Codemod and jscodeshift are tools created by Facebook to assist in large-scale codebase refactors. Codemods are one-time scripts that leverage ASTs to allow developers to write code that transforms code. jscodeshift is a toolkit for running codemods on large codebases; it provides a unified API to easily find, modify, and replace the underlying AST nodes. Using the two together, we are able to do large-scale refactors reliably and easily.

How does one write a codemod that manipulates an abstract syntax tree?

Let’s take a simple import.js file that imports a few modules, and suppose there’s a new requirement to sort all imports in every single file in the codebase.

If done manually, this can easily turn into a nightmare if the codebase is large.

Automating this by writing a codemod to modify the underlying AST will make it much simpler.

We should first initialize the setup code for the transform.

This code imports the file, converts it to an AST, and then converts it back to source code.
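A minimal version of such a setup file might look like the sketch below. The file name `transform.js` is an assumption; the function signature and the `api.jscodeshift` and `toSource()` calls follow jscodeshift’s documented transform format.

```javascript
// transform.js (hypothetical file name): a no-op jscodeshift transform.
// jscodeshift calls this function once for every file it processes.
function transformer(file, api) {
  const j = api.jscodeshift; // API for parsing, querying, and building AST nodes
  const root = j(file.source); // parse the file's source text into an AST
  return root.toSource(); // print the (still unchanged) AST back to source code
}

module.exports = transformer;
```

Because nothing is modified between parsing and printing, running this transform should leave every file as it was, which makes it a convenient starting point.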

We can add this script command to a package.json as well as the jscodeshift dependency.

package.json — Image made with https://carbon.now.sh/

Before doing anything, we should look at our AST to see what we are dealing with. We can do this by logging the parsed source code.

By examining the generated AST, we can see that there is a body node containing an array of 4 elements, each of which is an object of type ImportDeclaration. Since this is now a regular JavaScript object, all we need to do is sort this array by the import name.
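To make the shape concrete, here is a hand-written approximation of such a body array, sorted as a plain JavaScript array. The four module names are hypothetical, position data is omitted, and the `makeImport` and `importName` helpers exist only for this sketch.

```javascript
// Build an ESTree-style node for: import <local> from "<source>";
const makeImport = (local, source) => ({
  type: "ImportDeclaration",
  specifiers: [
    { type: "ImportDefaultSpecifier", local: { type: "Identifier", name: local } },
  ],
  source: { type: "Literal", value: source },
});

// Hypothetical import.js with four default imports (module names are made up).
const program = {
  type: "Program",
  sourceType: "module",
  body: [
    makeImport("moment", "moment"),
    makeImport("axios", "axios"),
    makeImport("lodash", "lodash"),
    makeImport("classnames", "classnames"),
  ],
};

// The "import name" here is the local name of the default specifier.
const importName = (declaration) => declaration.specifiers[0].local.name;

// Sorting the whole body is only safe because it contains nothing but imports.
program.body.sort((a, b) => {
  const left = importName(a);
  const right = importName(b);
  return left < right ? -1 : left > right ? 1 : 0;
});

console.log(program.body.map(importName));
// → [ 'axios', 'classnames', 'lodash', 'moment' ]
```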

jscodeshift provides us with an API to help us to do AST traversals.

Find all imports in a file — Image made with https://carbon.now.sh/

Now that we have all the import declarations we can sort them.

Sort import declarations — Image made with https://carbon.now.sh/

After sorting, remove the old declarations so that they can be replaced with the new ones.

Replace old imports with new ones — Image made with https://carbon.now.sh/
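Put together, the find, sort, and replace steps might look like the sketch below. It is an approximation rather than the exact code from the images: the `nameOf` helper is hypothetical, and only the first specifier of each import is considered. The `find`, `nodes`, `remove`, `get`, and `toSource` calls are part of jscodeshift’s collection API.

```javascript
// A sketch of a sort-imports codemod for jscodeshift.
function transformer(file, api) {
  const j = api.jscodeshift;
  const root = j(file.source);

  // 1. Find every import declaration in the file.
  const imports = root.find(j.ImportDeclaration);

  // 2. Sort a copy of the nodes by the local name of the first specifier,
  //    falling back to the module path for side-effect imports.
  const nameOf = (node) =>
    node.specifiers.length > 0 ? node.specifiers[0].local.name : node.source.value;
  const sorted = imports
    .nodes()
    .slice()
    .sort((a, b) => nameOf(a).localeCompare(nameOf(b)));

  // 3. Remove the old declarations and put the sorted ones at the top of the file.
  imports.remove();
  root.get().node.program.body.unshift(...sorted);

  return root.toSource();
}

module.exports = transformer;
```

Moving every import to the top of the file is a design choice of this sketch; it assumes no other statement needs to stay ahead of the imports.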

And that’s it! We’ve now written a simple codemod that will sort the imports in any file. We can run the npm script to see our results!

Please note, this is by no means a production-ready codemod. It only sorts default imports and doesn’t handle named imports, or imports that combine named and default imports. Still, it shows how transforming code into a tree structure such as an AST can be very powerful, allowing us to write code that changes code very effectively.

How did we leverage ASTs at Bluecore?

We were able to use both open source and internally developed codemods to help us clean up our codebase.

Some of these included:

Funnel description language: a language we created whose source is parsed to an AST, from which our scripts auto-generate SQL (Blog Post)
coffee-to-es2015-codemod: a codemod that helped finish our conversion from CoffeeScript to ES6
codemod-proptypes-to-flow: a codemod that converts runtime PropTypes into statically typed Flow types
flow-to-ts: a codemod that converts a codebase from Flow to TypeScript
reselect (internal): an internal codemod that automates TypeScript typings for reselect
redux-actions to typesafe-actions (internal): an internal codemod that migrates all uses of redux-actions to the typesafe-actions API

These are just a few of the refactors that were made possible with codemods. Looking for codemods and spending a little bit of time upfront to develop new ones saved our team hundreds of hours of valuable engineering time that we can spend writing new features or fixing bugs.

Some learnings and tips to conclude

Before writing your own codemod, make sure that it hasn’t already been created. react-codemod and js-codemod are both fantastic collections of existing codemods, and good references when writing your own. If you can’t find what you need online, write it and, better yet, open-source it!
Use the AST Explorer. This is an amazing open-source tool that helps speed up development for codemods. It comes built-in with different parsers for different programming languages and a paste and drop editor that shows the AST and before/after code in one simple view. Writing codemods would be almost impossible without some way of visualizing code as an AST.
Refer to ast-types to see which node types exist and which fields each one expects. It serves as good documentation for making sure that the arguments you pass when building nodes are of the correct node type.
When first writing a codemod, keep both the pre-transformed and transformed code, and their respective ASTs, open at the same time. This makes it easier to compare the two ASTs and see what modifications are necessary, especially if the code is long and complex.
Do dry runs: due to the experimental nature of a codemod, it is important to have a quick feedback loop. jscodeshift comes with a dry-run option (--dry, typically combined with --print) that prints the transformed code to the console without writing any files, allowing you to iterate more quickly.
Manual Intervention: especially when running codemods on production code, it’s imperative that you and a few other people on the team scan all files that were modified. Because we are writing code to change code, it’s easy to forget an edge case, which can cause bugs and undesirable effects.

In conclusion, codebase-wide changes can be made simpler and less scary to approach. Using open-source codemods and a few we wrote internally, we were able to vastly improve the state of our codebase with minimal effort and low risk. Being able to write code that changes code is an amazing technique for any team to have in their toolkit!

Written by Jason Deng