Codemod idea to make your codebase evergreen (part 1/2 — theory)

Published in ING Blog · 7 min read · Jul 28, 2021

Over time our codebase gets bigger. Technologies get newer. Code gets outdated. How do we keep it up to date?

I’m Pavlik Kiselev, a front-end developer at ING in the Fraud Prevention and Investigation department, and I encounter the problem above almost daily. At ING we used to use Angular, then we migrated to Polymer, and now we are moving to lit-html and lion-web (or the styled version on top of it, called ing-web). Of course, every new migration takes more time than the previous one because the number of features we need to migrate is bigger. But does that mean our codebase will always be outdated? Not at all! In the next couple of articles, I will explain how we can keep up with both migrations and feature creation.

The first article is more theoretical and introduces the idea of a codemod; the second article will be about the approaches I take to actually make the migration happen.

What is a codemod idea? Everything we need!

Codemod is a tool/library/idea to assist you with large-scale codebase refactors that can be partially automated but still require human oversight and occasional intervention. It’s based on the idea of automatic transformation of the source code. Plenty of such transformers are used almost daily. Babel, for example, converts ECMAScript 2015 and later into older JavaScript so that it can run in old browsers.

Codemod takes the idea of transforming code and applies it to update the source code itself and save it. You have probably seen this when using ESLint with the --fix argument. When ESLint fixes the written code, that is a transformation. Codemod follows the same idea, with the goal of changing the whole codebase to a new syntax or interface. Some time ago, when ECMAScript 6 was released, there were plenty of codemods to transform the “old” JavaScript code (with “var”s, binds, etc.) to consts/lets, arrow functions, and such.

A simple example is to replace “var” in the code with “let” to make it modern. Of course, it would be better to replace it with “const” where possible, but for the sake of simplicity, let’s stick with “let”: unlike “const”, it still allows the variable to be reassigned, so it is the safer default.

The naïve approach would be to do a plain string replace on the code, but that does not always work.

Why? Here are some examples:

Cases of the code when it’s not enough to use find/replace
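To make the problem concrete, here is a minimal sketch of the naïve approach. The function name naiveVarToLet and the sample lines are made up for illustration; the only real API used is String.prototype.replace.

```javascript
// Naive codemod: replace the whole word "var" wherever it appears in the text.
function naiveVarToLet(source) {
  return source.replace(/\bvar\b/g, 'let');
}

const samples = [
  "var ing = 'awesome';",        // a real declaration: this is what we want to change
  "var hint = 'never use var';", // "var" also appears inside a string literal
  'config.var = 1;',             // "var" is a property name here
  '// var is function-scoped',   // "var" appears only in a comment
];

const results = samples.map(naiveVarToLet);
results.forEach((line) => console.log(line));
// let ing = 'awesome';
// let hint = 'never use let';
// config.let = 1;
// // let is function-scoped
```

Only the first line is changed the way we intended. In the other three, the replace touched text that merely looks like the keyword, because a string has no idea which characters are a declaration and which are not.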

So we need some more information about the code. Can we get it from somewhere?

What is AST? One stop shop!

In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code.

Okay, that seems like a representation of the code. We can work with it. How to get it from our code?

How to get AST? Build it!

Trees usually grow themselves. All you need is some piece of land, seeds, and water. Then, after a few years, you will have a tree.

However, when we are talking about an Abstract Syntax Tree, the process is a bit different. This fact is good in my opinion because waiting for a few years is a bit too much.

The first step of the process is converting a source code string to something more useful: a list of tokens. This process is called lexical analysis or tokenization. A token is a meaningful group of characters.

Let’s take var ing = 'awesome'; as an example. If we group it like ["var i", "ng =", " 'awe", "so", "me';"], each token is not very meaningful. If we group it like ["var", "ing", "=", "'awesome'", ";"], it makes more sense, especially if, together with the value of a token, we add some extra information about what this token is:

List of tokens of “var ing = ‘awesome’;”

The list of these types is described in the ECMAScript standard.

We take the source code as input: var ing = 'awesome'; and go through it character by character until we hit a whitespace separator (space or newline): v, a, r, space. Now we determine the type of the collected characters according to the standard by checking the resulting word against the list of known keywords: if, while, var, and many others. Yes, it’s there, so this is the keyword “var”. Otherwise, it would be just an identifier. Then we continue with the rest of the input.
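The walk described above can be sketched as a tiny tokenizer. This is a toy under loud assumptions: it knows only a handful of keywords, the punctuators = and ;, and simple quoted strings without escapes. The type names (Keyword, Identifier, Punctuator, String) follow the naming that Esprima uses for its tokens; the function itself is hypothetical and not taken from any library.

```javascript
// A deliberately tiny keyword list; the real one is defined by the ECMAScript standard.
const KEYWORDS = new Set(['var', 'let', 'const', 'if', 'while', 'function', 'return']);

function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    // Whitespace only separates tokens; it does not produce one.
    if (/\s/.test(ch)) {
      i += 1;
      continue;
    }
    // A word: collect characters, then check it against the keyword list.
    if (/[A-Za-z_$]/.test(ch)) {
      let end = i + 1;
      while (end < source.length && /[A-Za-z0-9_$]/.test(source[end])) end += 1;
      const value = source.slice(i, end);
      tokens.push({ type: KEYWORDS.has(value) ? 'Keyword' : 'Identifier', value });
      i = end;
      continue;
    }
    // A string literal: everything up to the matching quote (escapes are not handled).
    if (ch === "'" || ch === '"') {
      let end = i + 1;
      while (end < source.length && source[end] !== ch) end += 1;
      if (end >= source.length) throw new SyntaxError('Unterminated string literal');
      tokens.push({ type: 'String', value: source.slice(i, end + 1) });
      i = end + 1;
      continue;
    }
    // The only punctuators this sketch knows about.
    if (ch === '=' || ch === ';') {
      tokens.push({ type: 'Punctuator', value: ch });
      i += 1;
      continue;
    }
    throw new SyntaxError('Unexpected character "' + ch + '" at position ' + i);
  }
  return tokens;
}

console.log(tokenize("var ing = 'awesome';"));
```

For our example this yields five tokens: the keyword var, the identifier ing, the punctuator =, the string 'awesome', and the punctuator ;.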

The step that follows tokenization is syntax analysis (parsing). The syntax analyzer goes over the list of tokens, validates their order and types, and transforms them into an Abstract Syntax Tree.
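As a rough illustration, a syntax analyzer for exactly one statement shape (keyword, name, =, string, ;) could look like the sketch below. The function parseVariableDeclaration and its expect helper are hypothetical; the node shapes (Program, VariableDeclaration, VariableDeclarator, Identifier, Literal) follow the ESTree format that parsers such as Esprima and Acorn produce. A real parser handles the whole grammar, which is why it is so much bigger.

```javascript
function parseVariableDeclaration(tokens) {
  let position = 0;

  // Validate that the next token has the expected type (and value, if given).
  function expect(type, value) {
    const token = tokens[position];
    const matches = token && token.type === type && (value === undefined || token.value === value);
    if (!matches) {
      throw new SyntaxError('Expected ' + (value === undefined ? type : value) + ' at token ' + position);
    }
    position += 1;
    return token;
  }

  const keyword = expect('Keyword');
  if (!['var', 'let', 'const'].includes(keyword.value)) {
    throw new SyntaxError('Expected a declaration keyword, got ' + keyword.value);
  }
  const name = expect('Identifier');
  expect('Punctuator', '=');
  const string = expect('String');
  expect('Punctuator', ';');

  return {
    type: 'Program',
    body: [
      {
        type: 'VariableDeclaration',
        kind: keyword.value,
        declarations: [
          {
            type: 'VariableDeclarator',
            id: { type: 'Identifier', name: name.value },
            // Strip the surrounding quotes for the value, keep the original text as raw.
            init: { type: 'Literal', value: string.value.slice(1, -1), raw: string.value },
          },
        ],
      },
    ],
  };
}

// The token list for: var ing = 'awesome';
const tokens = [
  { type: 'Keyword', value: 'var' },
  { type: 'Identifier', value: 'ing' },
  { type: 'Punctuator', value: '=' },
  { type: 'String', value: "'awesome'" },
  { type: 'Punctuator', value: ';' },
];

const tree = parseVariableDeclaration(tokens);
console.log(JSON.stringify(tree, null, 2));
```

If the tokens come in the wrong order (say, the = is missing), expect throws a SyntaxError, which is the “validates the order and types” part of the job.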

The process of building an AST is not easy at all. That’s why it’s easier to use an existing library. Fortunately, there is quite a list of available options for us:

Esprima (https://www.npmjs.com/package/esprima)
Acorn (https://www.npmjs.com/package/acorn)
Babel parser (https://babeljs.io/docs/en/babel-parser)
and many more…

What can we do with AST? Modify it!

The most important part for us is modifying the syntax tree, and here the possibilities are limited only by our imagination. You can think about it this way: if you can tell how the code should be improved or changed just by looking at it (as opposed to navigating and mentally executing it), the change can be done statically.

Some of the popular examples:

With a new version of your components library, the icons are now part of a separate NPM package. Update the import. Tick.
After adding the fifth argument to your function you decided that it’s time to convert arguments to an “options” object. Change the signature of the function. Tick.
It turns out that another team wants to extend your class of a custom element, but they don’t want to define the element. Extract the customElements.define call from the file with the class definition. Tick.
The creators of the library want to deprecate the usage of a function and replace it with another function. Rename the function. Tick.
You want to call a function from a library instead of a particular DOM event. Add an import and replace all calls of dispatchEvent(new CustomEvent('name of an event')) with a direct call of a function from the library. Tick.

Our favorite example of var ing = 'awesome'; gives us the following AST

AST representation of JavaScript code “var ing = ‘awesome’;”

On the path root.body[0].kind we have var which we need to change to let. That’s it.
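Here is a minimal sketch of that modification. The tree is written out by hand in the ESTree shape that parsers such as Esprima produce (position information omitted), and replaceVarWithLet is a hypothetical helper that walks every node instead of hard-coding the path, so it would also catch declarations nested deeper in the tree.

```javascript
// Hand-written AST for: var ing = 'awesome';
const root = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'var',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'ing' },
          init: { type: 'Literal', value: 'awesome', raw: "'awesome'" },
        },
      ],
    },
  ],
};

// Visit every node in the tree and switch "var" declarations to "let".
function replaceVarWithLet(node) {
  if (Array.isArray(node)) {
    node.forEach(replaceVarWithLet);
    return;
  }
  if (node === null || typeof node !== 'object') return; // strings, numbers, etc.
  if (node.type === 'VariableDeclaration' && node.kind === 'var') {
    node.kind = 'let';
  }
  Object.values(node).forEach(replaceVarWithLet);
}

replaceVarWithLet(root);
console.log(root.body[0].kind); // let
```

Note that nothing outside declarations can be touched by accident: a string literal containing the word “var” is a Literal node, not a VariableDeclaration.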

What to do with the resulting AST? Save it!

I guess we all like sustainability and respect nature, and thus saving trees is one of our priorities. And ASTs are not an exception.

However, that’s not as easy as it seems. If we traverse the AST and naively convert each node to a JavaScript string, the result will be working JavaScript, but the original formatting is gone and everything may end up on a single line. Nothing is wrong with that code as far as the engine is concerned; it’s just hard to maintain afterward.

Therefore, it would be quite handy if the generated code could be nicely formatted. Of course, it does not have to be perfect, but it should be easy to bring it to a perfect state.

To be able to do that, we need to collect and save some info from the codebase. Tabs or spaces, two or four, curly bracket on the same line or the next, single or double quote — decisions about the most controversial topics should be carefully collected to reproduce the source code close to the original.
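A minimal sketch of this idea, assuming we only care about one style decision (single or double quotes) and only about the few node types from our example. Both detectQuoteStyle and print are hypothetical helpers, string escaping is ignored, and real tools track far more than this.

```javascript
// Guess the preferred quote character by counting quotes in the original source.
function detectQuoteStyle(originalSource) {
  const single = (originalSource.match(/'/g) || []).length;
  const double = (originalSource.match(/"/g) || []).length;
  return double > single ? '"' : "'";
}

// Convert an AST node back to source code, honoring the collected style.
function print(node, style) {
  switch (node.type) {
    case 'Program':
      return node.body.map((statement) => print(statement, style)).join('\n');
    case 'VariableDeclaration':
      return node.kind + ' ' + node.declarations.map((d) => print(d, style)).join(', ') + ';';
    case 'VariableDeclarator':
      return node.init ? print(node.id, style) + ' = ' + print(node.init, style) : print(node.id, style);
    case 'Identifier':
      return node.name;
    case 'Literal':
      // Strings are re-quoted in the detected style (escaping is not handled here).
      return typeof node.value === 'string' ? style.quote + node.value + style.quote : String(node.value);
    default:
      throw new Error('Unsupported node type: ' + node.type);
  }
}

// The already modified AST of our example (kind is "let" now).
const modifiedAst = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'let',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'ing' },
          init: { type: 'Literal', value: 'awesome' },
        },
      ],
    },
  ],
};

const original = "var ing = 'awesome';";
console.log(print(modifiedAst, { quote: detectQuoteStyle(original) })); // let ing = 'awesome';
```

Had the original file used double quotes, the same tree would be printed as let ing = "awesome"; so the output stays close to what the team already writes.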

How to approach the codemods? Simplify them!

Now that we know the process consists of three steps (building the AST, modifying the AST, and converting it back to source code), we need to assemble a pipeline with the tools needed for these steps. Or use an existing one: meet jscodeshift, a toolkit by Facebook built around the process described above.

It contains everything we need:

To build the AST it uses a parser that we can choose, for example Babel, Flow, or TypeScript.
It has some nice utilities around modifying the AST.
It uses a library called recast to build the source code back.

On top of that, it helps with actually writing the script by providing a --dry option for debugging, and it processes files in parallel when rewriting the codebase. This matters when the codebase is huge; otherwise, a run can take hours.
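To give a first taste, our var-to-let example could look roughly like this as a jscodeshift transform. Treat it as a sketch rather than a finished codemod: the function name varToLet and the file name transform.js are made up, and blindly switching every var to let can change behavior in code that relies on redeclaration or hoisting.

```javascript
// Sketch of a jscodeshift transform. jscodeshift calls it once per file,
// passing the file info and an API object that exposes jscodeshift itself.
function varToLet(file, api) {
  const j = api.jscodeshift;

  return j(file.source)                           // 1. build the AST
    .find(j.VariableDeclaration, { kind: 'var' }) // 2. find all "var" declarations...
    .forEach((path) => {
      path.node.kind = 'let';                     //    ...and modify them
    })
    .toSource();                                  // 3. convert the AST back to source code
}

// In a real transform file (for example transform.js) the function is the export:
// module.exports = varToLet;
```

With the export in place, such a transform could be tried out without touching any files by running something like jscodeshift -t transform.js --dry --print src/ and reading the printed result.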