Metaprogramming in JS: Write your first codemod!

Part 2: Modify and verify your codebase using jscodeshift

Kacper Kula
Onfido Product and Tech
8 min read · Aug 22, 2019

This is part two of the “Metaprogramming in JS” series. If you haven’t read the previous article, I recommend reading it first.

In part one we introduced jscodeshift, codemods, and abstract syntax trees (ASTs). We also saw some interesting examples of codemods. In this article we will create three codemods that can be used to improve JavaScript projects.

How to run a codemod

Before we write our codemod it’s important to know how we will run it.

To run a codemod, we first need to install jscodeshift from npm:

npm install -g jscodeshift

By using the -g flag (the equivalent of yarn global add), we install the package into our global node_modules rather than locally for the project.

Next, to run a codemod we can use the following command:

jscodeshift -t codemod.js source_file.js -d -p

We can specify our transformation (codemod) using the -t flag. Then we provide the file, list of files, or directory that we want to modify. Because jscodeshift modifies all the files in place by default, it’s important to use the -d -p flags when testing. The -d ‘dry-run’ flag runs the command without modifying the original files. The -p ‘print’ flag prints the generated code to the console so we can inspect the output.

Alternatively, we can execute codemods using AST Explorer online, which conveniently combines all the necessary views into one window:

  • original code (top left)
  • syntax tree (top right)
  • codemod you are applying (bottom left)
  • the resulting code (bottom right)

It is a really handy tool for learning and experimenting with jscodeshift.

AST Explorer in action.

Removing console.logs

After finishing a feature we usually spend some time cleaning up our code. This might involve renaming variables, removing unused code, or removing all debugging statements such as console.log calls. The last of these can be automated with a fairly simple codemod!

We want to remove all instances of console.log

All three console.log invocations above are valid. Removing them with regular expressions would be complex. With codemods, however, we can do it in just a few lines of code:
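
A minimal sketch of such a codemod follows. It is written with module.exports, the CommonJS form of a default export that jscodeshift also accepts, and the names transformer and remove-console.js are only illustrative:

```javascript
// remove-console.js: a sketch of a codemod that strips console.* calls.
function transformer(file, api) {
  const j = api.jscodeshift; // by convention the library is aliased to j

  // Parse the source text into an AST wrapped in a jscodeshift collection.
  const root = j(file.source);

  // Find every call whose callee is a member of the `console` identifier,
  // for example console.log(...) or console.warn(...), and remove it.
  root
    .find(j.CallExpression, {
      callee: {
        type: 'MemberExpression',
        object: { type: 'Identifier', name: 'console' },
      },
    })
    .remove();

  // Print the modified tree back to a string.
  return root.toSource();
}

module.exports = transformer;
```

Running it with the -d -p flags first is a safe way to inspect what would be removed before any file is touched.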

A codemod must be a function that is the default export from the file. It accepts two arguments: the file descriptor and a libraries object it can use. By convention jscodeshift is usually destructured and assigned to j.

By calling jscodeshift on our source we compute an AST. We can then query it in the same way we query DOM elements in JavaScript using querySelectorAll.

The find method accepts two arguments: the first is the type of the node we are looking for, the second one is a query that allows us to narrow down the search to the specific instances. In our case we want to find CallExpression, which is how function calls are represented in our tree. We want to find only the calls that are performed on a property of the console object.

Our code will match not only console.log expressions but also console.warn and any other function call on a property of the console object. If you want to restrict it further, try analysing the tree from the previous article and modifying the query object we passed to find.
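
One way to narrow the query is sketched below as a standalone filter object. The name consoleLogCall is illustrative, and the sketch assumes the call is written with dot notation, so console['log'](...) would not match:

```javascript
// Filter object that matches only console.log(...) calls.
const consoleLogCall = {
  callee: {
    type: 'MemberExpression',
    object: { type: 'Identifier', name: 'console' },
    // The extra constraint: the accessed property must be `log`.
    property: { type: 'Identifier', name: 'log' },
  },
};

// Inside a codemod it would be passed as the second argument to find:
// root.find(j.CallExpression, consoleLogCall).remove();
```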

After we have successfully found all the matches, we can remove them from the tree using remove. This method modifies the original tree. Finally we call toSource because jscodeshift expects us to return a string containing the new code. A cool feature is that the original formatting is preserved for all the nodes we haven’t touched.

Automatically moving hard-coded colors to a design system

In modern apps we often use a design system as a central place for all styles rather than hard-coding values for colors, spacings and fonts. During development, however, it’s usually much easier to style components directly and then migrate to design system tokens just before we are ready to raise a pull request. That migration is both error-prone and time-consuming. It could easily be automated!

In this example I will use a React Native StyleSheet object, but the solution can be adapted to work with any CSS-in-JS library that defines styles in JavaScript, such as Emotion or JSS.

Let’s consider the following structure of design system tokens:
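
A sketch of what such an object could look like is shown here. Only the name DESIGN_SYSTEM and the red tokenA come from this article; the other tokens and all the values are hypothetical placeholders:

```javascript
// Hypothetical design tokens: the names and values are placeholders.
const DESIGN_SYSTEM = {
  tokenA: '#ff0000', // red
  tokenB: '#00ff00', // green
  tokenC: 'rgba(0, 0, 255, 0.5)', // semi-transparent blue
};
```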

Example of Design System tokens object.

Using this object we can refer to the colors from our design system. Wherever we want to use a red color in our project we can refer to DESIGN_SYSTEM.tokenA.

Now let’s consider the following code in React Native:
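
A small sample in that spirit is sketched below. The stub objects stand in for real imports so that the snippet runs in plain Node, and every name other than StyleSheet.create is hypothetical:

```javascript
// Stand-ins so the sample runs in plain Node; in an app, StyleSheet
// would be imported from 'react-native'.
const StyleSheet = { create: (styles) => styles };
const Theme = { create: (styles) => styles }; // hypothetical, unrelated helper

// 1. A real StyleSheet: the hard-coded 'red' should become a token.
const styles = StyleSheet.create({
  title: { color: 'red', fontSize: 20 },
});

// 2. Looks similar, but is not StyleSheet.create: must be left alone.
const theme = Theme.create({
  title: { color: 'red' },
});

// 3. A plain function that happens to be named create: also left alone.
const create = (definitions) => definitions;
const plain = create({ title: { color: 'red' } });
```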

Only the color red in the first call should be replaced. The other calls, even though they look similar, do not create a StyleSheet.

First let’s write a function that compares colors:
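
One possible version is sketched below. To keep it self-contained, a small hand-written parser stands in for a color library; toRgba and the NAMED table are hypothetical helpers that cover only hex, rgb(), rgba() and a handful of color names:

```javascript
// A few CSS color names; a real color library knows all of them.
const NAMED = new Map([
  ['red', '#ff0000'],
  ['green', '#008000'],
  ['blue', '#0000ff'],
  ['white', '#ffffff'],
  ['black', '#000000'],
]);

// Convert a color string to [r, g, b, a], or null if it is not recognized.
function toRgba(input) {
  let value = String(input).trim().toLowerCase();
  if (NAMED.has(value)) value = NAMED.get(value);

  // #rgb, #rgba, #rrggbb, #rrggbbaa
  const hex = /^#([0-9a-f]{3,4}|[0-9a-f]{6}|[0-9a-f]{8})$/.exec(value);
  if (hex) {
    let digits = hex[1];
    // Expand the short form: "f00" becomes "ff0000".
    if (digits.length <= 4) digits = [...digits].map((d) => d + d).join('');
    const [r, g, b, a = 255] = digits.match(/../g).map((pair) => parseInt(pair, 16));
    return [r, g, b, a / 255];
  }

  // rgb(r, g, b) and rgba(r, g, b, a) written with plain numbers
  const fn = /^rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*(?:,\s*([\d.]+)\s*)?\)$/.exec(value);
  if (fn) {
    const alpha = fn[4] === undefined ? 1 : Number(fn[4]);
    return [Number(fn[1]), Number(fn[2]), Number(fn[3]), alpha];
  }
  return null;
}

// Two colors are the same when all four components match.
function areTheSame(colorA, colorB) {
  const a = toRgba(colorA);
  const b = toRgba(colorB);
  if (!a || !b) return false;
  return a.every((component, i) => Math.abs(component - b[i]) < 1e-6);
}

console.log(areTheSame('red', '#F00')); // true
console.log(areTheSame('red', 'rgba(255, 0, 0, 0.5)')); // false
```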

areTheSame function that compares two colors.

Because colors come in different formats, we can use a color library to convert any type to RGB. Then by comparing each color component (red, green, blue and alpha) we can determine if the colors are the same or not. In a real-life solution we could compare the colors using the CIELAB color space and match all colors that are indistinguishable to the naked eye.

Next we need to look for matches in our design system:
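
A sketch of that lookup follows. The function name findToken and its parameters are assumptions; the comparator is passed in so that the snippet stays self-contained, and in practice it would be the color comparison function:

```javascript
// Return the name of the first token whose value matches `color`,
// or undefined when the design system has no equivalent.
function findToken(color, tokens, isSameColor) {
  return Object.keys(tokens).find((name) => isSameColor(tokens[name], color));
}

// Usage with a hypothetical token object and a naive text comparator.
const tokens = { tokenA: '#ff0000', tokenB: '#00ff00' };
const sameText = (a, b) => a.toLowerCase() === b.toLowerCase();

console.log(findToken('#FF0000', tokens, sameText)); // tokenA
console.log(findToken('#123456', tokens, sameText)); // undefined
```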

Finding matches in our Design System.

Finally we can write our codemod which will use these functions to automatically convert matching colors into design system equivalents:
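
The sketch below shows one way to put this together; it is not the linked solution. The token values are hypothetical, the lookup is simplified to a text match so the file stands alone, and collectProperties is an illustrative helper that walks the plain AST nodes:

```javascript
// Hypothetical tokens; a project would import its real design system.
const DESIGN_SYSTEM = { tokenA: 'red', tokenB: '#00ff00' };

// Simplified lookup: exact, case-insensitive text match.
function findToken(color) {
  return Object.keys(DESIGN_SYSTEM).find(
    (name) => DESIGN_SYSTEM[name].toLowerCase() === String(color).toLowerCase()
  );
}

// Collect every object property nested anywhere inside `node`.
function collectProperties(node, found = []) {
  if (!node || node.type !== 'ObjectExpression') return found;
  for (const prop of node.properties) {
    // Parsers name this node either Property or ObjectProperty.
    if (prop.type === 'Property' || prop.type === 'ObjectProperty') {
      found.push(prop);
      collectProperties(prop.value, found);
    }
  }
  return found;
}

function transformer(file, api) {
  const j = api.jscodeshift;
  const root = j(file.source);

  root
    .find(j.CallExpression, {
      callee: {
        type: 'MemberExpression',
        object: { type: 'Identifier', name: 'StyleSheet' },
        property: { type: 'Identifier', name: 'create' },
      },
    })
    .forEach((path) => {
      path.node.arguments
        .flatMap((arg) => collectProperties(arg))
        // Keep only non-computed properties whose key is `color`.
        .filter((prop) => !prop.computed && (prop.key.name === 'color' || prop.key.value === 'color'))
        .forEach((prop) => {
          const literal = prop.value.value;
          const token = typeof literal === 'string' ? findToken(literal) : undefined;
          if (token) {
            // Builds the expression DESIGN_SYSTEM.<token>.
            prop.value = j.memberExpression(j.identifier('DESIGN_SYSTEM'), j.identifier(token));
          }
        });
    });

  return root.toSource();
}

module.exports = transformer;
```

Note that this sketch only rewrites the values. The transformed file still needs DESIGN_SYSTEM in scope, so adding the import is left as a further step.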

You can find a full solution in this GitHub Gist.

First we find all the call expressions on StyleSheet.create. Then we take all the arguments and extract all their properties. For each argument we filter out all the properties that do not match "color". For the matching properties we try to find a token in our design system, and if there is a match we modify the tree and change the value of the property. By calling j.memberExpression we create a new expression like DESIGN_SYSTEM.tokenA. Finally we convert our modified tree to source and return it as the result of the function.

Detecting hard-coded secrets in the code

We all know that we shouldn’t put tokens and secrets in our code. But mistakes sometimes happen. During the debugging process, you might put a token in the code to skip authentication. Or maybe someone less experienced will put a secret in the code without being aware of the consequences. It would be great to be able to detect hard-coded secrets in the code and raise an alert.

We would like to detect situations like this.

How can we detect if a string is a password or a secret key? We have many strings in our application which could be false positives. If we search by length, we would match all long strings, but miss any short tokens. If we look for non-alphanumeric values, we could end up matching foreign language strings.

Fortunately, we can measure the entropy of a string. Entropy is a measure of the information conveyed by a piece of text (or, more generally, by any piece of data). Intuitively, the bigger the entropy, the more random the information is. The theory behind it was first introduced by Claude Shannon in the 1948 paper A Mathematical Theory of Communication, where he laid the groundwork for what has become the field of information theory.

Given that tokens are supposed to be random and hard to guess, measuring entropy seems like a good place to start. In fact Auth0 uses exactly the same technique in their automated tools (with a few additional checks to exclude things like email addresses, URLs, and CSS selectors).

We can use a JavaScript implementation of the Shannon Entropy function:
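
A minimal sketch of such a function is shown here. It is not the linked implementation: it counts characters with a Map and applies the formula H = -sum(p * log2(p)), where p is the relative frequency of each distinct character:

```javascript
// Shannon entropy of a string, in bits per character.
function entropy(text) {
  // Count how often each character occurs.
  const counts = new Map();
  for (const char of text) {
    counts.set(char, (counts.get(char) || 0) + 1);
  }

  // for...of iterates code points, so sum the counts instead of
  // relying on text.length, which counts UTF-16 code units.
  const total = [...counts.values()].reduce((sum, n) => sum + n, 0);

  let result = 0;
  for (const count of counts.values()) {
    const p = count / total;
    result -= p * Math.log2(p);
  }
  return result; // 0 for an empty string
}

console.log(entropy('aaaa')); // 0
console.log(entropy('abab')); // 1
console.log(entropy('abcd')); // 2
```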

GitHub Gist, based on solution by ppseprus from here.

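The function counts the number of occurrences of each character and computes a value in bits per character. For text made of single-byte characters the result lies in the range 0–8, since at most 256 distinct symbols are possible and log2(256) = 8 (the proof is left as an exercise to the reader). The function returns small values for English words and sentences and larger values for randomly generated tokens and UUIDs.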

Entropy value for some strings.

We can use this function in our codemod and compute the value for all the literals within our codebase:
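
A sketch of such a check is shown below. The threshold of 4.5 is a hypothetical value that would need tuning, a compact entropy helper is repeated so the file stands alone, and the sketch assumes string literals are reachable through j.Literal, as with jscodeshift's default parser:

```javascript
// Hypothetical cut-off in bits per character; tune it for your codebase.
const ENTROPY_THRESHOLD = 4.5;

// Shannon entropy in bits per character.
function entropy(text) {
  const counts = new Map();
  for (const char of text) counts.set(char, (counts.get(char) || 0) + 1);
  const total = [...counts.values()].reduce((sum, n) => sum + n, 0);
  let result = 0;
  for (const count of counts.values()) {
    const p = count / total;
    result -= p * Math.log2(p);
  }
  return result;
}

function transformer(file, api) {
  const j = api.jscodeshift;

  j(file.source)
    .find(j.Literal)
    .forEach((path) => {
      const value = path.node.value;
      // Only string literals can hold a secret; skip numbers, booleans, null.
      if (typeof value !== 'string') return;

      const score = entropy(value);
      if (score > ENTROPY_THRESHOLD) {
        // Throwing makes jscodeshift report this file as an error.
        throw new Error(
          `Possible hard-coded secret in ${file.path} (entropy ${score.toFixed(2)})`
        );
      }
    });

  // Nothing is returned, so jscodeshift leaves the file unchanged.
}

module.exports = transformer;
```

Because the codemod never returns new source, it behaves like a lint rule: it can only pass or fail.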

Code available on GitHub Gist.

This codemod is really simple. We take all the Literals from our code and measure the entropy of each of them. If the entropy exceeds the threshold, we raise an exception.

Running codemods on CI

When running the codemod manually in the console we can clearly see if it fails or not. Unfortunately, when I tried to integrate jscodeshift with Continuous Integration tools (such as GitLab and Bitrise), I came across an issue. It turns out jscodeshift does not return an error code (a non-zero exit code) when finishing with an error. It’s actually a bug reported a while back which hasn’t been resolved yet. For now, we have to use a workaround and figure out the status ourselves.

The easiest way I’ve found was to write a simple Node.js script that parses the output of jscodeshift and returns the proper exit code.
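
A sketch of such a script follows. It assumes the summary printed by jscodeshift contains a line such as "2 errors"; the file name verify-output.js and the --stdin flag are hypothetical:

```javascript
// verify-output.js: turn jscodeshift's printed summary into an exit code.

// Extract the error count from the summary printed at the end of a run.
// Returns null when no "<n> errors" text is found.
function countErrors(output) {
  const match = /(\d+) errors?\b/.exec(output);
  return match ? Number(match[1]) : null;
}

// Read everything piped to stdin, then set the exit code.
function main() {
  let output = '';
  process.stdin.setEncoding('utf8');
  process.stdin.on('data', (chunk) => {
    output += chunk;
    process.stdout.write(chunk); // keep the original log visible on CI
  });
  process.stdin.on('end', () => {
    // Succeed only when the summary explicitly reports zero errors,
    // so a missing summary also fails the build.
    process.exitCode = countErrors(output) === 0 ? 0 : 1;
  });
}

// Hypothetical --stdin flag: only read stdin when asked to, so that
// countErrors can also be imported from other scripts.
if (process.argv.includes('--stdin')) {
  main();
}

module.exports = { countErrors };
```

With a hypothetical detect-secrets.js codemod, the pipeline could look like jscodeshift -t detect-secrets.js src -d | node verify-output.js --stdin, where the exit code of the pipeline comes from the Node.js script.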

Verifying codeshift output: editable version on GitHub Gist

We can use this script by piping the output of the jscodeshift command into it. This way we get a proper exit code.

To sum up, metaprogramming techniques are really useful for linting and automating repetitive tasks. At first glance codemods may seem quite odd, but once we understand what they operate on and how they work they become a really useful tool. I hope you find a use for them in your projects! If you do find a good use, please share what you did in the comments!
