How we use git as a source of insights for our migrations

Published in Inside Doctrine, Jan 26, 2024

At Doctrine, we take care to keep the codebase maintainable and able to evolve. Sometimes this requires a migration, and we’ve carried out several since Doctrine was born.

Each time, the steps are the same:

identify a problem requiring migration
find the best solution and convince the stakeholders to do it
execute this vision and complete it within a reasonable timeframe

During the last step, stakeholders regularly ask us to report on progress.

In this post, we’ll see how we use a script to generate that progress report automatically, month after month, for the migration of an API server to a new technology.

Cover image designed using assets from Freepik.com

Why gamify migration?

By gamification, we mean generating and publishing a report at a regular frequency to gain perspective and see progress over time. It has several advantages:

Psychological: it can make developers want to move the cursor forward.
Visibility: it shows the periods when the migration moves forward and the periods when it slows down.
Very often there’s a good reason for a slowdown, like work on other projects.
But seeing how the pace of effort changes over a longer time span helps limit the frustration of not having completed the migration yet.
Knowledge sharing: stakeholders will request this information in any case, so why not provide it in a graphical format that developers can use too?

Choose your metrics

In our case, we wanted to migrate a Node.js server from an API written with Express.js to an API written in TypeScript with Nest.js.

So we started a fairly large migration plan for our 5-year-old backend.

After a few weeks, when the first stakeholder asks where we are, the reflex is to look in the codebase.

To answer this question about progress, we can look at several metrics:

the number of lines of code in each version
the number of endpoints in each version

In our case, the migration involved a major overhaul, so comparing raw line counts is not the most accurate measure. We therefore chose to track the evolution of the endpoint count.

But we don’t want to search for this information every week and write it down manually somewhere.

Do it without wasting time

As developers, we don’t want to repeat ourselves. Since the presence of an endpoint in either version is recorded in the codebase and versioned in git, why not write an automated script?

That’s when we came up with a script made of two parts:

the first walks through the history of a repository passed as a parameter. Other parameters let you choose the step and the start date of the analysis.
the second runs a script on the repository files to obtain the metrics. At this stage, you can do whatever suits your migration, for instance looking at the names or the content of the files.
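To make the first part more concrete, here is a minimal sketch of how sampling dates could be derived from a start date and a step, and how each date could be turned into arguments for `git rev-list`. The helper names (`samplingDates`, `toGitDate`, `lastCommitBeforeArgs`), the 7-day step and the `main` branch are assumptions made for illustration, not the names or values used by our script.

```typescript
// Hypothetical helpers sketching the "browse the history" step.

/** One date every `stepDays` days, from `start` up to and including `end`. */
function samplingDates(start: Date, end: Date, stepDays: number): Date[] {
  if (stepDays <= 0) {
    throw new Error("stepDays must be positive");
  }
  const stepMs = stepDays * 24 * 60 * 60 * 1000;
  const dates: Date[] = [];
  for (let t = start.getTime(); t <= end.getTime(); t += stepMs) {
    dates.push(new Date(t));
  }
  return dates;
}

/** Formats a date as "YYYY-MM-DD HH:MM:SS +0000" (UTC), a format git understands. */
function toGitDate(date: Date): string {
  return date.toISOString().slice(0, 19).replace("T", " ") + " +0000";
}

/** Arguments for `git rev-list` resolving the last commit on `branch` before `date`. */
function lastCommitBeforeArgs(date: Date, branch: string): string[] {
  return ["rev-list", "--max-count=1", `--before=${toGitDate(date)}`, branch];
}

// Weekly samples over four weeks (illustrative values).
const dates = samplingDates(
  new Date("2023-01-01T00:00:00Z"),
  new Date("2023-01-29T00:00:00Z"),
  7,
);
const argsPerDate = dates.map((d) => lastCommitBeforeArgs(d, "main"));
```

Each argument list could then be passed to git (for example through Node’s `child_process.execFile`) to get a commit hash, and the metrics script would run on the files of that commit.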

In our server migration, the information is contained in the controller files, so there’s no need to look at every file.

Once we’ve identified these files, we need to find out how many endpoints are declared in them.

We could have matched patterns with regular expressions, but we preferred a more robust method.

Explore the Abstract Syntax Tree

An Abstract Syntax Tree (AST) is a tree data structure that represents the syntactic structure of source code. Any language with a formal grammar can be parsed into an AST.

With this structure, you can go through each node and find out whether the code declares a variable or calls a method, etc.
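As an illustration of going through each node, here is a small sketch of a depth-first walk. To stay self-contained it does not call a parser: the tree is written by hand and only loosely follows the ESTree shape that JavaScript parsers produce, so real parser output contains more fields. The names `AstNode`, `isNode` and `walk` are ours, chosen for the example.

```typescript
// Minimal ESTree-like node: a `type` plus arbitrary child properties.
interface AstNode {
  type: string;
  [key: string]: unknown;
}

function isNode(value: unknown): value is AstNode {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { type?: unknown }).type === "string"
  );
}

/** Depth-first traversal calling `visit` on every node of the tree. */
function walk(node: AstNode, visit: (n: AstNode) => void): void {
  visit(node);
  for (const key of Object.keys(node)) {
    const value = node[key];
    const children: unknown[] = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (isNode(child)) {
        walk(child, visit);
      }
    }
  }
}

// Hand-written, simplified tree for: const total = add(1, 2);
const program: AstNode = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "const",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "total" },
          init: {
            type: "CallExpression",
            callee: { type: "Identifier", name: "add" },
            arguments: [
              { type: "Literal", value: 1 },
              { type: "Literal", value: 2 },
            ],
          },
        },
      ],
    },
  ],
};

// Collect the type of every node, in visiting order.
const seen: string[] = [];
walk(program, (n) => {
  seen.push(n.type);
});
```

After the walk, `seen` contains both a `VariableDeclaration` (the code declares a variable) and a `CallExpression` (the code calls a function), which is exactly the kind of question the traversal lets you answer.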

The best way to learn how JavaScript code is converted to an AST is to paste it directly into an explorer like AST Explorer.

Example of a JavaScript snippet parsed into an AST

Since the new server is written in TypeScript, we use the recast library to parse this kind of code.

Once we know what to look for in the code, all that’s left is to write a unit test so we can iterate on the algorithm and avoid regressions.

Initial version

For the Express part, we were lucky enough to be able to take advantage of an abstraction we’d used in the past to instrument error reporting. Basically, each endpoint is associated with a team that is notified in the event of an error.

You can then find the expression that targets the element to count: here, a NewExpression whose callee is named “Controller”.

Exploration of Express endpoint declaration
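Counting those nodes can be sketched as a recursive function over the tree. This is an illustration under assumptions, not our actual script: the tree below is hand-written and simplified instead of coming from a parser, and the variable names in it (`listUsers`, `getUser`, `cache`) as well as the helpers `countNewExpressions` and `declaration` are made up for the example.

```typescript
/** Counts `new <calleeName>(...)` expressions anywhere in an ESTree-like tree. */
function countNewExpressions(node: unknown, calleeName: string): number {
  if (Array.isArray(node)) {
    let total = 0;
    for (const child of node) {
      total += countNewExpressions(child, calleeName);
    }
    return total;
  }
  if (typeof node !== "object" || node === null) {
    return 0;
  }
  const record = node as Record<string, unknown>;
  const callee = record.callee as Record<string, unknown> | undefined;
  const isMatch =
    record.type === "NewExpression" &&
    callee?.type === "Identifier" &&
    callee?.name === calleeName;
  let count = isMatch ? 1 : 0;
  // Keep descending: matches can be nested inside other nodes.
  for (const key of Object.keys(record)) {
    count += countNewExpressions(record[key], calleeName);
  }
  return count;
}

/** Simplified tree for: const <variable> = new <className>(); */
function declaration(variable: string, className: string): unknown {
  return {
    type: "VariableDeclaration",
    kind: "const",
    declarations: [
      {
        type: "VariableDeclarator",
        id: { type: "Identifier", name: variable },
        init: {
          type: "NewExpression",
          callee: { type: "Identifier", name: className },
          arguments: [],
        },
      },
    ],
  };
}

// Hypothetical controller file: two endpoints and one unrelated `new Map()`.
const expressFile = {
  type: "Program",
  body: [
    declaration("listUsers", "Controller"),
    declaration("getUser", "Controller"),
    declaration("cache", "Map"),
  ],
};

const expressEndpoints = countNewExpressions(expressFile, "Controller");
```

Matching on the node type and the callee name, rather than on text, is what makes this approach less fragile than a regular expression: a `Controller` mentioned in a comment or a string is never counted.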

New version

For the new version, the code is in TypeScript, but we can still get an AST.

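As with Express, you explore the AST to find the node to count. Nest.js declares routes with decorators such as @Get() or @Post() on the methods of a class annotated with @Controller(), so the target here is a decorator node rather than a NewExpression.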

Exploration of Nest.js endpoint declaration
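For the Nest.js side, a similar sketch can count route decorators. It assumes the endpoints are declared with Nest.js HTTP method decorators (`@Get()`, `@Post()`, `@Put()`, `@Patch()`, `@Delete()`) and that the parser exposes them as `Decorator` nodes wrapping a call expression. The exact node shapes depend on the parser, so the hand-written tree below is a simplification, and `countRouteDecorators`, `decorator`, `method` and `UsersController` are names invented for the example.

```typescript
// Decorator names treated as endpoint declarations (assumption for this sketch).
const HTTP_DECORATORS = ["Get", "Post", "Put", "Patch", "Delete"];

/** Counts decorators such as @Get() or @Post() anywhere in the tree. */
function countRouteDecorators(node: unknown): number {
  if (Array.isArray(node)) {
    let total = 0;
    for (const child of node) {
      total += countRouteDecorators(child);
    }
    return total;
  }
  if (typeof node !== "object" || node === null) {
    return 0;
  }
  const record = node as Record<string, unknown>;
  let count = 0;
  if (record.type === "Decorator") {
    const expression = record.expression as Record<string, unknown> | undefined;
    const callee = expression?.callee as Record<string, unknown> | undefined;
    const calleeName = callee?.name;
    if (
      expression?.type === "CallExpression" &&
      callee?.type === "Identifier" &&
      typeof calleeName === "string" &&
      HTTP_DECORATORS.indexOf(calleeName) !== -1
    ) {
      count = 1;
    }
  }
  for (const key of Object.keys(record)) {
    count += countRouteDecorators(record[key]);
  }
  return count;
}

/** Simplified node for a decorator call such as @Get(). */
function decorator(name: string): unknown {
  return {
    type: "Decorator",
    expression: {
      type: "CallExpression",
      callee: { type: "Identifier", name },
      arguments: [],
    },
  };
}

/** Simplified node for a class method and its decorators. */
function method(name: string, decorators: unknown[]): unknown {
  return { type: "ClassMethod", key: { type: "Identifier", name }, decorators };
}

// Hypothetical controller: @Controller() class with @Get() and @Post() methods.
const nestFile = {
  type: "Program",
  body: [
    {
      type: "ClassDeclaration",
      id: { type: "Identifier", name: "UsersController" },
      decorators: [decorator("Controller")],
      body: {
        type: "ClassBody",
        body: [
          method("findAll", [decorator("Get")]),
          method("create", [decorator("Post")]),
          method("formatUser", []),
        ],
      },
    },
  ],
};

const nestEndpoints = countRouteDecorators(nestFile);
```

The class-level `@Controller()` decorator and the undecorated helper method are ignored, so only the two routes are counted.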

With these two parts connected, our script is ready. If you want to dig into the code, you can find it in this GitHub repository.

Sharing insights

In the first version of the script, the output was a CSV file. All you had to do was import this information into a spreadsheet, then generate the graph from the columns.
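A CSV export of this kind can be sketched in a few lines. The column names, the `Sample` type and the numbers below are made up for illustration; they are not the real output of our script.

```typescript
// One measurement per analysed commit (hypothetical shape).
interface Sample {
  date: string; // ISO date of the analysed commit
  expressEndpoints: number;
  nestEndpoints: number;
}

/** Serializes the samples as CSV, one row per date, with a header line. */
function toCsv(samples: Sample[]): string {
  const header = "date,express_endpoints,nest_endpoints";
  const rows = samples.map(
    (s) => `${s.date},${s.expressEndpoints},${s.nestEndpoints}`,
  );
  return [header].concat(rows).join("\n");
}

// Illustrative values only.
const csv = toCsv([
  { date: "2023-01-01", expressEndpoints: 120, nestEndpoints: 0 },
  { date: "2023-02-01", expressEndpoints: 110, nestEndpoints: 12 },
]);
```

The values here are plain dates and numbers, so no quoting or escaping is needed; a field that could contain commas would require it.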

As we said earlier, the less manual action, the better.

We therefore evolved the script so that it generates a web page displaying the information directly as a graph, using the D3.js library.

And here’s the final result:

Conclusion

There’s no rocket science in this script, and it’s not optimized, but it’s really useful for measuring our progress without wasting time.

Don’t hesitate to use it as inspiration or to fork it so that you can apply it to your own use case.

If you are interested in helping us with our future challenges, please feel free to apply to our open positions.