Introducing Repo Analyzer

Henrique Dias
Published in Feedzai Techblog
5 min read · Jan 9, 2020

Feedzai has several frontend repositories, but keeping tabs on them to ensure they are high-quality and consistent has proven difficult.

To improve this, I developed a tool during my summer internship that allows Feedzai to collect metrics regarding code quality and consistency across several projects. These metrics are then sent to Elasticsearch/Kibana where it’s possible to create dashboards with the collected data.

Kibana dashboard with data collected from Feedzai’s projects

Motivation

There wasn’t any open-source tool (that we knew of) that could collect quality metrics about frontend projects. With this in mind, we decided to build a tool that could do that.

We wanted this tool to be usable, extensible, and provide visual insights. After finding this missing piece in the open-source community, we decided to roll up our sleeves and get started.

High-level idea

First, we thought about following a centralized architecture responsible for analyzing a group of projects. Later we found out that this was not the best approach.

Inspired by ESLint, we decided to build this tool as a project dependency. This makes it simpler and friendlier to use.

In its current form, the tool is divided into 3 packages:

  • Tool
  • Configurations
  • Utilities (used in the tool/configuration)

Only the repo-analyzer and utilities are required to run the tool. The configuration can be defined by the user if they create their own configuration package. I’ll explain this in detail later.

There are two ways of running the tool: in current mode or in history mode. The current mode, as the name implies, analyzes the project’s current state.

The history mode copies the entire project to a temporary folder. After that, it scrubs through its commits and runs the tool in each commit. This provides the project with a complete history report.
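To make the history mode more concrete, here is a minimal sketch of the idea in JavaScript. It is not the tool's actual implementation: `analyzeHistory`, its `analyze` callback, and the injectable `runGit` helper are hypothetical names used only for illustration. The sketch relies on nothing more than the `git rev-list` and `git checkout` commands and Node's built-in modules.

```javascript
const { execFileSync } = require("node:child_process");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Runs a git command inside `cwd` and returns its trimmed output.
function git(cwd, args) {
  return execFileSync("git", args, { cwd, encoding: "utf8" }).trim();
}

// Hypothetical history walk (an assumption, not the tool's real code).
// `analyze` is any function that inspects a checked-out project folder.
function analyzeHistory(projectDir, analyze, runGit = git) {
  // Work on a copy so the developer's working tree is never touched.
  const workDir = fs.mkdtempSync(path.join(os.tmpdir(), "repo-history-"));
  fs.cpSync(projectDir, workDir, { recursive: true });
  try {
    // `git rev-list HEAD` prints commit hashes, newest first.
    const commits = runGit(workDir, ["rev-list", "HEAD"])
      .split("\n")
      .filter(Boolean);
    return commits.map((hash) => {
      runGit(workDir, ["checkout", "--quiet", "--force", hash]);
      return { hash, report: analyze(workDir) };
    });
  } finally {
    // Always clean up the temporary copy.
    fs.rmSync(workDir, { recursive: true, force: true });
  }
}
```

Passing the git runner as a parameter is a design choice made for this sketch only: it lets the walk be exercised without a real repository.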

However, running the tool in each commit can consume a lot of time and resources. To overcome this, we introduced a way to skip commits without significantly affecting the overall results. In addition, we skip more commits the further back we go in the commit history. This way we ensure that the most important data, i.e. the most recent, is collected.

The following chart illustrates the results of jumped commits:

Usage

Analyzing the project’s current state

In the next steps, we show how we use the tool to analyze the current or previous state of a project. Note that the tool works best when used with Elasticsearch/Kibana.

Due to the complexity of Elasticsearch and Kibana’s configuration, I decided to write another document about it, which is available here.

Install the tool, which is available on npm, and save it as a dev dependency of your project:

$ npm i @feedzai/repo-analyzer --save-dev

On its first run, the tool creates a configuration file called .repo-analyzer containing the configuration we use at Feedzai. It consists of 14 metrics, such as code coverage, bundle size and many more. You can learn more about them here. You can also find more information on how to create your own metrics here.

The best practice is to run the tool locally by adding an entry to the scripts section inside your package.json:

"scripts": {
  "start": "start-storybook -p 8000",
  "test": "jest --coverage",
  "test:watch": "jest --watch",
  "build": "npm run clean && npm run build:webpack && npm run build:scss",
  "analyze": "repo-analyzer --username=john.doe --password=password"
},

The username and password flags are used to log in to Elasticsearch and send the reports. Find more information about this here.
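As a rough illustration of what "log in and send the report" can look like, the sketch below builds a request for Elasticsearch's document index endpoint (`POST /<index>/_doc`) using HTTP basic authentication. Everything specific here is an assumption made for the example, not the tool's actual code: the function names, the `repo-analyzer` index name, the `localhost:9200` address, and the shape of the report object.

```javascript
// Builds (but does not send) the HTTP request that indexes one report.
// All names and values are hypothetical and only illustrate the idea.
function buildIndexRequest({ baseUrl, index, username, password, report }) {
  // Basic auth is "username:password" encoded as base64.
  const credentials = Buffer.from(`${username}:${password}`).toString("base64");
  return {
    url: `${baseUrl}/${encodeURIComponent(index)}/_doc`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Basic ${credentials}`,
      },
      body: JSON.stringify(report),
    },
  };
}

// Sends the report using the global fetch available in Node 18+.
async function sendReport(params) {
  const { url, options } = buildIndexRequest(params);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Elasticsearch responded with ${response.status}`);
  }
  return response.json();
}

// Example request for an assumed local Elasticsearch instance.
const request = buildIndexRequest({
  baseUrl: "http://localhost:9200",
  index: "repo-analyzer",
  username: "john.doe",
  password: "password",
  report: { project: "my-app", metric: "bundle-size", value: 123456 },
});
console.log(request.url);
```

Keeping the request construction separate from the network call makes the payload easy to inspect before anything is sent.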

After correctly installing and configuring the tool, you are ready to go! By simply running it, you get the current status of your project.

You can run the tool like this:

$ npm run analyze

Analyzing the project history

The tool can also collect data about the project’s history based on its commits. This may come in handy if you want to know how your project changed over time and, for example, what impact some dependencies had on your bundle size.

You can pass the history flag through npm. You need to put “--” before your flags so that npm knows it needs to pass them to the script.

$ npm run analyze -- --history
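Everything after the separator is appended to the script's command line, so the script ends up seeing its own flags followed by the extra ones. The sketch below shows one simple way a Node script could read such flags. The `parseFlags` helper is hypothetical and is not how repo-analyzer itself parses its arguments.

```javascript
// Parses flags such as "--history" or "--factor=2" into a plain object.
// Hypothetical helper for illustration only.
function parseFlags(argv) {
  const flags = {};
  for (const arg of argv) {
    // Ignore anything that is not a "--name" or "--name=value" flag.
    if (!arg.startsWith("--") || arg.length === 2) continue;
    const [name, ...rest] = arg.slice(2).split("=");
    // A flag without "=" is treated as a boolean switch.
    flags[name] = rest.length > 0 ? rest.join("=") : true;
  }
  return flags;
}

// With the "analyze" script shown earlier, the command above is roughly
// equivalent to running:
//   repo-analyzer --username=john.doe --password=password --history
const flags = parseFlags([
  "--username=john.doe",
  "--password=password",
  "--history",
]);
console.log(flags.history);
```

In a real script, the array would come from `process.argv.slice(2)` instead of being written by hand.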

By default, the tool will analyze every single commit, but you can change this by passing the “factor” flag.

$ npm run analyze -- --history --factor=2

The factor flag allows you to calculate the history in repositories with many commits when it is not possible to analyze every single commit. When you specify a factor of 1 or larger, the commits will be sampled. The larger the value passed through the flag, the larger the samples will be.

As the Commits Jumped chart shown earlier illustrates, the number of jumped commits increases exponentially as we go back in the commit history. You can reduce the size of the jumps by using values between 0 and 1.
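This post does not spell out the exact formula behind the factor, so the following is only an illustrative sketch of the general idea: keep every recent commit and let the gap between sampled commits grow exponentially further back. The `sampleCommits` function and the choice of doubling gaps (`2 ** (step * factor)`) are assumptions made for the example, not the tool's real algorithm.

```javascript
// Picks a subset of commits, assuming `commits` is ordered newest first.
// The gap between picks grows exponentially with each step; `factor`
// controls how fast. Hypothetical formula, for illustration only.
function sampleCommits(commits, factor = 0) {
  const sampled = [];
  let index = 0;
  let step = 0;
  while (index < commits.length) {
    sampled.push(commits[index]);
    // A factor of 0 gives a gap of 1, i.e. every commit is analyzed.
    const gap = Math.max(1, Math.floor(Math.pow(2, step * factor)));
    index += gap;
    step += 1;
  }
  return sampled;
}

// Twenty fake commits, numbered 0 (newest) to 19 (oldest).
const commits = Array.from({ length: 20 }, (_, i) => i);

console.log(sampleCommits(commits, 0).length); // 20, nothing is skipped
console.log(sampleCommits(commits, 1)); // [ 0, 1, 3, 7, 15 ]
console.log(sampleCommits(commits, 0.5)); // [ 0, 1, 2, 4, 6, 10, 15 ]
```

In this sketch, a value between 0 and 1 makes the gaps grow more slowly, so more of the older commits are kept.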

Testing with React

To prove its power, I decided to test this tool with React.

After getting the results, I realized that sometime in the last year, npm packages were switching places between “dev” and “regular” project dependencies. Even after some research, I couldn’t reach a conclusion on what caused this.

When you zoom in on the timestamps, it looks even more strange. Can someone from the React team explain this phenomenon? This is not a rhetorical question 🙂.
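For readers who want to check for this kind of movement in their own projects, here is a small sketch that compares two snapshots of a package.json and lists the packages that moved between "dependencies" and "devDependencies". The `findMovedDependencies` function and the sample data are hypothetical. This is not one of the tool's metrics, and the data is not taken from React's repository.

```javascript
// Returns the section of a package.json object that lists `name`, or null.
function sectionOf(pkg, name) {
  if (Object.hasOwn(pkg.dependencies || {}, name)) return "dependencies";
  if (Object.hasOwn(pkg.devDependencies || {}, name)) return "devDependencies";
  return null;
}

// Lists packages that are present in both snapshots but in a different
// section. Hypothetical helper for illustration only.
function findMovedDependencies(before, after) {
  const names = new Set([
    ...Object.keys(before.dependencies || {}),
    ...Object.keys(before.devDependencies || {}),
  ]);
  const moved = [];
  for (const name of names) {
    const from = sectionOf(before, name);
    const to = sectionOf(after, name);
    if (to !== null && from !== to) moved.push({ name, from, to });
  }
  return moved;
}

// Made-up snapshots of the same project at two different commits.
const older = {
  dependencies: { "object-assign": "^4.1.1" },
  devDependencies: { jest: "^24.9.0", rollup: "^0.52.1" },
};
const newer = {
  dependencies: { jest: "^24.9.0" },
  devDependencies: { "object-assign": "^4.1.1", rollup: "^0.52.1" },
};

console.log(findMovedDependencies(older, newer));
```

Running a comparison like this on consecutive sampled commits would show exactly when each package changed sections.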

Conclusion

This project was built during my summer internship at Feedzai. This tool is currently being used to monitor our own internal frontend projects.

We have seen great results and discovered outdated React versions in several projects. We were also able to clearly see the impact that removing npm dependencies had on a project’s bundle size.

The main point for improvement is that there are still metrics that we don’t support in all our internal projects. We would like to add support for those, especially the bundle size, which is a very important metric that was not supported in a key project.
