Rapidly Prototyping Engineering Dashboards Using Jupyter Notebooks and Gitlab Pages

Heemeng Foo
Published in The Startup
8 min read · Jan 19, 2021

A step-by-step approach


Introduction

As engineering managers, we are always looking for ways to measure team productivity, identify obstacles team members are facing, coach our teams on sound software engineering principles, develop engineering craftsmanship and keep an eye on software quality. The larger the team, the bigger the challenge.

In the last decade or so, software for managing development has transitioned away from shrink-wrapped or in-house hosted products to SaaS (software-as-a-service) offerings. Tools such as Github, SonarQube and Jira have become managed services where you pay for what you use and don’t have to incur the cost of installation, setup and maintenance. In addition, each tool is free to specialize in its respective area, e.g. Jira for bug management, Github for source code management. They have also built tools, e.g. JQL (Jira Query Language), to make it easy to build dashboards and reports.

But here’s the problem: as a manager, you need to put those disparate data points together to obtain insights. For example, suppose more bugs have been showing up in the development process, and that correlates with less discussion during code reviews (on pull requests, or PRs) over the last few sprints. That is an issue that needs investigating. The trouble is that the bug information resides in the bug repository, e.g. Jira, while the PR data resides in the code repository, e.g. Github.

The thing is, as a manager, you don’t want to pull an engineer off the team to build this for you, as that will slow the team down. Ideally you want to build something on your own that doesn’t require too much effort.

In the following sections, I will describe step-by-step how to build a simple dashboard that will help you pull together such data. All the code can be found here:

https://gitlab.com/foohm71/mylovelydashboard

To keep things simple, I am only interfacing with Gitlab cloud and SonarCloud. Also, to demonstrate what’s possible, I’ve pulled data from two open source repositories:

  1. https://gitlab.com/nomadic-labs/tezos-indexer
  2. https://sonarcloud.io/dashboard?id=simgrid_simgrid

You can also see the example dashboard here.

Prototyping a dashboard using Jupyter Notebook

I’m assuming the reader is already familiar with Jupyter Notebooks. If you’re not, I would strongly recommend picking it up, together with some basic Python and data manipulation tools such as numpy and pandas, as well as plotting libraries like matplotlib. I say this because I reckon data science will be a required skill for any engineering manager in the next 5–10 years.

In this section I will walk you through the notebook createDashboard.ipynb found in the Gitlab repo described above. The code in this notebook looks fairly well organized now, but as anyone who has explored APIs and built plots knows, there is a lot of trial and error involved. This is where Jupyter notebooks shine, because you can do just that. Don’t expect your first cut to look as organized; that’s perfectly OK.

The setup

In the notebook, the first thing you will see is this:
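Given the remove_lines.py mechanism described later, this first cell is presumably just the marker that opens a block of notebook-only code:

```python
# START REMOVE
```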

Just ignore this for now. Next you will see a bunch of pip install:
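These cells likely look something like the following; the exact package names are my assumption (check the repo’s requirements.txt for the real ones):

```python
!pip install python-gitlab
!pip install python-sonarqube-api
```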

These are to install the API libraries to pull data off of SonarCloud and Gitlab. Next:
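Presumably something like:

```python
!pip install Mako
```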

This is to install the mako library to generate the dashboard HTML. If you’re interested, you can find out more at: https://www.makotemplates.org/. Next:
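The usual library for this is python-dotenv, so this cell is most likely:

```python
!pip install python-dotenv
```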

This allows you to read off creds like your API tokens as ENV VARs. Store them locally in a .env file but make sure to include .env in your .gitignore as you do not want to store it in the clear on your git repo. Next:
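This cell is presumably just:

```python
!pip list
```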

This just lists all the Python libs you’re using so that it’ll be easier for you to create a requirements.txt file later on. Next:
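Assuming the notebook uses python-dotenv’s IPython extension, this cell looks something like:

```python
%load_ext dotenv
%dotenv
# END REMOVE
```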

Basically this tells Jupyter to read the .env file. As for the # END REMOVE we’ll get to that later.

The next section is all setup:
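A sketch of what this cell contains; the environment variable names are illustrative, not necessarily the ones in the repo:

```python
import os
import pandas as pd  # plus numpy/matplotlib and the Gitlab/Sonar client imports in the real notebook

# Read the API tokens that the dotenv extension loaded into the environment.
# Use whatever variable names you put in your .env file.
GITLAB_TOKEN = os.environ.get("GITLAB_TOKEN")
SONAR_TOKEN = os.environ.get("SONAR_TOKEN")

# All data destined for the mako template is collected in this dictionary.
values = {}
```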

Basically we (i) perform all the imports, (ii) read the API tokens off .env and (iii) initialize values, a Python dictionary that stores all the data we are going to use to populate the mako template later.

Most tools nowadays expose an API, but you’ll need to obtain an API token. The place to create one is usually in the User Settings section of the UI; for Gitlab, it’s under User Settings -> Access Tokens. If you have trouble finding it, just Google for it.

Next is the Useful Functions section. In there, there are the following:

  1. printJSON : this pretty-prints JSON strings
  2. collectGitlabMRStats : this reads in Gitlab MRs (merge requests, i.e. pull requests in Github-speak) and returns the data in a pandas dataframe.
  3. collectSonarQubeMetrics : this reads in SonarQube metrics and returns the data in a pandas dataframe.

There are two other functions, but they are just common utility functions I use.
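The repo has the real implementations; as a rough sketch, collectGitlabMRStats probably does something along these lines, flattening MR objects into a dataframe (the attribute names follow python-gitlab’s MR objects, but treat the details as assumptions):

```python
import pandas as pd

def collectGitlabMRStats(merge_requests):
    """Flatten a list of python-gitlab MR objects into a pandas dataframe.

    Each MR is expected to expose created_at, merged_at and
    user_notes_count; duration is reported in hours.
    """
    rows = []
    for mr in merge_requests:
        created = pd.to_datetime(mr.created_at)
        merged = pd.to_datetime(mr.merged_at)
        rows.append({
            "created_at": created,
            "merged_at": merged,
            "duration_hrs": (merged - created).total_seconds() / 3600.0,
            "num_comments": mr.user_notes_count,
        })
    return pd.DataFrame(rows)
```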

Creating the tables and charts

First we start with Gitlab stats. The first part just authenticates and gets the Gitlab project object:
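Using the python-gitlab client, this cell presumably looks like the following (the project ID below is hypothetical; use your own project’s):

```python
import gitlab

# Authenticate against Gitlab with the token read from .env
gl = gitlab.Gitlab("https://gitlab.com", private_token=GITLAB_TOKEN)

# The numeric project ID shown on the project's main page
PROJECT_ID = 1234567  # hypothetical; substitute your project's ID
project = gl.projects.get(PROJECT_ID)
```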

You obtain the project ID from the Gitlab project when you access the main page of the project on Gitlab:

Notice I just print out the contents of the list to understand what data I can extract from the MR objects. This is usually necessary when exploring the API. You can Google for APIs for other tools eg. Jira, Github. The API doc for the Gitlab API I am using can be found here: https://python-gitlab.readthedocs.io/ . Next:
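Presumably something like this, pulling the merged MRs and inspecting the resulting dataframe:

```python
# Fetch merged MRs, flatten into a dataframe, then eyeball it
mrs = project.mergerequests.list(state="merged", all=True)
dfMR = collectGitlabMRStats(mrs)
dfMR.head()
```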

I use the collectGitlabMRStats function to extract the MR data into a dataframe and inspect it. It is always a good idea to inspect to see what you’re dealing with before further processing. Next comes the interesting part, I plot the duration (in hours) of MRs over time:
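A sketch of the plotting cell; I substitute a tiny synthetic dataframe here so the snippet is self-contained, whereas the notebook plots the real dfMR (the filename matches the one mentioned below):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative stand-in for the real dfMR pulled from Gitlab
dfMR = pd.DataFrame({
    "created_at": pd.to_datetime(["2020-02-01", "2020-05-01", "2020-09-01"]),
    "duration_hrs": [240.0, 96.0, 12.0],
})

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(dfMR["created_at"], dfMR["duration_hrs"], marker="o")
ax.set_xlabel("MR created date")
ax.set_ylabel("Duration (hours)")
ax.set_title("Time to merge MRs over time")

# Save for display in the dashboard later
fig.savefig("duration_mrs_time.png")
```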

This is interesting: you will notice that MRs took longer to be merged earlier in the year than later on (sometime after July). Notice I saved the plot into a PNG file, duration_mrs_time.png. This is for display in the dashboard later.

Next I plot the spread of number of comments per MR using a boxplot:
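Again a self-contained sketch with synthetic comment counts standing in for the real dfMR column; the output filename here is my invention:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative stand-in for the real per-MR comment counts
dfMR = pd.DataFrame({"num_comments": [0, 1, 2, 2, 3, 3, 4, 25, 40]})

fig, ax = plt.subplots()
ax.boxplot(dfMR["num_comments"])
ax.set_ylabel("Comments per MR")
fig.savefig("comments_boxplot.png")  # hypothetical filename
```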

Notice there are a lot of outliers. Could they be correlated with the long duration MRs? You’ll need to plot the number of comments over time to check that.

Next I access SonarCloud to get the SonarQube stats.
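I’m not certain which client library the repo uses, but the same data is available from SonarCloud’s documented Web API with plain requests; a sketch (the component key is the one from the example repo above):

```python
import requests

COMPONENT = "simgrid_simgrid"  # the component key from the SonarCloud Overview page

# SonarCloud measures endpoint; the token acts as the username, empty password
resp = requests.get(
    "https://sonarcloud.io/api/measures/component",
    params={"component": COMPONENT, "metricKeys": "coverage,bugs,code_smells"},
    auth=(SONAR_TOKEN, ""),
)
metrics = resp.json()["component"]["measures"]
```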

This is similar to the Gitlab one. Here the component can be found on the SonarCloud UI in the Overview section like so:

Next I save the coverage in the values dictionary for display in the dashboard later.

I’m not going to cover the code to extract the coverage stats into a dataframe and the line coverage spread over files plot as they are similar to the Gitlab example.

Next is formatting the dataframe into a HTML table for display in the dashboard:

values['low_coverage_table'] = dfSonarQubeMetricsLineCoverageSmall.to_html(escape=False, classes="pure-table pure-table-bordered")

I’m using Pure CSS (see https://purecss.io/) for simple formatting hence the pure-table and pure-table-bordered. I also put the HTML table into the values dictionary.

Generating the dashboard

First we load in the template file template.html and then we render using the dictionary object values and write into the static HTML file dashboard.html.

So how does it look on the template side? Well, for the generated PNGs, I just use an img src tag like so:
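Roughly like this, with the filename matching the savefig call in the notebook:

```html
<!-- template.html: embed a generated chart -->
<img src="duration_mrs_time.png" alt="MR duration over time">
```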

To display the HTML table:
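Mako substitutes the pre-rendered table straight from the values dictionary, so the template line is presumably just:

```html
${low_coverage_table}
```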

The Mako templating engine has more features, but I won’t go into more detail here. Those familiar with JSP and PHP will be very comfortable with it.

Now once the dashboard.html is generated you can then inspect, troubleshoot and fix it to your satisfaction. You just need to re-run the loading of the template file and regenerate the dashboard page.

Publishing on Gitlab Pages

The next step is to publish the dashboard on Gitlab Pages. There are essentially four steps:

  1. Set up the Gitlab repo
  2. Commit your .ipynb, template.html, copy over the requirements.txt, .gitlab-ci.yml and remove_lines.py
  3. Customize requirements.txt and .gitlab-ci.yml for your project
  4. Make sure to configure your API tokens so that Gitlab knows to use them as ENV VARs. To do that, go to Settings->CI/CD->Variables

What’s remove_lines.py for? When we run the actual generation of the dashboard, there are some lines of code in the notebook we do not want executed. Remember the # START REMOVE and # END REMOVE? Those mark the boundaries of the code to be removed from the generated Python file we use to generate the dashboard (more on that later). You can have as many START/END REMOVE blocks as you like, but they can’t be nested and every START must have a corresponding END.
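The repo contains the real script; a minimal sketch of the idea, assuming the markers appear verbatim on their own lines:

```python
def remove_marked_lines(lines):
    """Drop every line between a '# START REMOVE' and its matching
    '# END REMOVE' marker, markers included."""
    out, removing = [], False
    for line in lines:
        stripped = line.strip()
        if stripped == "# START REMOVE":
            removing = True
        elif stripped == "# END REMOVE":
            removing = False
        elif not removing:
            out.append(line)
    return out
```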

Now let’s look at the .gitlab-ci.yml because that’s where the main logic happens:
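The real file is in the repo; its shape is roughly as follows, where the image name and exact command-line invocations are my assumptions:

```yaml
image: python:3.8

before_script:
  - python -V

pages:
  script:
    # 1. Install required Python libraries
    - pip install -r requirements.txt
    # 2. Convert the notebook to a plain Python script
    - jupyter nbconvert --to script createDashboard.ipynb
    # 3. Strip the START/END REMOVE blocks
    - python remove_lines.py createDashboard.py > create_dashboard.py
    # 4. Generate dashboard.html and the chart PNGs
    - python create_dashboard.py
    # 5. Publish everything under public/
    - mkdir public
    - cp dashboard.html *.png public/
  artifacts:
    paths:
      - public
```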

The first part, the before_script, is pretty much boilerplate. You get that if you use a standard Python template.

Next is the pages section. This tells Gitlab you are generating a Gitlab Pages page. This is what is happening in sequence:

  1. Install required Python libraries found in requirements.txt
  2. Convert the .ipynb to a Python script using nbconvert
  3. Remove the unwanted lines of code using remove_lines.py and save the output to create_dashboard.py
  4. Run create_dashboard.py
  5. Create the folder public and copy the generated dashboard.html and all charts (i.e. PNGs) there.

Lastly, you just need to trigger a build, either manually or by making a change, e.g. adding a file or editing an existing one.

One last thing: you may encounter issues with the build, as requirements.txt may not list all the dependent libraries and each Gitlab build starts from a fresh container. Remember the !pip list section in the notebook? Now’s the time to look there for the missing library and add it to requirements.txt, and you should have your dashboard working in no time!
