Rapidly Prototyping Engineering Dashboards and Performing Analyses using Jupyter Notebooks and Kibana

Heemeng Foo
Published in The Startup
Feb 23, 2021

A step-by-step approach


Introduction

In a previous post, I described how you could build Engineering Dashboards using Jupyter Notebooks and Gitlab Pages (see [1]). That approach works well if you already know what graphs you want to plot or what hypotheses you want to test. Unfortunately, unless you are really adept with Pandas and Matplotlib, it can be a struggle.

That got me thinking: if I could get the data into a platform like Splunk or even Kibana (i.e. the ELK stack), I could perform analyses and fairly easily plot graphs to spot trends. That is exactly what I set out to do, and this is how I did it.

In the following sections, I will describe step-by-step how I do this, with 2 examples: (a) pulling SonarQube coverage data from SonarCloud and (b) pulling Gitlab pipeline data off of gitlab.com. All the code I describe can be found in importELK.ipynb in the Gitlab repo:

https://gitlab.com/foohm71/mylovelydashboard

As in [1], we are using the following open source repos for demonstration purposes:

  1. https://gitlab.com/nomadic-labs/tezos-indexer
  2. https://sonarcloud.io/dashboard?id=simgrid_simgrid

Running Kibana

For this example, I chose to run ELK as a Docker container. You could also install it on your machine or use Elastic Cloud, but I wanted something simple. You can find the instructions for running the container in the sebp/elk image documentation. Here’s the command line to run:

docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -it --name elk sebp/elk

Next, just load http://localhost:5601 in your browser and you’ll see the Kibana page.

Setting up for loading data into ELK

You should be able to find the code for this section in the importELK.ipynb notebook.

The first few lines simply install some needed Python libraries. The last line loads the env vars from a .env file.
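As a rough sketch (the exact package list in the notebook may differ, and I’m assuming python-dotenv handles the .env file), the setup cells look something like this:

```python
# Install the libraries used in this walkthrough (run inside Jupyter)
!pip install elasticsearch python-gitlab python-dotenv requests

# Load env vars (e.g. API tokens) from a local .env file
from dotenv import load_dotenv
load_dotenv()
```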

Next come the standard imports, reading the env vars, and instantiating the elasticsearch API object for use later.
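Something along these lines; the env var names here are my assumption, not necessarily what the notebook uses:

```python
import os
from elasticsearch import Elasticsearch

# Hypothetical env var names for the API tokens used later
SONAR_TOKEN = os.getenv("SONAR_TOKEN")
GITLAB_TOKEN = os.getenv("GITLAB_TOKEN")

# The sebp/elk container exposes Elasticsearch on localhost:9200
es = Elasticsearch("http://localhost:9200")
```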

Loading SonarQube data

Next, we pull data from SonarCloud and load it into the local ELK:

This is exactly the same code described in [1]. We are pulling line and branch coverage data from the simgrid repo.
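For a feel of what that pull looks like, here is a minimal sketch using SonarCloud’s public measures/component_tree Web API; the notebook in [1] has the authoritative version, and the exact field handling below is my assumption:

```python
import requests

# Pull per-file line and branch coverage for the simgrid project
resp = requests.get(
    "https://sonarcloud.io/api/measures/component_tree",
    params={
        "component": "simgrid_simgrid",
        "metricKeys": "line_coverage,branch_coverage",
        "qualifiers": "FIL",  # files only
        "ps": 500,            # page size
    },
)
coverage = []
for comp in resp.json()["component"]["components"]:
    c = {"id": comp["key"], "path": comp.get("path")}
    for m in comp.get("measures", []):
        c[m["metric"]] = float(m["value"])
    coverage.append(c)
```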

Next, I start populating ELK with the data. Notice the print(c) on the 2nd line of the loop (sketched below). I personally recommend that you first print out the objects you want to load into ELK and check two things: (1) is the object a JSON object? If not, you’ll have to package it into one; and (2) is there a unique ID field? In this case yes: the id field. This is important because you may want to run the import multiple times for your analyses, and you do not want dupes in ELK, which would mess up your analyses.

Next, since each c is a JSON object, I can treat it as an elasticsearch document and create it in an elasticsearch index (in this case sonarqube-code-coverage).
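A sketch of that loop, using the elasticsearch-py client created earlier (passing the id makes re-runs overwrite the document instead of duplicating it):

```python
for c in coverage:
    print(c)  # inspect the object before indexing it
    es.index(index="sonarqube-code-coverage", id=c["id"], body=c)
```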

Now once that is done, the next step is to see if it shows up on Kibana.

Setting up Kibana

The first thing you need to do is create an index pattern. To do that, first go to the Home screen by opening the side menu and clicking on “Home”:

Next, click on the “Manage” link.

Then click on “Index patterns”

Then click on “Create index pattern”

On the page you should see a list of indices that already exist on your ELK instance. For example:

Choose an index pattern, e.g. “sonarqube-*”, click on “Next step” and then “Create index pattern”, and you should be done.

Next, using the side menu, navigate to Kibana -> Discover. On the left you will be able to change the index pattern. Select the one you have just configured. You should then see something like this:

That means you have loaded the data correctly. Next you can do analyses of your data and build dashboards. I am not going to cover these in this article as there are other sources that describe this better. Here are some examples:

  1. Learn basics of Kibana KQL in 5 mins (https://youtu.be/cvPpElUEs4E)
  2. How to Create Visualizations and Dashboards in Kibana (https://youtu.be/iNxt0duWBZM)

Here is a simple dashboard I created showing the proportion of files that have branch coverage vs line coverage:

Loading Gitlab data

To load Gitlab data, there are some things to note. As with the SonarQube data, you first have to authenticate with Gitlab Cloud. You’ll need to get an access token, as described in [1].
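With python-gitlab, that auth step looks roughly like this (assuming the token is in the GITLAB_TOKEN env var from earlier):

```python
import gitlab

# Authenticate against gitlab.com with a personal access token
gl = gitlab.Gitlab("https://gitlab.com", private_token=GITLAB_TOKEN)
```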

Next, you need to get the pipelines from the Gitlab project. This is also described in [1].
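Roughly, using the demo repo listed in the introduction:

```python
# Fetch the project and its pipelines
project = gl.projects.get("nomadic-labs/tezos-indexer")
pipelines = project.pipelines.list(per_page=100)
```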

Next comes pulling the data from Gitlab into ELK:

I always print the object (in this case the failed job objects) so that I have an understanding of what I am dealing with. Notice that the structure of Gitlab objects is unlike that of SonarQube: e.g. the job object has 2 components, the “manager” and the “attributes”. You can find more information on this in the Gitlab Python API docs (see this). What this means is that for each job object we have to do a job.__dict__['attrs'] to extract the JSON to load into ELK. Another thing to note is that you will want to create a time series. For that, you need to choose a timestamp for the @timestamp attribute that ELK will use; in this case I chose job.created_at.
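Putting that together, here is a sketch of the ingest loop. Two caveats: I use python-gitlab’s documented attributes property (equivalent to digging into job.__dict__) to get the raw dict, and the index name gitlab-failed-jobs is mine:

```python
for pipeline in pipelines:
    # Only pull the failed jobs of each pipeline
    for job in pipeline.jobs.list(scope="failed"):
        print(job)  # inspect the object first
        doc = dict(job.attributes)  # raw JSON attributes as a dict
        doc["@timestamp"] = doc["created_at"]  # make it a time series
        es.index(index="gitlab-failed-jobs", id=doc["id"], body=doc)
```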

Setting up Kibana

Setting up Kibana for this is almost the same as for the SonarQube data, except that once you have defined the index pattern, Kibana is smart enough to figure out that you have time-series data and will ask you to choose a timestamp field.

Once done you should be able to see something like this:

Notice that it looks different from the SonarQube one. This is because you have time-series data now.

Again you can then create visualizations like so:

I highly encourage you to watch those YouTube videos listed above to get you started. They are really short so you should be building simple dashboards in no time.

I hope this was useful.

References

[1] Rapidly Prototyping Engineering Dashboards Using Jupyter Notebooks and Gitlab Pages, Jan 2021, Medium, https://medium.com/swlh/rapidly-prototyping-engineering-dashboards-using-jupyter-notebooks-and-gitlab-pages-c686f6b8f1fd
