Rapidly Prototyping Engineering Dashboards and Performing Analyses using Jupyter Notebooks and Kibana

Heemeng Foo
Feb 23

A step-by-step approach

Photo by Carlos Muza on Unsplash

Introduction

In a previous post, I described how you could build Engineering Dashboards using Jupyter Notebooks and Gitlab Pages (see [1]). That approach works well if you already know which graphs you want to plot or which hypotheses you want to test. Unfortunately, unless you are really adept with Pandas and Matplotlib, it can be a struggle.

That got me thinking: if I could get the data into a platform like Splunk or Kibana (i.e. the ELK stack), I could perform analyses and plot graphs to spot trends with far less effort. That is what I set out to do, and this is how I did it.

In the following sections, I will describe how to do this step by step with two examples: (a) pulling SonarQube coverage data from SonarCloud and (b) pulling Gitlab pipeline data from gitlab.com. All the code I describe can be found in importELK.ipynb in the Gitlab repo:

https://gitlab.com/foohm71/mylovelydashboard

As in [1], we are using the following open source repos for demonstration purposes:

  1. https://gitlab.com/nomadic-labs/tezos-indexer
  2. https://sonarcloud.io/dashboard?id=simgrid_simgrid

Running Kibana

For this example, I chose to run ELK as a Docker container. You could also install it on your machine or use Elastic Cloud, but I wanted something simple. You can find the instructions for running the container in the sebp/elk image documentation. Here’s the command to run it:

docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -it --name elk sebp/elk

Next, load http://localhost:5601 in your browser and you’ll see the Kibana page.

Setting up for loading data into ELK

You can find the code in this section in the importELK.ipynb notebook.


The first few lines install the Python libraries we need. The last line loads the environment variables from a .env file.
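The notebook shows these cells as screenshots; a minimal sketch of the equivalent code, assuming the notebook relies on python-dotenv together with the elasticsearch and python-gitlab clients (the env var names here are illustrative), would be:

!pip install elasticsearch python-gitlab requests python-dotenv

from dotenv import load_dotenv
load_dotenv()  # reads values such as SONAR_TOKEN and GITLAB_TOKEN from .env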


Next come the standard imports, reading the environment variables, and instantiating the elasticsearch API object for later use.
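Again as a rough sketch (the host and port match the docker command above):

import os
import requests
from elasticsearch import Elasticsearch

SONAR_TOKEN = os.getenv("SONAR_TOKEN")    # illustrative variable names
GITLAB_TOKEN = os.getenv("GITLAB_TOKEN")

# the dockerised ELK exposes elasticsearch on localhost:9200
es = Elasticsearch(["http://localhost:9200"])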

Loading SonarQube data

Next, we pull data from SonarCloud and load it into the local ELK instance:


This is exactly the same code described in [1]. We are pulling line and branch coverage data from the simgrid repo.
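For reference, a simplified sketch of that pull using SonarCloud’s public measures API (pagination is omitted for brevity):

# pull per-file line and branch coverage for the simgrid project
SONAR_URL = "https://sonarcloud.io/api/measures/component_tree"
params = {
    "component": "simgrid_simgrid",
    "metricKeys": "line_coverage,branch_coverage",
    "qualifiers": "FIL",  # files only
    "ps": 500,
}
resp = requests.get(SONAR_URL, params=params, auth=(SONAR_TOKEN, ""))
coverage = resp.json()["components"]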


Next, I start populating ELK with the data. Notice the print(c) call. I recommend printing out the objects you want to load into ELK first and checking two things: (1) is the object a JSON object? If not, you’ll have to package it into one; and (2) is there a unique ID field? In this case there is: the id field. This matters because you may run the import multiple times for your analyses, and you do not want duplicates in ELK messing up the results.

Next, since each c is a JSON object, I can treat it as an elasticsearch document and create it on an elasticsearch index (in this case sonarqube-code-coverage).
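In code, the loading loop is roughly the following (a sketch assuming the 7.x elasticsearch Python client; the notebook is the authoritative version):

for c in coverage:
    print(c)  # inspect the object before loading it
    # using the id field as the document id means a re-run overwrites
    # existing documents instead of creating duplicates
    es.index(index="sonarqube-code-coverage", id=c["id"], body=c)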

Once that is done, the next step is to see if the data shows up in Kibana.

Setting up Kibana

The first thing you need to do is create an index pattern. To do that, open the side menu and click on “Home”.


Next, click on the “Manage” link.


Then click on “Index patterns”


Then click on “Create index pattern”


On this page you should see a list of the indices that already exist on your ELK instance.


Choose an index pattern, e.g. “sonarqube-*”, click “Next step” and then “Create index pattern”, and you should be done.

Next, using the side menu, navigate to Kibana -> Discover. On the left you will be able to change the index pattern. Select the index pattern you have just configured and you should see the documents you loaded.


If the documents appear, you have loaded the data correctly. Next you can analyse the data and build dashboards. I am not going to cover these in this article, as there are other sources that describe them better. Here are some examples:

  1. Learn basics of Kibana KQL in 5 mins (https://youtu.be/cvPpElUEs4E)
  2. How to Create Visualizations and Dashboards in Kibana (https://youtu.be/iNxt0duWBZM)

Here is a simple dashboard I created showing the proportion of files that have branch coverage versus line coverage.


Loading Gitlab data

To load Gitlab data, there are a few things to note. As with the SonarQube data, you first have to authenticate with gitlab.com. You’ll need an access token, as described in [1].
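A minimal sketch of that step with python-gitlab (the token variable name is illustrative):

import gitlab

# authenticate against gitlab.com with a personal access token
gl = gitlab.Gitlab("https://gitlab.com", private_token=GITLAB_TOKEN)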


Next, you need to get the pipelines from the Gitlab project. This is also described in [1].
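Roughly, again using python-gitlab:

# fetch the tezos-indexer project and list its pipelines
project = gl.projects.get("nomadic-labs/tezos-indexer")
pipelines = project.pipelines.list(all=True)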


Next comes pulling the data from Gitlab into ELK:


I always print the object (in this case the failed job objects) so that I understand what I am dealing with. Notice that the structure of Gitlab objects is unlike that of the SonarQube data: each job object has two components, the “manager” and the “attributes”. You can find more information in the python-gitlab API documentation. What this means is that for each job object we have to do a job.__dict__['attrs'] to extract the JSON used to populate ELK. Another thing to note is that you will want to build a time series, so you need to choose a timestamp to put in the @timestamp attribute that ELK will use. In this case I chose job.created_at as the timestamp.
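Put together, the loading loop looks roughly like this (a sketch only: the scope filter and index name are illustrative, and in recent python-gitlab releases job.attributes is the documented way to get the same dictionary):

for pipeline in pipelines:
    # only the failed jobs are of interest here
    for job in pipeline.jobs.list(scope="failed", all=True):
        print(job)                              # inspect before loading
        doc = job.__dict__["attrs"]             # extract the raw JSON attributes
        doc["@timestamp"] = doc["created_at"]   # timestamp field for ELK
        es.index(index="gitlab-failed-jobs", id=doc["id"], body=doc)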

Setting up Kibana

Setting up Kibana here is almost the same as for the SonarQube data, except that once you have defined the index pattern, Kibana is smart enough to ask you to choose a timestamp field, since you now have time-series data.

Once done, you should be able to see your documents in the Discover view.


Notice that the view looks different from the SonarQube one: because the data is now a time series, Kibana adds a date histogram and time filter.

Again, you can then create visualizations on top of this data.


I highly encourage you to watch the YouTube videos listed above to get started. They are short, so you should be building simple dashboards in no time.

I hope this was useful.

References

[1] Rapidly Prototyping Engineering Dashboards Using Jupyter Notebooks and Gitlab Pages, Jan 2021, Medium, https://medium.com/swlh/rapidly-prototyping-engineering-dashboards-using-jupyter-notebooks-and-gitlab-pages-c686f6b8f1fd
