Rapidly Prototyping Engineering Dashboards and Performing Analyses using Jupyter Notebooks and Kibana
A step-by-step approach
Introduction
In a previous post, I described how you could build Engineering Dashboards using Jupyter Notebooks and Gitlab Pages (see [1]). That was useful if you knew what graphs or hypotheses you would like to plot/test. Unfortunately, unless you are really adept with Pandas and Matplotlib, that could be a struggle.
That got me thinking: if I could get the data into a platform like Splunk or even Kibana (i.e. ELK), I could then perform analyses and fairly easily plot graphs to ascertain trends. That is what I set out to do, and this is how I did it.
In the following sections, I will describe step-by-step how I did this with two examples: (a) pulling SonarQube coverage data from SonarCloud and (b) pulling Gitlab pipeline data off gitlab.com. All the code I describe can be found in the importELK.ipynb notebook on the Gitlab repo:
As in [1], we are using the following open source repos for demonstration purposes:
Running Kibana
For this example, I chose to run ELK as a docker container. You could also install it on your machine or use the Elastic Cloud but I wanted something simple. You can find the instructions to run the docker container here. Here’s the command line to run:
docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -it --name elk sebp/elk
Next, just go to your browser and load this URL: http://localhost:5601 and you’ll see the Kibana page.
Setting up for loading data into ELK
You should be able to find the code in this section in the importELK.ipynb notebook. The first few lines just install some needed Python libraries; the last loads the env vars from a .env file.
Next, standard imports, getting the env vars, then instantiating the elasticsearch API object for use later.
Loading SonarQube data
Next, we pull data from SonarCloud and populate the data onto the local ELK:
This is exactly the same code described in [1]. We are pulling line and branch coverage data from the simgrid repo.
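I won’t reproduce the notebook cell here, but a hedged sketch of that pull, using SonarCloud’s public measures API, could look like this. The project key, page size, and the `to_docs` helper are my assumptions, not the notebook’s exact code:

```python
import requests

def to_docs(components):
    """Flatten SonarCloud per-file measures into one JSON doc per file."""
    docs = []
    for comp in components:
        doc = {"id": comp["key"], "path": comp.get("path", "")}
        for measure in comp.get("measures", []):
            doc[measure["metric"]] = float(measure["value"])
        docs.append(doc)
    return docs

def fetch_coverage(token, project_key="simgrid_simgrid"):
    """Pull line/branch coverage for every file via api/measures/component_tree."""
    resp = requests.get(
        "https://sonarcloud.io/api/measures/component_tree",
        params={
            "component": project_key,
            "metricKeys": "line_coverage,branch_coverage",
            "qualifiers": "FIL",   # per-file measures only
            "ps": 500,             # page size
        },
        auth=(token, ""),          # the token is the basic-auth username
    )
    resp.raise_for_status()
    return to_docs(resp.json()["components"])
```

Each resulting doc is a flat JSON object keyed by the file’s unique SonarQube key, which matters for the loading step below.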
Next, I start populating ELK with the data. Notice the print(c) on the 2nd line. I personally recommend that you first print out the objects you want to populate ELK with and check two things: (1) is the object a JSON object? If not, you’ll have to package it into one; and (2) is there a unique ID field? In this case there is: the id field. This is important because you may want to run this multiple times for analyses, and you don’t want dupes in ELK, which would mess up your analyses. Since each c is a JSON object, I can treat it as an elasticsearch document and create it on an elasticsearch index (in this case sonarqube-code-coverage).
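The loading loop amounts to something like this sketch. The `load_docs` helper is hypothetical, `coverage` and the `es` client come from the earlier notebook cells, and the `document=` keyword is the elasticsearch-py 8.x spelling (older clients use `body=`):

```python
def load_docs(client, index, docs):
    """Index each doc under its own stable id, so re-running the notebook
    overwrites existing documents instead of creating dupes."""
    for doc in docs:
        print(doc)  # eyeball the shape before trusting the load
        client.index(index=index, id=doc["id"], document=doc)

# with the es client from the setup cell:
# load_docs(es, "sonarqube-code-coverage", coverage)
```

Passing an explicit id is the design choice that makes re-runs idempotent: the same file always overwrites the same document.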
Now once that is done, the next step is to see if it shows up on Kibana.
Setting up Kibana
The first thing you need to do is to create an index pattern. To do that, open the side menu and click on “Home”:
Next, click on the “Manage” link.
Then click on “Index patterns”
Then click on “Create index pattern”
On the page you should see a list of indices that already exist on your ELK instance. For example:
Choose an index pattern, eg. “sonarqube-*”, click on “Next step” and then “Create index pattern”, and you should be done.
Next, using the side menu, navigate to Kibana -> Discover. On the left you will be able to change the index pattern. Select the one you have just configured. You should then see something like this:
That means you have loaded the data correctly. Next you can do analyses of your data and build dashboards. I am not going to cover these in this article as there are other sources that describe this better. Here are some examples:
- Learn basics of Kibana KQL in 5 mins (https://youtu.be/cvPpElUEs4E)
- How to Create Visualizations and Dashboards in Kibana (https://youtu.be/iNxt0duWBZM)
Here is a simple dashboard I created showing the proportion of files that have branch coverage vs line coverage:
Loading Gitlab data
To load Gitlab data, there are some things to note. As with the SonarQube data, you first have to authenticate with Gitlab Cloud; you’ll need to get an access token as described in [1]. Next, you need to get the pipelines from the Gitlab project. This is also described in [1].
Next comes pulling the data from Gitlab into ELK:
I always print the object (in this case the failed job objects) so that I understand what I am dealing with. Notice that the structure of Gitlab objects is unlike that of SonarQube: e.g. the job object has 2 components, the “manager” and the “attributes”. You can find more information on this in the Gitlab Python API (see this). What this means is that for each job object we have to do a job.__dict__['attrs'] to extract the JSON to populate ELK. Another thing to note is that you will want a timeseries; to get one you need to choose a timestamp for the @timestamp attribute that ELK will use. In this case I chose job.created_at.
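The flattening step can be sketched as a small helper. Here I use python-gitlab’s public attributes property, which exposes the same data as the __dict__ access above; the helper name is my own:

```python
def job_to_doc(job):
    """Flatten a python-gitlab job object into a plain dict for ELK,
    stamping it with the field ELK uses for time-series queries."""
    doc = dict(job.attributes)             # same data as job.__dict__['attrs']
    doc["@timestamp"] = doc["created_at"]  # makes the index time-series capable
    return doc
```

Each flattened doc can then be indexed under its id exactly as with the SonarQube data (an index name like gitlab-failed-jobs is a placeholder — pick your own).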
Setting up Kibana
Setting up Kibana for this is almost the same as for the SonarQube data, except that once you have defined the index pattern, Kibana is smart enough to figure out that you need to choose a timestamp field, since you now have time-series data.
Once done you should be able to see something like this:
Notice that it looks different from the SonarQube one. This is because you have time-series data now.
Again you can then create visualizations like so:
I highly encourage you to watch those YouTube videos listed above to get you started. They are really short so you should be building simple dashboards in no time.
I hope this was useful.
References
[1] Rapidly Prototyping Engineering Dashboards Using Jupyter Notebooks and Gitlab Pages, Jan 2021, Medium, https://medium.com/swlh/rapidly-prototyping-engineering-dashboards-using-jupyter-notebooks-and-gitlab-pages-c686f6b8f1fd