No-code Realtime DevOps Insights Dashboard

Build realtime DevOps insight dashboards, from CI tools to visualization, with no code, using StreamSets, Google BigQuery, and Google Data Studio

Building realtime DevOps dashboards is sometimes more complex than building DevOps pipelines. And when dashboards do get built, they are often forgotten: they lack realtime data, are poorly maintained, require dedicated infrastructure to run on, depend on custom code to collect and clean the data, are limited in data sources because custom connectors are scarce, and their development cost grows with every new cloud data source or destination.

The idea behind this demo is to build a no-code, scalable, low-maintenance, and easily configurable dashboard.

In this example I will rely on an existing CI/CD tool. I picked Jenkins (I have other examples for Cloud Build and CodePipeline), but in reality you can build dashboards with any tool that supports REST APIs. We will measure the status of a specific job (failed, successful, aborted, etc.) and display it on a pie chart.

Let’s first start with the data ingestion tool. StreamSets provides a scalable, open-source data ingestion and streaming layer. The tool can scale on Kubernetes as well, and it has an intuitive UI, so you can easily configure sources (which can vary from APIs to databases to files), manipulate data, and target specific destinations.

You need to run the tool on an existing cluster (check this blog) or simply run it locally with Docker:

docker run -d -P --name streamsets-dc streamsets/datacollector
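With -P, Docker publishes the container’s ports to random host ports. Assuming the default Data Collector UI port (18630), you can find the mapped host port with:

docker port streamsets-dc 18630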

StreamSets 101:

StreamSets relies on three major stages:

1- Sources (Inputs)

2- Processors (to manipulate the data; you can chain as many as you need)

3- Destinations (Outputs)

In our example, the source is the CI/CD tool’s (Jenkins) API, which we call to get the job info; the processors perform field cleanup and type conversion; and the destination is Google BigQuery, which we will use as a data warehouse.

API Source

Set up the HTTP Client source with the proper URL and authentication to access the specific job; the tool will trigger an API call and retrieve the output.
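To preview what the HTTP Client source will receive, you can call the same endpoint yourself. A minimal sketch, assuming a hypothetical Jenkins host, job name, user, and API token; note that Jenkins reports the build status in a field named result:

# Poll the last build of a job through the Jenkins JSON API
# (the host, job name, user, and token are placeholders)
curl -s -u "user:api-token" \
  "https://jenkins.example.com/job/my-job/lastBuild/api/json?tree=id,result"
# Example response: {"_class":"hudson.model.FreeStyleBuild","id":"42","result":"ABORTED"}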

Add a delay to reduce the number of API calls to your CI/CD tool; you don’t want to exhaust your resources, especially when you have API quotas. It’s simply a pause between polls (1 minute in my case), since how realtime “realtime” needs to be is relative.

Data Processors

Add a field keeper to get rid of the unnecessary fields the API returns; in our example, we’re keeping only the job id and status.

Convert the field types: the job id, for example, comes back as a string in the JSON returned by the REST call, but we need it to be an integer since I will be using it as the partition key in BigQuery. The sketch below illustrates both steps.
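For illustration only, here is the equivalent of the keep-and-convert steps sketched with jq; in the pipeline itself the processors do this with no code:

# Keep only id and result, and cast id from string to integer
echo '{"_class":"hudson.model.FreeStyleBuild","id":"42","result":"ABORTED"}' \
  | jq '{id: (.id | tonumber), result: .result}'
# => { "id": 42, "result": "ABORTED" }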

Setup Destination

We’re almost done; time to ship the data to our destination. You can pick any database, but I’ve picked Google BigQuery since it’s easy to set up and integrate with, maintenance-free, and connects to Data Studio in just one click.

The documentation is straightforward: all we need is a GCP project and a service account file with a role that allows writing to BigQuery.

Required fields

  • The BigQuery dataset and table name
  • The Insert Id expression, so that records with the same job id are updated; otherwise StreamSets will perform an insert for each record (a sketch follows)
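A minimal sketch of such an expression, assuming the job id survived the cleanup step in a field named /id:

${record:value('/id')}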

Prep BigQuery

Here’s what needs to be done on the GCP/BigQuery side:

  • Create the dataset and the table, adding the field names with the proper types (a bq sketch follows this list).
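For example, with the bq CLI (the project, dataset, and table names are hypothetical); the integer-range partitioning on id is what the earlier type conversion was for:

# Create the dataset and an integer-range-partitioned table
bq mk --dataset my-project:devops_metrics
bq mk --table \
  --schema 'id:INTEGER,result:STRING' \
  --range_partitioning 'id,0,1000000,1000' \
  my-project:devops_metrics.jenkins_jobs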

Run the StreamSets pipeline:

Pipeline performance metrics will start populating with record errors and events, and the flow through each component will be measured in record counts, which can help identify bottlenecks in the streaming pipeline.

Visualize your Data:

Simply click on Explore your Data!

You will be redirected to Google Data Studio:

Data Studio Configuration

I’ve picked the easiest thing!

A pie chart. You’re free to display your data with other charts; all you need is to pick the dimension, which is the job status values in our example, and the metric, which represents the job count.

Insights from the chart

As an insight, looking at this specific job we just built, we can notice the devs are aborting the job almost 75% of the time. In reality, the job itself is self-aborting when it requests manual input, due to the lack of notifications sent to the developers. A solution would be a Slack bot notifying the dev about the job’s input request, as sketched below.
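As a sketch of that fix (the webhook URL and message are placeholders), the job could post to a Slack incoming webhook right before it starts waiting for input:

# Notify a Slack channel that the job is waiting for manual input
curl -X POST -H 'Content-type: application/json' \
  --data '{"text":"Jenkins job my-job is waiting for your input"}' \
  "https://hooks.slack.com/services/T0000/B0000/XXXXXXXXXXXX"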

Questions

If you have any questions or comments, I’ll be glad to read them in the comments. You can follow me on Medium or LinkedIn if you want to stay updated on my latest posts!

Software Engineer who lives by Cloud First thinking.