GitHub Enterprise Insights with Elasticsearch and Kibana

Brett Logan
Oct 4, 2023

As an Expert Services engineer at GitHub, I have the privilege of working with thousands of our users. From developers who are just starting out, to project managers who have never written a line of code, to developers who have changed the landscape of our field forever. There is no shortage of days when I am amazed by what I can learn from our users. But the highlight of my job is every opportunity I get to teach our users a little bit more about our platform and its features.

Most GitHub users are unaware of the sheer volume of data generated by their repositories, their organizations, and their enterprise. Even modest organizations can generate millions of events daily. Nearly every Git action, web UI interaction, and API call generates one or more events on the GitHub backend. These events can be configured to generate webhooks to third-party systems. Many of us are familiar with this from some very common use cases, such as triggering Jenkins jobs, or even triggering security scans in third-party SAST tools. But the data contained in GitHub webhooks is so much more powerful than these simple use cases.

Our users constantly ask us about building CI dashboards, identifying important organization metrics (DORA), or even finding rogue actors in their enterprise. And while our engineering teams at GitHub continue to invent new and exciting ways to do these things, the fact is, there is no one-size-fits-all solution. So it raises the question: how do GitHub users extract the data they need to make development and business investment decisions in a way that is scalable and able to provide data in real time?

Enter webhooks…

What is a webhook

Think of things like pushing code to your repository, triggering an Actions run, or even adding a new member to your GitHub organization. All of these events have an object representation, generally using JSON. And all of these JSON event objects can be sent to third-party tools via a system known as webhooks. So, in a nutshell, a GitHub webhook is a JSON object that represents an event that occurred on GitHub and is sent to a third party.
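
For example, a workflow_job event (the event type we will use later in this tutorial) arrives as a JSON document shaped roughly like the following. This is heavily abbreviated and the values are purely illustrative; the real payload contains dozens of additional fields describing the job, the repository, and the sender:

{
  "action": "completed",
  "workflow_job": {
    "id": 123456789,
    "name": "build",
    "status": "completed",
    "conclusion": "success",
    "created_at": "2023-10-04T12:00:00Z",
    "started_at": "2023-10-04T12:00:05Z",
    "completed_at": "2023-10-04T12:02:41Z"
  },
  "repository": { "full_name": "lindluni/github-webhooks-with-es" },
  "sender": { "login": "lindluni" }
}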

GitHub webhooks provide the ability to scale to a near-infinite ceiling. This stands in direct contrast to the traditional way our users are familiar with retrieving GitHub data, which is via the GitHub API. While the API is great for many use cases, as any software engineer knows, most systems based on polling have their limits, and the GitHub API is no exception. Users are limited in the amount of data they can extract from the API by an hourly rate limit, and some events, in large organizations, occur so rapidly that it is impossible to query them at a rate faster than they are generated. And even worse, some questions you want to answer via the API require two or more queries per data point, slowing down your ability to generate meaningful metrics.

So webhooks provide us a scalable way to capture events and data occurring in GitHub.

How do I capture and store webhooks

While webhooks allow us to capture GitHub events in real-time, they don’t come without their own complexities. For us to build persistent tools, such as monitoring and alerting systems or metrics dashboards, we need a way to aggregate these webhooks and a place to store them. This generally involves standing up a storage system, such as a database, then writing an application and hosting it somewhere to receive the webhooks and add them to our storage system.

So, how can we do this? While I’ve personally explored many ways of receiving, processing, storing, and displaying this data, my personal favorite remains ElasticSearch and Kibana. The rich features of ElasticSearch and Kibana, coupled with the ElasticSearch Operator for Kubernetes make getting up and running and building your first dashboard a breeze.

But don’t worry, we won’t be getting that technical today; for the rest of this tutorial, all you will need is Docker, Docker-Compose, and ngrok installed on your local machine.

So, let’s build a system for aggregating our GitHub webhooks to ElasticSearch and build our first dashboard using Kibana together.

Setting up ElasticSearch and Kibana

Fortunately for us, in the modern world of software development, so much of our work is simplified by Docker. For this tutorial, we will leverage Docker-Compose to deploy our ElasticSearch cluster, our Kibana instance, and our webhook server. To get started, clone the repository I’ve created for this demo from GitHub: lindluni/github-webhooks-with-es

Once you’ve cloned the repository, open the .env file in the docker directory, which will be used to inject values into our Docker containers, and edit the following fields:

ELASTIC_PASSWORD=<your_password_here>
KIBANA_PASSWORD=<your_password_here>
WEBHOOK_SECRET=<your_secret_here>

Now, the PASSWORD fields should be pretty self-explanatory: just set them to sufficiently strong passwords. But what is the WEBHOOK_SECRET field for?

Each time GitHub sends a webhook to your server, it includes a special HTTP header named x-hub-signature-256. This signature is an HMAC-SHA256 digest of your webhook payload (that JSON object we talked about earlier), keyed with a “secret” value we will provide when we set up our webhooks. Since we provide the secret to GitHub, both GitHub and our webhook server know its value, which means we can compute the same digest over the payload we received and compare our signature to the signature provided by GitHub. All of this allows us to validate that the data we received came from GitHub and was not spoofed by an outside actor, assuming we’ve generated a sufficiently secure secret.
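
To make that concrete, here is a minimal sketch of that verification in Node.js. This is illustrative only; the function and variable names are my own, and the repository’s index.js is the actual implementation:

const crypto = require('crypto');

// Compare GitHub's x-hub-signature-256 header against a digest we compute ourselves.
// rawBody must be the exact bytes GitHub sent, not a re-serialized JSON object.
function verifySignature(secret, rawBody, signatureHeader) {
  const expected =
    'sha256=' + crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
  return (
    signatureHeader.length === expected.length &&
    crypto.timingSafeEqual(Buffer.from(signatureHeader), Buffer.from(expected))
  );
}

If the function returns false, the request should be rejected without being processed.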

To generate a random secret, use a tool such as Random.org, or use the openssl tool in your terminal as follows: openssl rand -hex 32

This will generate 32 random bytes encoded as a 64-character hexadecimal string. Once you have that value, add it to your .env file; we will use this value again later when configuring our webhooks on GitHub.

Now that we’ve created our .env file, we can prepare to deploy ElasticSearch, Kibana, and our webhook server. Navigate to the docker directory and inspect the docker-compose.yml file. While this may look unfamiliar to you, Docker-Compose is a tool for orchestrating Docker containers, allowing us to define our containers in YAML templates, as opposed to storing them as raw CLI commands in Bash scripts. If you read through it, you will see that the values are generally representative of flags you can specify to the docker CLI.

If you inspect the file, you will find several components:

  • setup — This container is used to generate the certificates needed by ElasticSearch, Kibana, and our webhook server
  • es01, es02, es03 — These are our ElasticSearch nodes and are where our data is stored and queried from
  • kibana — As the name implies, this is our Kibana node and is where you will build rich dashboards
  • webhooks — This is the server that receives our webhook events and sends them to ElasticSearch for storage

You’ll also notice that the webhook server points to a Dockerfile in the webhook-server directory, and that the Dockerfile points to the file index.js in the same directory. You can take this time to inspect the source code to validate its contents (as you should for all files you download from the internet).

We are now ready to launch our ElasticSearch, Kibana, and webhook server applications. To do so, navigate to the docker directory and run the following command: docker-compose up -d

This will start your application stack, which can take a few minutes or more while it downloads the necessary Docker images and builds the webhook server application image. Once all of your applications show as Healthy, Docker-Compose will exit, allowing you to interact with your terminal again. Your terminal should look something like this when everything is ready:

Run Docker-Compose
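
If you want to check on progress before that happens, running docker-compose ps in the same directory will list each container along with its current state and health.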

While we wait for ElasticSearch and Kibana to finish their configuration, let’s go ahead and configure our webhook on GitHub.

Setting up webhooks on GitHub

For this exercise, you will need a GitHub repository or a GitHub organization. You will also need to have created an ngrok account and installed the ngrok CLI, a tool that allows us to receive GitHub webhooks without having to expose our local computer to the public internet.

To simplify this process, ngrok now allows all users, even on their free tier, to create one static domain. Follow the instructions here to create your free domain after creating your ngrok account.

Once you’ve installed this tool, it’s time to configure our webhooks. First we need to start ngrok on our local system. Open your CLI and run the following command: ngrok http --domain <your_domain_here> 3000. For example, if your domain was cool-dev.ngrok.dev, your command would look something like this: ngrok http --domain cool-dev.ngrok.dev 3000.

Once you run this command, our local ngrok client will connect to the ngrok service. Any event sent to https://cool-dev.ngrok.dev will automatically be forwarded to our local system by the ngrok service, and our local ngrok client will then forward it to port 3000, which is where our webhook server is running. All of this allows us to capture webhook events without directly exposing our ElasticSearch, Kibana, or webhook server applications to the public internet.

If you are configuring a repository for this exercise, navigate to your repository and select Settings -> Webhooks at the top of the page. If using an organization do the same, but from your organization page.

You should see a page that looks like this:

GitHub Add Webhook page

On this page you can now fill out the fields:

  • For Payload URL use your ngrok domain followed by /github/webhooks. If your ngrok domain is https://cool-dev.ngrok.dev, your value for this field would be https://cool-dev.ngrok.dev/github/webhooks. You may be asking, why /github/webhooks? If you go back and inspect the index.js file mentioned earlier in this tutorial, you will see we used express.js to create a webserver, and the route is /github/webhooks (see the sketch after this list for the general shape of that server).
  • For the Content type field, select application/json. Another great feature of ElasticSearch is that it inherently understands JSON, so if we use this type we don’t need to do any additional processing, at least for this simple tutorial.
  • For the Secret field, enter the random WEBHOOK_SECRET value you generated earlier and stored in your .env file. This will be used by GitHub to generate the signature of our webhook payloads.
  • For the Which events... field, it is really up to you to decide which events GitHub will send to your server. For today, select Let me select individual events, which will cause a few dozen checkboxes to appear for the individual event types. Select the Workflow Jobs checkbox, which will allow us to receive events when GitHub Actions workflow jobs are triggered and when they complete.
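
Before clicking save, it is worth seeing how those pieces fit together on the receiving side. The sketch below shows the general shape of such a server; it is not the repository’s actual index.js, and details like the client options, index naming, and error handling are simplified (signature verification, described earlier, is also omitted here for brevity):

const express = require('express');
const { Client } = require('@elastic/elasticsearch');

const app = express();
app.use(express.json());

// Connect to one of the ElasticSearch nodes defined in docker-compose.yml
const client = new Client({
  node: 'https://es01:9200',
  auth: { username: 'elastic', password: process.env.ELASTIC_PASSWORD },
});

// GitHub identifies the event type (e.g. "workflow_job") in the x-github-event
// header, so each event type can be stored in its own ElasticSearch index.
app.post('/github/webhooks', async (req, res) => {
  const eventType = req.headers['x-github-event'];
  await client.index({ index: eventType, document: req.body });
  res.sendStatus(202);
});

app.listen(3000);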

Now click Add webhook. Once you’ve saved your webhook, you can navigate to it and click the Recent Deliveries tab, which should look like this, with an event called ping that GitHub uses to validate the server is online and accepting requests.

Webhook ping delivery

If the webhook delivery is successful and you have a green icon, you are good to go. If it is red, click the webhook delivery and inspect the error returned by the webhook server to diagnose the problem. It is also possible your ElasticSearch instance just hasn’t finished coming online yet, so check whether Docker-Compose has finished launching your cluster.

Generating webhook data

If we are going to create dashboards in Kibana, we need data to create dashboards from. In the previous step we configured the webhook to send us Workflow Job events, so we can trigger an existing GitHub Actions workflow in our repository or our organization. Alternatively, you can create a repository, copy the .github/workflows/workflow.yml file from this project into your .github/workflows directory, and commit the file to your default branch. The act of committing the file will cause the workflow to kick off, since we have both the workflow_dispatch and push events as triggers in our workflow.

This should have generated webhooks which have been forwarded to our webhook server via ngrok, then added to our ElasticSearch instance. You can validate this by checking the logs for the webhook server using the following command: docker logs insights-webhook-server-1

If the insertions were successful, you will see logs like the following, where the webhook server created the initial set of ElasticSearch indices:

Webhook server logs
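
You can also confirm the documents landed in ElasticSearch itself by querying the index directly, for example with curl -k -u elastic:<your_password_here> https://localhost:9200/workflow_job/_count, which returns a count of the documents stored in the workflow_job index. This assumes the compose file publishes ElasticSearch on localhost port 9200 with a self-signed certificate, which is why -k is used to skip certificate verification.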

Creating an Actions dashboard

Now that we have data in our ElasticSearch instance, we can start creating dashboards. To access your Kibana instance, open your browser and navigate to http://localhost:5601. You should see the Kibana login page:

Kibana login page

Log in with the user elastic and the password you created in your .env file.

You are now on the Kibana homepage and are ready to create your first dashboard. To create a dashboard we need access to our data. In Kibana, we provide access to our data by creating a Data View, an object that points to one or more ElasticSearch indices and allows us to access the data in those indices. To create a Data View, click the menu button in the top-left corner, scroll to the bottom, and select Stack Management. On the Stack Management page, scroll down in the left-hand pane and select Data Views. You will see a few default views that ship with ElasticSearch out of the box, but we want to create our own. Select Create data view in the top-right corner and set the following values:

  • For the Name field, you can set this to whatever descriptive name you like, but we will go with Workflow Jobs for our purposes today.
  • For the Index pattern field, type workflow_job, being sure to remove the * added by default in the UI.
  • For the Timestamp field field, select workflow_job.created_at. This is important, as it will be used to create time-series metrics on our dashboard.
Create a workflow job data view

Now click Save data view to Kibana. Once you’ve saved your data view, we are ready to create our first dashboard. Click the menu in the top-left corner and select Dashboard. You will see a sample dashboard already created; you can ignore this and click Create dashboard in the top-right corner. Once on the dashboard, click Create visualization in the center of the screen.

We are now on the Create visualization page, which should look something like this:

Kibana create visualization page

For this part, we will create a metric that displays the success and failure count for all of our Actions workflows over time. This is an important metric that lets us visualize how often all of our Actions are succeeding or failing, which might signal to us that there is something wrong with our workflow execution, or that new issues have popped up as we review the metric over time.

To create this metric, select our Workflow Jobs data view from the top-left hand dropdown. Then change the Bar vertical stacked dropdown at the top of the page to Line.

We now need to select the data fields we want to appear on the graph. First click the + sign next to the Records field in the left-hand pane:

Add Records to workspace

This will add all of our Workflow Job events to our dataset, since we want to visualize the success or failure of all of our workflow jobs. When you add the Records field, it will automatically add the workflow_job.created_at field as well. Because this is our Timestamp field and our line graph represents a time series, this will allow us to display the number of successes or failures of our workflow jobs over time.

Next, we need to add the conclusion field to our workspace; this represents the state the workflow finished in, i.e., success or failure. In the search bar in the top-left corner, search for workflow_job.conclusion.keyword and add it to the workspace. (The .keyword suffix refers to the non-analyzed version of the field that ElasticSearch creates for text fields, which is what allows us to group and count exact values like success and failure.)

At this point we should have three fields in the workspace under the Selected fields heading: Records, workflow_job.created_at, and workflow_job.conclusion.keyword:

Selected fields

You may notice at this time that your line graph has been populated with data, but it’s difficult to read as only data dots are currently displayed. Let’s connect our dots. At the bottom of the page, under the Suggestions section, select the second graph type, which has connected lines:

Connect dots graph

Now you should be able to more easily see your data in the graph:

Graph with connected dots

Finally, let’s update the titles of our axes. On the right-hand side, select workflow_job.created_at under the Horizontal axis heading. In the dialog that appears at the bottom, under the Appearance section, let’s change the Name field to Event. This will cause the x-axis to display Events per minute. Close the dialog box.

Next, let’s update the y-axis title. Select Count of records under the Vertical axis heading. In the dialog that appears at the bottom, under the Appearance section, change the Name field to Total. This will cause the y-axis to display Total. Close the dialog box.

You should now have an updated visualization with updated titles on your axes:

Line graph with updated axes titles

You can now save the visualization by selecting Save and return in the top-right corner. You will see your newly created visualization in your dashboard.

You can also update the title of your metric by clicking the gear icon in the top-right corner of the metric and selecting Panel settings. You can set the Title field in the dialog to anything you choose, but let’s go with Workflow Jobs by Status.

You can now save your dashboard by selecting Save in the top-right corner. A dialog box will appear prompting you to give your dashboard a Name and Description. Once you’ve filled out the fields, select Save.

You now have a fully functioning dashboard that displays historical data for the success and failure rates of your workflow jobs. You can choose the timeframe for which data is displayed, and even have it automatically refresh in real time by selecting the calendar dropdown in the top-right corner:

Update the time interval and refresh period

Conclusion

While this is a rather simplistic approach to getting started with GitHub webhooks and metrics, it provides you with a starting point for understanding how webhooks work and how to shape GitHub data to create dashboards. From here the sky is the limit on the data you can extract from your repos, organizations, and enterprises. Over time your dashboards will populate with thousands or even millions of data points, providing you rich access to your data like so:

Data over time

Follow me for my next entry in this series as we work towards building enterprise-grade insights dashboards, where I will walk you through how to deploy a more production-grade ElasticSearch and Kibana cluster in Kubernetes using the ElasticSearch Operator for Kubernetes to manage the lifecycle of your cluster.


Brett Logan

Senior Services Delivery Engineer in GitHub Expert Services focused on software modernization and DevSecOps practices.