Photograph of aviation firefighters in training
When it comes to putting out fires in production, time is of the essence (Source: NASA)

Real-time Notifications via PagerDuty

Christoph Kassen
Glasnostic

--

We built Glasnostic to give engineers holistic visibility into their services and a powerful new way to control, automatically and at runtime, how those services interact. However, no solution is an island: there is an ecosystem of tools, and it is essential that they all work well together. This post shows how our webhook facility can be used to integrate with an incident response platform such as PagerDuty.

When production issues occur, time is of the essence. By integrating with an incident response platform, your team will get notified as soon as the issue is detected, enabling you to examine the situation and take action before it gets out of hand.

In this post, we show you how to:

  1. Create a Glasnostic View for which to define an event,
  2. Create a webhook handler in AWS Lambda that passes Glasnostic webhook data to the PagerDuty API,
  3. Define a Glasnostic Webhook, and
  4. Click from PagerDuty to Glasnostic to take action.

1. Creating an event view

For simplicity, we’ll use Google’s Online Boutique application from our Quick Start as an example and let the event be triggered by a threshold on requests between that application’s frontend and recommendationservice.

To create this threshold, we first define a View for the flow from frontend instances to recommendationservice instances by clicking Create View and entering frontend* in the source column and recommendationservice* in the destination column. (See Creating a View for details on using the view definition interface.)
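To make the effect of the two patterns concrete, here is a minimal sketch of how a source/destination pair of wildcard patterns selects interactions. It assumes the patterns behave like shell-style globs, which is our simplification for illustration and not a description of how the Console evaluates them. The instance names and the in_view helper are hypothetical.

```python
from fnmatch import fnmatchcase


def in_view(source: str, destination: str,
            source_pattern: str = "frontend*",
            destination_pattern: str = "recommendationservice*") -> bool:
    """Return True if an interaction matches both view patterns.

    Assumption: patterns are treated as shell-style globs.
    """
    return (fnmatchcase(source, source_pattern)
            and fnmatchcase(destination, destination_pattern))


# Hypothetical instance names, for illustration only.
flows = [
    ("frontend-7d9f", "recommendationservice-5c4b"),
    ("frontend-7d9f", "cartservice-1a2b"),
    ("checkoutservice-9e8d", "recommendationservice-5c4b"),
]

# Only interactions whose source AND destination match end up in the view.
selected = [flow for flow in flows if in_view(*flow)]
print(selected)
```

Only the first pair is selected, because it is the only one where both ends match their respective pattern.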

Looking at the Metrics tab, we see a request rate (“R”) of about 32 r/s. Since we want to make sure that our webhook will be called once it is set up, we’ll use a threshold of 30 r/s later on.

Screenshot of the Glasnostic Console showing the ‘Recommendation-Frontend’ view
“Recommendation-Frontend” view capturing the interactions between frontend and recommendationservice instances.

2. Creating a webhook handler

Setting up PagerDuty

PagerDuty has a well-documented and straightforward API for receiving events, so all we need is a way to issue REST calls in response to a JSON webhook payload.

To get started, sign up for a PagerDuty account, and create an environment and a Recommendationservice service. Then, set up an Events API V2 integration for the service. Retrieve the integration key and store it securely; we’ll need it later to configure the Lambda endpoint.
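For orientation, this is roughly what a trigger event for the Events API V2 looks like. The sketch below only builds the JSON body; an event like this is sent with an HTTP POST to https://events.pagerduty.com/v2/enqueue, with the integration key in the routing_key field. The build_trigger_event helper and the sample values are ours and purely illustrative.

```python
import json
from typing import Optional


def build_trigger_event(routing_key: str, summary: str, source: str,
                        severity: str = "warning",
                        links: Optional[list] = None) -> dict:
    """Build an Events API V2 'trigger' event (illustrative helper).

    severity must be one of: critical, error, warning, info.
    """
    event = {
        "routing_key": routing_key,   # the integration key from PagerDuty
        "event_action": "trigger",    # opens (or re-triggers) an incident
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
        },
    }
    if links:
        # Each link is an object with "href" and "text".
        event["links"] = links
    return event


# Placeholder values; substitute your own integration key and link.
example = build_trigger_event(
    routing_key="YOUR_INTEGRATION_KEY",
    summary="Request rate above threshold for Recommendation-Frontend",
    source="glasnostic",
    links=[{"href": "https://example.com/view", "text": "Recommendation-Frontend"}],
)
print(json.dumps(example, indent=2))
```

The links array is what later shows up under “LINKS” in the incident, which makes it a natural place for a deep link back into the Console.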

Deploying the handler

Next, it’s time to set up a webhook handler that can receive a Glasnostic webhook, transform its event structure into a PagerDuty event, and call the PagerDuty API. We’ll use our sample handler for this purpose and deploy it to AWS Lambda.

In your copy of the sample handler, rename env.yml.example to env.yml and add your Glasnostic Environment and View IDs as well as your PagerDuty Integration Key to the configuration.

Deploy the function to your AWS account via sls deploy and copy the invocation URL displayed during deployment.
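The sample handler remains the reference. Purely to illustrate the transform-and-forward step, a minimal Python sketch of such a handler might look like the following. It assumes an API Gateway proxy event with a plain JSON body. The Glasnostic field names (view_name, value, threshold, view_url) and the PAGERDUTY_INTEGRATION_KEY environment variable are hypothetical placeholders, not the actual webhook schema or the sample handler’s configuration.

```python
import json
import os
import urllib.request

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def to_pagerduty_event(glasnostic_event: dict, routing_key: str) -> dict:
    """Map a webhook payload to an Events API V2 trigger event.

    The Glasnostic field names used here are hypothetical.
    """
    view = glasnostic_event.get("view_name", "unknown view")
    value = glasnostic_event.get("value")
    threshold = glasnostic_event.get("threshold")
    event = {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": f"Threshold exceeded in view {view}: "
                       f"{value} (threshold {threshold})",
            "source": "glasnostic",
            "severity": "warning",
            # Pass the original payload along as context for responders.
            "custom_details": glasnostic_event,
        },
    }
    view_url = glasnostic_event.get("view_url")
    if view_url:
        event["links"] = [{"href": view_url, "text": view}]
    return event


def post_to_pagerduty(event: dict) -> None:
    """POST the event to the PagerDuty Events API V2 endpoint."""
    request = urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()


def handler(event, context, send=post_to_pagerduty):
    """Lambda entry point; `send` is injectable so the mapping can be tested offline."""
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
    # Hypothetical variable name for the PagerDuty integration key.
    routing_key = os.environ.get("PAGERDUTY_INTEGRATION_KEY", "")
    send(to_pagerduty_event(body, routing_key))
    return {"statusCode": 202, "body": json.dumps({"forwarded": True})}
```

Keeping the mapping in its own function separates the part that depends on the two event structures from the part that talks to the network, so the translation can be checked without calling PagerDuty.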

3. Defining the webhook

In the Glasnostic Console, navigate to the Settings of your Quick Start environment, click the Webhooks tab, and create a new webhook.

Paste the Lambda invocation URL into the “URL” field, choose the View we created in step 1 and choose “Threshold” as the Event Type. Enter “30” into the value box and click “Save.” The webhook is now active.

Glasnostic event structure with webhook definition and event data.

4. Taking action

The event in PagerDuty

As soon as the threshold in our View is exceeded, you should see an incident for our service in PagerDuty, complete with contextual information from the event structure.

Screenshot of a Glasnostic warning in PagerDuty
Glasnostic incident data in PagerDuty.

Notice the Recommendation-Frontend link under “LINKS.” This is a deep link into the Glasnostic Console that you can use to examine the context of the warning and to take action.

Click on it to open the Glasnostic Console on the Recommendation-Frontend View. Here you can review the context of the behavior that triggered the event and take action, reactively or proactively.

For instance, you could apply request backpressure to protect recommendationservice instances, as shown below, or you might decide that the concurrency (“C”) spikes are of more concern and shed load by applying a policy there instead, or combine policies across dimensions.

Screenshot of Glasnostic view with policy applied
Event view with a remedial policy applied.

Summary

A tightly integrated ecosystem of tools is essential for modern reliability engineers. That means that individual solutions must be able to notify each other about critical events and other data. Of course, each solution will have its own data structures, so integration often requires a transformation step.

In this post, we showed how a simple webhook handler running on AWS Lambda can be used to translate between event structures and thus integrate Glasnostic with a wide variety of adjacent solutions in a flexible and scalable way. As a result, other solutions such as PagerDuty can benefit from meaningful alerts and a direct conduit to runtime control.

Let us know what other integrations you’d like to see!

Originally published at https://glasnostic.com on April 5, 2022.
