Managing OpsGenie Heartbeats using CloudFormation

Simon-Pierre Gingras
Published in poka-techblog, Oct 6, 2017

Most web app monitoring tools like StatusCake or Uptime Robot revolve around the idea that you expose a public endpoint to your website or API. You then configure the monitoring service to ping a URL of your choice at regular intervals. If the ping fails, you are notified.

This approach works well for publicly available websites or APIs. However, it is often inadequate for ensuring the availability of private processes in your stack. Some components of your application, like CRON jobs or worker daemons, typically live behind your firewall and cannot be monitored using this type of public-facing monitoring.

Heartbeats to the rescue

Thankfully, a simple solution exists: heartbeats. A process sends a signal (the heartbeat) at regular intervals, and if the heartbeat stops for a given period of time, you are alerted. For example, a daily CRON job that has not sent a heartbeat in more than 24 hours can indicate abnormal behaviour.
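
To make the rule concrete, here is a minimal sketch of the expiry check in Python. It only illustrates the idea and is not how OpsGenie implements it; the function name is_heartbeat_expired and its parameters are made up for this example.

```python
from datetime import datetime, timedelta


def is_heartbeat_expired(last_ping, interval, now):
    """Return True when no ping arrived within `interval` before `now`."""
    return now - last_ping > interval


now = datetime(2017, 10, 6, 9, 0)
daily = timedelta(hours=24)

# A daily CRON job that last reported 25 hours ago should raise an alert.
print(is_heartbeat_expired(now - timedelta(hours=25), daily, now))  # True
# A job that reported 2 hours ago is still considered healthy.
print(is_heartbeat_expired(now - timedelta(hours=2), daily, now))  # False
```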

At Poka, we use heartbeats in many places. Here are a few examples:

  • The processes responsible for database backups send heartbeats upon backup completion.
  • Every few minutes, our worker daemons must send a heartbeat to indicate they are still alive and crunching tasks.
  • Every morning, analytics reports aggregate yesterday’s Poka usage. When the reports are generated, a heartbeat is sent.
  • We ensure our application metrics are sent at regular intervals to our status page by using heartbeats.
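
The first example in the list, sending a heartbeat only once the backup has completed, could be sketched as follows. The helper run_with_heartbeat and both callables are hypothetical names for illustration; the point is simply that a failing job never reaches the line that sends the heartbeat, so the missing heartbeat eventually triggers an alert.

```python
def run_with_heartbeat(job, send_heartbeat):
    """Run `job` and call `send_heartbeat` only if it finished without raising."""
    result = job()
    # Reached only on success: an exception in job() skips the heartbeat.
    send_heartbeat()
    return result


# Stand-ins for a real backup job and a real heartbeat call.
sent = []
backup_result = run_with_heartbeat(
    job=lambda: 'backup complete',
    send_heartbeat=lambda: sent.append('database-backup'),
)
print(backup_result, sent)
```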

The heartbeat implementation we use is provided by OpsGenie. Their heartbeat feature is easy to use and, most importantly, reliable. Unlike with many other monitoring tools, we have never had a false positive with it. Getting dragged out of bed at 3 AM because of a false alert gets old quickly.

So yes, heartbeats are miraculous. Let’s use them everywhere.

[Image: Typical DevOps engineer carelessly applying heartbeats everywhere]

But before long, we’ll end up with dozens of heartbeats in OpsGenie. Managing them manually will quickly get out of hand (pun intended). Since we are enormous fans of CloudFormation here at Poka, why not use it to manage heartbeat provisioning? Let’s see how it goes.

Overview

We want to declare heartbeats succinctly within our CloudFormation templates. Declaring a heartbeat resource should create the heartbeat in OpsGenie. If we remove the resource from our CF template, the heartbeat should be deleted from OpsGenie. Finally, if we modify any property of the CF resource, the heartbeat should be modified accordingly in OpsGenie.

Since OpsGenie heartbeats are not natively supported by CloudFormation, we need to roll up our sleeves and create our own tool. The solution to this problem relies on CloudFormation’s Custom Resources.

First, we’ll need a Lambda function that will perform create/update/delete operations on OpsGenie heartbeats. Then, whenever we want to manage a new heartbeat, we’ll create a new Custom Resource that uses this Lambda function.
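
As a rough sketch of what such a Lambda function has to do, the Python below routes a Custom Resource event to the right operation based on its RequestType and builds the response body CloudFormation waits for. The helper names (build_response, dispatch), the operations mapping and the Name property are assumptions made for this example, not the code from our repository. A real handler would also have to send this body with an HTTP PUT to the pre-signed URL in event['ResponseURL'], which is left out here.

```python
import json


def build_response(event, status, physical_resource_id, reason):
    """Build the JSON body CloudFormation expects at event['ResponseURL']."""
    return {
        'Status': status,  # 'SUCCESS' or 'FAILED'
        'Reason': reason,
        'PhysicalResourceId': physical_resource_id,
        'StackId': event['StackId'],
        'RequestId': event['RequestId'],
        'LogicalResourceId': event['LogicalResourceId'],
    }


def dispatch(event, operations):
    """Route a Custom Resource event to the matching heartbeat operation.

    `operations` maps 'Create', 'Update' and 'Delete' to callables taking the
    resource properties. This wiring is hypothetical, chosen for the sketch.
    """
    properties = event['ResourceProperties']
    # On Create there is no physical ID yet, so fall back to the heartbeat name.
    physical_id = event.get('PhysicalResourceId', properties['Name'])
    try:
        operations[event['RequestType']](properties)
    except Exception as error:
        return build_response(event, 'FAILED', physical_id, repr(error))
    return build_response(event, 'SUCCESS', physical_id, 'OK')


# Example Create event, trimmed to the fields used above.
event = {
    'RequestType': 'Create',
    'StackId': 'example-stack-id',
    'RequestId': 'example-request-id',
    'LogicalResourceId': 'Heartbeat1',
    'ResourceProperties': {'Name': 'heartbeat-1'},
}
created = []
body = dispatch(event, {'Create': lambda props: created.append(props['Name'])})
print(json.dumps(body, indent=2))
```

Reporting FAILED instead of letting the exception escape matters: if the function never answers, CloudFormation keeps waiting for a response until the operation times out.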

Below you will find the code that implements this idea. If you’d like, you can follow along by checking out the source code.

The CloudFormation template

In this template, HeartbeatFunction is the Lambda function that is responsible for creating, updating and deleting OpsGenie heartbeats. This function requires that you supply your Heartbeat API Key (see the OpsGenieHeartbeatApiKey parameter).

We also create 2 heartbeats (Heartbeat1, Heartbeat2) for demonstration purposes. These heartbeats use Custom Resources that are linked to our Lambda function. Notice how we can define the name, description, interval and state (enabled or disabled) for each individual heartbeat. Changes to those properties are reflected in our heartbeats in OpsGenie.
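
One detail worth keeping in mind when reading those properties inside the Lambda function: CloudFormation typically hands Custom Resource property values over as strings, so an interval arrives as '10' and a disabled state as 'false'. Below is a minimal sketch of the conversion. The property names (Name, Description, Interval, IntervalUnit, Enabled) and the helper heartbeat_kwargs are assumptions for this example; the resulting keys match the arguments of the create_heartbeat function shown later.

```python
def heartbeat_kwargs(properties):
    """Translate Custom Resource properties into keyword arguments.

    The property names used here are assumptions made for this sketch.
    """
    # Booleans arrive as the strings 'true' / 'false'; default to enabled.
    enabled = str(properties.get('Enabled', 'true')).lower() == 'true'
    interval = properties.get('Interval')
    return dict(
        name=properties['Name'],
        description=properties.get('Description'),
        interval=int(interval) if interval is not None else None,
        interval_unit=properties.get('IntervalUnit'),
        enabled=enabled,
    )


kwargs = heartbeat_kwargs({
    'Name': 'heartbeat-2',
    'Interval': '10',
    'IntervalUnit': 'minutes',
    'Enabled': 'false',
})
print(kwargs)
```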

The Lambda function

In this code, you’ll find 3 functions (create_heartbeat, update_heartbeat, delete_heartbeat), which are the building blocks we use to manage a heartbeat. We use OpsGenie’s Heartbeat API to perform all of these operations.

For example, if we want to create a heartbeat, we’ll need to send a POST request to the /heartbeat endpoint of the API:

import requests


def create_heartbeat(name, api_key, interval=None, interval_unit=None, description=None, enabled=True):
    # Refuse to create a heartbeat that already exists under this name.
    if _heartbeat_exists(name=name, api_key=api_key):
        raise HeartbeatAlreadyExistsError(name=name)

    response = requests.post(
        'https://api.opsgenie.com/v1/json/heartbeat',
        json=dict(
            apiKey=api_key,
            name=name,
            interval=interval,
            intervalUnit=interval_unit,
            description=description,
            enabled=enabled,
        ))
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError:
        # Log the API's error body before re-raising, to ease debugging.
        print(response.content)
        raise

We also need to verify that the heartbeat does not already exist before we try to create it, by trying to retrieve the heartbeat by name:

def _heartbeat_exists(name, api_key):
    response = requests.get(
        'https://api.opsgenie.com/v1/json/heartbeat',
        params=dict(
            name=name,
            apiKey=api_key,
        )
    )
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        # A 400 with this specific message means the heartbeat is unknown.
        if e.response.status_code == 400 and \
                f'Heartbeat with name [{name}] does not exist' in response.json()['error']:
            return False
        else:
            raise
    else:
        return True

The Result

If you provision our stack in your AWS account, you will notice that the 2 heartbeats have been created successfully in OpsGenie.

Yes, these heartbeats are now managed by CloudFormation 👍 🎉

You can see that the second heartbeat is disabled, as per our configuration in the CF template.

Wrap up

Now that we have seen what heartbeats are and how they are useful, you can reuse this recipe to manage your OpsGenie heartbeats using Infrastructure as Code principles. With your heartbeats safely declared in your codebase, go ahead and heartbeat everything!

Special thanks to Julie Dorion-Bélanger for the nice graphics!
