Integrate SignalFX with StatusPage.io using AWS Lambda

UPDATE: I’ve released a new blog post on integrating SignalFx with StatusPage.io using SignalFlow (within an AWS Lambda function). I highly recommend using the newer method; however, feel free to skim through this article, as it gives you more context on integrating both products.

Here at Poka we rely on quite a few SaaS applications to support our own service. In many cases we want those services to talk to each other. We do this for a number of reasons:

  • To centralize the information in Slack channels
  • To create an alert in our on-call management solution when a specific event appears in our logs
  • To close a Jira issue when we merge a bug fix branch in GitHub

…and the list goes on.

Sometimes most of the integration effort is done for us; other times we need to add some glue ourselves to make the services work together. Integrating SignalFX with StatusPage.io fell into the latter category.

You probably already know what a “status page” is. It’s the web page your customers visit when they experience an issue (availability, performance, degradation, etc.) with your service. There they can check whether the problem has been acknowledged and find an ETA for when it will be resolved.

StatusPage.io also lets you expose some performance metrics. If a user experiences long loading times, they can visit the status page, check the Mean Backend Latency metric, and confirm whether the issue comes from Poka or from their office network.

The Mean Backend Latency metric on pokastatus.io

Out of the box, StatusPage.io can pull system metrics from a few third-party data sources.

Available 3rd party data sources

Unfortunately, our provider, SignalFX, isn’t one of them. Being a low-key StatusPage.io client on a “startup” plan, we didn’t consider contacting them with a feature request. But that didn’t stop us: StatusPage.io provides an API, so we can send metrics to it regardless of the source!

There were a few options to choose from when deciding which technology to use. The language was the easy part at Poka, since almost everything the backend team touches is written in Python. As for hosting this simple metrics-syncing service, using a production server didn’t sound like a good idea: it would have involved too much maintenance effort, deployment scripts, etc. So it was clear from the start that this was a good fit for an AWS Lambda function invoked on a schedule every couple of minutes.

The Lambda function

I’ve embedded the Lambda code below. Note that it runs on Python 3.6.

There are a couple of interesting things to note.

First, the SignalFX API doesn’t run any analytics on the time series it returns, so you have to do all the calculations within the Lambda. For example, in collect_latency_metrics(…) we calculate the average of data points that share the same timestamp, whereas in collect_server_error_metrics(…) we compute the percentage of server errors by dividing the server error count by the total request count in each 5-minute window.
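To make those two calculations concrete, here is a minimal sketch. It is not the article’s collect_latency_metrics(…) or collect_server_error_metrics(…). The helper names average_by_timestamp and error_percentage are hypothetical, and the code assumes each series has already been reduced to a list of (timestamp_ms, value) pairs, which is not the raw SignalFX response format.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Assumed shape: (timestamp in epoch milliseconds, value).
# This is NOT the raw SignalFX API response format.
DataPoint = Tuple[int, float]

FIVE_MINUTES_MS = 5 * 60 * 1000


def average_by_timestamp(series: Iterable[List[DataPoint]]) -> List[DataPoint]:
    """Average the values of every series that share a timestamp
    (e.g. latency reported separately by several load balancer nodes)."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for points in series:
        for timestamp, value in points:
            buckets[timestamp].append(value)
    return [(ts, mean(values)) for ts, values in sorted(buckets.items())]


def error_percentage(errors: List[DataPoint], requests: List[DataPoint],
                     window_ms: int = FIVE_MINUTES_MS) -> List[DataPoint]:
    """Sum server errors and total requests per window, then divide."""
    def window_sums(points: List[DataPoint]) -> Dict[int, float]:
        sums: Dict[int, float] = defaultdict(float)
        for timestamp, value in points:
            # Align each point to the start of its window.
            sums[timestamp - timestamp % window_ms] += value
        return sums

    error_sums = window_sums(errors)
    request_sums = window_sums(requests)
    return [
        (window, 100.0 * error_sums.get(window, 0.0) / total)
        for window, total in sorted(request_sums.items())
        if total > 0  # skip windows without traffic to avoid dividing by zero
    ]
```

Windows with requests but no errors come out as 0%. Windows with no requests at all are skipped, so no value is reported for them.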

Second, we handled the case where your data points need to be backfilled. At some point you may want to adjust your calculation or filter the time series on another load balancer. To do this, you hit the “Reset” button for that specific metric in StatusPage.io. This isn’t a big deal, because the StatusPage.io API tells you when the data points need to be backfilled. When our Lambda detects this, it processes the time series from the last 28 days.
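The backfill logic boils down to choosing a wider query range. The sketch below is stateless, like a scheduled Lambda. It assumes the backfill signal has already been read from the StatusPage.io API into a plain boolean; the exact field is not shown here. It also assumes that a regular run only looks back a short, hypothetical 10 minutes, slightly more than the schedule interval. The function name query_range_ms is likewise hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

BACKFILL_PERIOD = timedelta(days=28)     # from the article
REGULAR_LOOKBACK = timedelta(minutes=10)  # hypothetical value


def to_epoch_ms(moment: datetime) -> int:
    """Convert an aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)


def query_range_ms(now: datetime, needs_backfill: bool,
                   lookback: timedelta = REGULAR_LOOKBACK) -> Tuple[int, int]:
    """Return (start_ms, end_ms) for the SignalFX query.

    needs_backfill is assumed to come from the StatusPage.io metric,
    e.g. right after the metric was reset.
    """
    start = now - (BACKFILL_PERIOD if needs_backfill else lookback)
    return to_epoch_ms(start), to_epoch_ms(now)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(query_range_ms(now, needs_backfill=True))
```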

The last thing I want to point out is the use of generators. StatusPage.io accepts at most 3,000 data points per API call, and we limit each SignalFX query to at most one day of time series at a time. Generators let us alternate calls between the two services, which helps us avoid aggressive rate limiting on StatusPage.io. They also reduce the Lambda’s memory footprint and therefore its invocation cost.
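Here is a minimal sketch of that generator pattern, assuming the same (timestamp_ms, value) pairs as before. The fetch and submit callables stand in for the real SignalFX and StatusPage.io calls and are hypothetical, as are the helper names. The 3,000-point and one-day limits are the ones quoted above.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterator, List, Tuple

DataPoint = Tuple[int, float]      # (timestamp_ms, value); assumed shape
MAX_POINTS_PER_CALL = 3000         # StatusPage.io limit quoted in the article
MAX_QUERY_SPAN = timedelta(days=1)  # our own SignalFX query limit


def day_windows(start: datetime, end: datetime) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive [start, end) windows spanning at most one day."""
    while start < end:
        window_end = min(start + MAX_QUERY_SPAN, end)
        yield start, window_end
        start = window_end


def batches(points: List[DataPoint],
            size: int = MAX_POINTS_PER_CALL) -> Iterator[List[DataPoint]]:
    """Yield slices of at most `size` points."""
    for i in range(0, len(points), size):
        yield points[i:i + size]


def sync(start: datetime, end: datetime,
         fetch: Callable[[datetime, datetime], List[DataPoint]],
         submit: Callable[[List[DataPoint]], None]) -> None:
    """Alternate one SignalFX query with the StatusPage.io calls it produces,
    so only one day of data is held in memory at a time."""
    for window_start, window_end in day_windows(start, end):
        points = fetch(window_start, window_end)  # hypothetical SignalFX call
        for batch in batches(points):
            submit(batch)                         # hypothetical StatusPage.io call
```

Because day_windows is lazy, a 28-day backfill never materializes all 28 days of points at once. Each day is fetched, pushed in batches, and released before the next query.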

The CloudFormation template

If you’ve read the previous article on our blog, you might have noticed that we’re not big fans of creating AWS resources by clicking the blue buttons in the console. So I’ve embedded a simple template below.

Putting it all together

If you’re not used to uploading Lambda functions alongside your CloudFormation templates, here’s a primer.

  • Zip your Lambda function.
  • Make sure it’s properly referenced in the template:
      SignalFxStatusPageIntegrationLambdaFunction:
        Type: AWS::Lambda::Function
        Properties:
          Code: signalfx_statuspage_integration.zip
  • Run the package command with the AWS CLI. It uploads the zip to the S3 bucket and writes a copy of the template whose Code property points to the uploaded object:
      aws cloudformation package --template-file signalfx-statuspage-integration.yml --s3-bucket a-bucket-you-have-access-to --output-template-file ready-to-upload-template.yml
  • Create a new CloudFormation stack from ready-to-upload-template.yml using either the CLI or the console.
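You can script the first step with the standard library if you prefer that to zipping by hand. The sketch below puts a single handler file at the root of the archive, which is where the Lambda runtime looks for the handler module. The source file name signalfx_statuspage_integration.py and the helper name build_lambda_zip are assumptions. If the function imports packages that the Lambda runtime does not bundle, you would need to add those to the archive too.

```python
import zipfile
from pathlib import Path


def build_lambda_zip(source: Path, archive: Path) -> Path:
    """Zip a single-file Lambda handler, placing it at the archive root."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname drops any leading directories from the stored path.
        zf.write(source, arcname=source.name)
    return archive


if __name__ == "__main__":
    # Hypothetical file names matching the template snippet above.
    build_lambda_zip(Path("signalfx_statuspage_integration.py"),
                     Path("signalfx_statuspage_integration.zip"))
```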

At this point you’ll be asked to input some parameters.

Those related to SignalFX are accessible in the organisation tab. The one related to StatusPage.io can be found when creating a new metric:

https://manage.statuspage.io/pages/<your-page-id>/metrics-displays
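One common way to hand stack parameters to a Lambda function is through environment variables set in the template. Assuming the template does that, the function can fail fast when something is missing, instead of silently pushing nothing. The variable names, the Settings fields, and load_settings below are all hypothetical and may not match the actual template.

```python
import os
from typing import Mapping, NamedTuple


class Settings(NamedTuple):
    signalfx_api_token: str
    statuspage_api_key: str
    statuspage_page_id: str
    statuspage_metric_id: str


# Hypothetical environment variable names; the real template may differ.
ENV_NAMES = {
    "signalfx_api_token": "SIGNALFX_API_TOKEN",
    "statuspage_api_key": "STATUSPAGE_API_KEY",
    "statuspage_page_id": "STATUSPAGE_PAGE_ID",
    "statuspage_metric_id": "STATUSPAGE_METRIC_ID",
}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read all parameters, raising a clear error if any are missing or empty."""
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing configuration: " + ", ".join(missing))
    return Settings(**{field: env[name] for field, name in ENV_NAMES.items()})
```

The error message then shows up in the function’s CloudWatch logs, which is where you would look first if no data points appear.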

Finally, be patient: it takes some time before the data points show up on your status page. If you’re worried that something has gone wrong, check the Lambda function’s logs in CloudWatch.

The result