Dev IRL: How to ingest Heroku Log Drains with Nodejs on AWS CloudWatch? Part 6: The alert system

Simon Briche
9 min read · Oct 24, 2022


Photo by Lucas George Wendt on Unsplash

The last part of our architecture is a notification mechanism to alert us in case of an outage. As you can imagine, there are many possible implementations, but I’ll go for a simple one.

This story is the last of a 6-part series.

The alert configuration

Remember how we imported custom modules, like the error-codes.js one we created in Part 3: Handle the drains? Great! We’ll create a similar one that will act as the alert pattern definition. Go to your heroku-drains Lambda function and create a new file named alarms-config.js next to the index.mjs file:

This JSON object lists all the common Heroku-specific codes and errors (the codes and errors properties), but you can add your own alert definitions in the regexp property: just provide an expression (which can be a RegExp) and a description. For each alert, you can also provide either an exclude_apps array (the rule is tested against all apps EXCEPT those in this array) or an include_apps array (the rule is tested ONLY against the apps in this array).

One specific alert is handled by the exclude_htr_paths property. By default, if a request lasts longer than 5 s, the alert system raises a High Response Time alert. If you don’t want specific routes to raise this alert, simply list their paths as RegExps and they will be skipped during the alert detection phase.
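The original alarms-config.js is not reproduced in this text, so here is a minimal sketch of what such a file could contain, assuming one object per rule with a code or expression, a description, and the optional exclude_apps/include_apps arrays. The split between codes (router H codes) and errors (runtime R codes), the sample app names (my-api, my-worker) and the ruleAppliesToApp and isExcludedHtrPath helpers are all hypothetical. The object is a plain constant so the snippet runs on its own; in the Lambda it would be the module’s default export.

```javascript
// Hypothetical content of alarms-config.js (in the Lambda: `export default alarmsConfig;`).
const alarmsConfig = {
  // Heroku router error codes (assumed to be matched against `code=Hxx` in router logs)
  codes: [
    { code: 'H10', description: 'App crashed' },
    { code: 'H12', description: 'Request timeout' },
    { code: 'H14', description: 'No web processes running' },
  ],
  // Heroku runtime errors (assumed to be matched against `Error Rxx` in dyno logs)
  errors: [
    { code: 'R10', description: 'Boot timeout' },
    // Tested against every app EXCEPT my-worker
    { code: 'R14', description: 'Memory quota exceeded', exclude_apps: ['my-worker'] },
  ],
  // Custom alerts: an expression (string or RegExp) and a description
  regexp: [
    // Tested ONLY against my-api
    { expression: /FATAL/, description: 'Fatal error in app logs', include_apps: ['my-api'] },
  ],
  // Slow requests on these paths never raise a High Response Time alert
  exclude_htr_paths: [/^\/exports\//, /^\/health$/],
};

// Hypothetical helper: does a rule apply to this app, given include_apps / exclude_apps?
function ruleAppliesToApp(rule, appName) {
  if (Array.isArray(rule.include_apps)) return rule.include_apps.includes(appName);
  if (Array.isArray(rule.exclude_apps)) return !rule.exclude_apps.includes(appName);
  return true; // no restriction: the rule applies to every app, including new ones
}

// Hypothetical helper: should a slow request on this path be ignored?
function isExcludedHtrPath(path, config = alarmsConfig) {
  return config.exclude_htr_paths.some((pattern) => pattern.test(path));
}
```

Because a rule without include_apps or exclude_apps applies to every app, a newly added Heroku app gets the default Heroku alerts without any configuration change.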

All in all, this way of handling alerts is:

  • rather simple: you just have to update the JSON and deploy the update, and the new configuration goes live
  • flexible: you can customize the alerts for each app
  • ready out of the box for new apps: since the Heroku-specific alerts are already there, you don’t have to do anything when a new app is added

Of course, it’s up to you to manage this configuration the way you want (with a UI, for instance), but that’ll do for now.

The alert implementation

As you can imagine, we’ll have to update our heroku-drains Lambda function to detect the errors. I’ll show you the final code of this function, and then we’ll go through the alert-specific updates:

The first update is the import of the alarms-config.js module (line 3) instead of the dummy error-codes.js. Since it is imported at module level, the global config is kept in the function’s memory. Next, we’ve declared an appAlarms object (line 14) that stores in memory the alarm configuration of each app (since the config can be customized per app).

Then, 3 helper functions have been added just before the main event handler:

  • getAppAlarms: stores the app-specific configuration in the appAlarms object
  • getAlarm: checks whether the current message raises an alarm
  • invokeAlarms: invokes the Lambda function dedicated to sending the alarm notifications
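The helpers themselves live in the function’s final code, which is not reproduced here. As a rough idea of what getAppAlarms and getAlarm could do, here is a sketch under a few assumptions: the config has the codes/errors/regexp/exclude_htr_paths shape, router lines carry code=Hxx, path="..." and service=NNNms fields, runtime errors appear as Error Rxx, and an alarm is a plain { app, code, description } object. The 5 s threshold comes from the article; the sample config, the returned field names and the "HTR" code are made up for illustration.

```javascript
const HTR_THRESHOLD_MS = 5000; // the 5 s default of the alert system

// Small hypothetical config so the sketch runs on its own.
const config = {
  codes: [{ code: 'H14', description: 'No web processes running' }],
  errors: [{ code: 'R14', description: 'Memory quota exceeded', exclude_apps: ['my-worker'] }],
  regexp: [{ expression: 'FATAL', description: 'Fatal error in app logs', include_apps: ['my-api'] }],
  exclude_htr_paths: [/^\/exports\//],
};

const appAlarms = {}; // module scope: survives across warm invocations

// Filters the global config once per app and caches the result in appAlarms.
function getAppAlarms(appName, globalConfig = config, cache = appAlarms) {
  if (!cache[appName]) {
    const applies = (rule) =>
      rule.include_apps
        ? rule.include_apps.includes(appName)
        : !(rule.exclude_apps || []).includes(appName);
    cache[appName] = {
      codes: globalConfig.codes.filter(applies),
      errors: globalConfig.errors.filter(applies),
      regexp: globalConfig.regexp.filter(applies),
      exclude_htr_paths: globalConfig.exclude_htr_paths,
    };
  }
  return cache[appName];
}

// Returns an alarm object if the message should raise one, otherwise null.
function getAlarm(appName, message, rules) {
  // 1. Heroku error codes: router `code=Hxx` or runtime `Error Rxx`
  const codeMatch = message.match(/\bcode=(H\d+)/) || message.match(/\bError (R\d+)/);
  if (codeMatch) {
    const rule = [...rules.codes, ...rules.errors].find((r) => r.code === codeMatch[1]);
    if (rule) return { app: appName, code: rule.code, description: rule.description };
  }
  // 2. High response time, unless the path is excluded
  const service = message.match(/\bservice=(\d+)ms/);
  if (service && Number(service[1]) > HTR_THRESHOLD_MS) {
    const path = (message.match(/\bpath="([^"]*)"/) || [])[1] || '';
    if (!rules.exclude_htr_paths.some((pattern) => pattern.test(path))) {
      return { app: appName, code: 'HTR', description: 'High Response Time' };
    }
  }
  // 3. Custom expressions (new RegExp accepts both strings and RegExps)
  const custom = rules.regexp.find((r) => new RegExp(r.expression).test(message));
  return custom ? { app: appName, description: custom.description } : null;
}
```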

Finally, the part that handles the POST method has been updated to use those new helpers:

  • The app-specific alarm config is stored in an appAlarms const with the getAppAlarms helper (line 355).
  • An alarms array is declared to store the potential alarms (line 362).
  • For each message, the getAlarm helper is called to add (or not) a new alarm to the alarms array (line 418).
  • If there are alarms to send, the invokeAlarms helper is called (line 444).
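Put together, the POST branch boils down to a small loop. The sketch below mirrors the steps listed above; the function name processDrainRequest is hypothetical, and getAlarm and invokeAlarms are passed in as parameters only so the snippet can run without the rest of the function.

```javascript
// Hypothetical condensed version of the POST branch of heroku-drains.
async function processDrainRequest(appName, messages, { getAlarm, invokeAlarms }) {
  const alarms = []; // potential alarms for this drain request
  for (const message of messages) {
    const alarm = getAlarm(appName, message); // null when the message is harmless
    if (alarm) alarms.push(alarm);
  }
  // A single invocation per drain request, and none at all when nothing matched
  if (alarms.length > 0) await invokeAlarms(alarms);
  return alarms;
}
```

Batching the alarms per request keeps the number of Lambda invocations proportional to the number of drain requests rather than to the number of log lines.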

A new Lambda function to send notifications

The last thing we have to do in order to test our alarm system is to create the Lambda function dedicated to sending the notifications, i.e. the function whose name is provided in the FunctionName property of the invokeAlarms helper. ⚠️ This name is stored in a new environment variable named DRAINS_ALERTS_FUNCTION. So the first thing to do is to go to your heroku-drains function’s Configuration > Environment variables tab and set the name you want in a new DRAINS_ALERTS_FUNCTION variable.
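For reference, here is a possible shape for invokeAlarms. It assumes an asynchronous invocation (InvocationType 'Event'), so heroku-drains does not wait for the notifications to be sent, and it reads the function name from DRAINS_ALERTS_FUNCTION. The actual SDK call is injected as a send parameter: with the AWS SDK for JavaScript v3 it could be (params) => lambdaClient.send(new InvokeCommand(params)) from @aws-sdk/client-lambda, and with v2 (params) => lambda.invoke(params).promise(). buildInvokeParams is a hypothetical name, and only logging errors (instead of rethrowing them) is a design choice made here so that a failing alert never breaks log ingestion.

```javascript
// Hypothetical helper: builds the Invoke parameters for the alarms function.
function buildInvokeParams(alarms, env = process.env) {
  const functionName = env.DRAINS_ALERTS_FUNCTION;
  if (!functionName) throw new Error('DRAINS_ALERTS_FUNCTION is not set');
  return {
    FunctionName: functionName,
    InvocationType: 'Event', // asynchronous: do not wait for the alarms function to finish
    Payload: Buffer.from(JSON.stringify(alarms)), // becomes the `event` of the alarms function
  };
}

// `send` wraps the real SDK call; errors are logged, not rethrown.
async function invokeAlarms(alarms, send, env = process.env) {
  try {
    await send(buildInvokeParams(alarms, env));
  } catch (err) {
    console.error('invokeAlarms failed:', err);
  }
}
```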

Go to your Lambda dashboard, create a new function with Node.js 16.x as the runtime and arm64 as the architecture, and give it the name stored in your DRAINS_ALERTS_FUNCTION variable. Then rename index.js to index.mjs to make it an ES module.

From now on, we’ll reference this function as the heroku-drains-alarms Lambda function (but you can name it the way you want). We’ll write dummy code just to ensure that the invocation is successful:
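Something as small as the following is enough, assuming all we want for now is to see the received alarms in the function’s log group. The handler is declared as a plain constant so the snippet runs standalone; in index.mjs you would export it (export const handler = ...), and the returned summary object is only an illustration.

```javascript
// Placeholder handler for heroku-drains-alarms: log the event and return a summary.
// In index.mjs: `export const handler = async (event) => { ... };`
const handler = async (event) => {
  // The event is the alarms array sent by heroku-drains
  console.log('Alarms received:', JSON.stringify(event));
  return { received: Array.isArray(event) ? event.length : 0 };
};
```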

Wait, did we say “invocation”? Does our heroku-drains Lambda function have the permission to invoke heroku-drains-alarms? Of course not (yet)!

Go to the heroku-drains Lambda function Configuration > Permissions tab and click on its role. Edit the role’s policy and add this new statement:

{
  "Effect": "Allow",
  "Action": [
    "lambda:InvokeFunction"
  ],
  "Resource": [
    "arn:aws:lambda:eu-west-1:XXXXXXXXX:function:heroku-drains-alarms-test"
  ]
}

Again, double-check your resource ARN: I obfuscated my AWS account ID, and your function’s name may not be the same as mine.

As the heroku-drains-alarms function must send SNS notifications, you’ll need to create a new SNS topic to route your emails:

  • Go to your SNS dashboard
  • Create a new Topic
  • Choose Standard
  • Give a name to your Topic
  • Click on Create subscription
  • Select Email as Protocol and add your email (typically an alias to your dev team)
  • Confirm the subscription through the email confirmation you’ve received
Create and configure your SNS topic

As usual, you must give your heroku-drains-alarms function the proper permissions to send notifications through your new SNS topic, so add the following statement to its policy (pay attention to the resource’s ARN):

{
  "Effect": "Allow",
  "Action": [
    "sns:Publish"
  ],
  "Resource": [
    "arn:aws:sns:eu-west-1:XXXXXXXXX:Logs-test"
  ]
}

Alright! The setup is complete, and we can test it by raising an error from our application. The easiest way to do so is to disable all of the Heroku app’s dynos and visit the app, which will raise a No web processes running (H14) error. If everything went fine, you should see a new log in your heroku-drains-alarms function’s log group, with the associated alarm definition. If not, check your heroku-drains function’s log group to make sure that no errors occurred.

Don’t spam yourself!

It would be very easy to get spammed by our own alerts if we don’t throttle them. For instance, if you disable the Heroku dynos of a high-traffic app, you’ll be flooded with thousands of No web processes running (H14) errors. And you don’t want that.

So we must define 3 parameters:

  • the maximum number of times we want to be notified by a specific kind of error
  • the time window after which we should reset this number
  • the ability to enable/disable all the notifications

Those parameters can be set in the Environment variables of our function. Go to the Configuration > Environment variables tab of your heroku-drains-alarms function and add the following ones:

  • NOTIFICATIONS_TIMESPAN: defines the time window (in seconds), say 3600
  • MAX_NOTIFICATIONS_BY_TIMESPAN: defines how many notifications of the same type can be sent during the time window, say 10
  • SEND_NOTIFICATION: whether or not we want to send the notifications (true/false)
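Since environment variables always arrive as strings, they need to be parsed before use. A minimal sketch, where readNotificationSettings is a hypothetical name and the fallback values (3600 seconds, 10 notifications, sending disabled) are assumptions borrowed from the examples in the list:

```javascript
// Hypothetical helper: parses the throttling settings from the environment variables.
function readNotificationSettings(env = process.env) {
  const toPositiveInt = (value, fallback) => {
    const parsed = Number.parseInt(value, 10);
    return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
  };
  return {
    timespan: toPositiveInt(env.NOTIFICATIONS_TIMESPAN, 3600), // seconds
    maxNotifications: toPositiveInt(env.MAX_NOTIFICATIONS_BY_TIMESPAN, 10),
    // Environment variables are strings: only the literal "true" enables sending
    sendNotifications: env.SEND_NOTIFICATION === 'true',
  };
}
```

Defaulting SEND_NOTIFICATION to “off” means a missing or misspelled variable never floods your inbox; you may prefer the opposite trade-off.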

The question is: how do we remember how many notifications have been sent during a time window? We already answered it in Part 1: Architecture: with a Redis database! Its connection string can also go in the environment variables, as REDISCLOUD_URL for instance.

Environment variables of the heroku-drains-alarms function

Connection to Redis

To establish a connection to the Redis database, we’ll need the redis NPM module. Remember how to import a module that comes from NPM? If not, the process was explained during the heroku-drains Lambda function setup, in Part 2: Get the logs 😉

In this case, on a machine with Node.js and npm installed (be sure to use the same Node version as your Lambda function), create a new folder named after your Lambda function. Then create a nodejs subfolder in it and run the following command from that subfolder:

npm install redis --save

Compress the nodejs folder as a .zip file and create a new custom Layer by uploading it.

Once the Layer has been added to the Lambda function, you should be able to import the Redis client with:

import { createClient } from '/opt/nodejs/node_modules/redis/dist/index.js';

The last step is to create a helper function that connects to the Redis database. Again, this helper will be declared outside the main event handler so that the same connection is reused across the function’s invocations (as long as the execution environment stays warm).

Quick tip: it could be a good idea to create a CloudWatch metric filter on the function’s log group that matches the REDIS ERROR string, with an alarm on that metric, to be warned if the connection fails.
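Here is one way to write that helper, assuming the redis v4 client API (createClient({ url }), client.on('error', ...) and client.connect()). createClient is passed in as a parameter (in the Lambda it is the function imported from the layer path) so the snippet runs without the layer; getRedisClient is a hypothetical name. The connection promise is cached at module scope, errors are logged with the REDIS ERROR prefix that the tip relies on, and in this sketch a failed connection resolves to null so the caller can carry on without Redis.

```javascript
let redisClientPromise = null; // module scope: shared by every invocation of a warm container

// Hypothetical helper: lazily connects once and reuses the same client afterwards.
// `createClient` is redis v4's factory, injected here so the sketch runs without the layer.
function getRedisClient(createClient, url = process.env.REDISCLOUD_URL) {
  if (!url) return Promise.resolve(null); // no connection string: run without Redis
  if (!redisClientPromise) {
    const client = createClient({ url });
    // Without an 'error' listener, a connection error would crash the process
    client.on('error', (err) => console.error('REDIS ERROR', err));
    redisClientPromise = client
      .connect()
      .then(() => client)
      .catch((err) => {
        console.error('REDIS ERROR', err);
        redisClientPromise = null; // allow a retry on a later invocation
        return null;
      });
  }
  return redisClientPromise;
}
```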

Sending the notifications

Let’s see how our heroku-drains-alarms Lambda function wraps it all together:

First, we import and configure the various clients (AWS and Redis) and parse our environment variables. Then, we declare our sending helper: sendNotificationAlarm. Nothing too complex here; just pay attention to the TopicArn property, which must match your topic’s ARN and be stored in a new environment variable named SNS_TOPIC_ARN. The ARN follows this pattern: arn:aws:sns:[YOUR_REGION]:[YOUR_AWS_ID]:[YOUR_TOPIC_NAME].
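The sending helper could look like the sketch below, under a few assumptions: the alarm is a { app, code, description } object, the subject and message formats are made up, and the SNS call is injected as publish, which with the AWS SDK for JavaScript v3 could be (params) => snsClient.send(new PublishCommand(params)) from @aws-sdk/client-sns (or (params) => sns.publish(params).promise() with v2). buildAlarmMessage is a hypothetical name.

```javascript
// Hypothetical helper: builds the SNS Publish parameters for one alarm.
function buildAlarmMessage(alarm, topicArn = process.env.SNS_TOPIC_ARN) {
  const label = alarm.code ? `${alarm.code} - ${alarm.description}` : alarm.description;
  return {
    TopicArn: topicArn,
    // SNS subjects must stay under 100 characters and contain no line breaks
    Subject: `[${alarm.app}] ${label}`.replace(/[\r\n]+/g, ' ').slice(0, 99),
    Message: JSON.stringify(alarm, null, 2),
  };
}

// `publish` wraps the real SDK call.
async function sendNotificationAlarm(alarm, publish, topicArn = process.env.SNS_TOPIC_ARN) {
  if (!topicArn) throw new Error('SNS_TOPIC_ARN is not set');
  return publish(buildAlarmMessage(alarm, topicArn));
}
```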

Finally, we create the Redis connection that will be shared across all the function’s invocations.

Now for the event handler. The event object is actually the alarms array that we built during the parsing phase of the heroku-drains Lambda function, and each item is processed as follows:

  • Create a unique ID from the app’s name and the error code to use it as a key in the Redis database.
  • If the alarm is a custom RegExp, we’ll create a unique ID based on the description, since there is no error code to retrieve.
  • If a Redis client is available (never assume that the connection to a third party is fine 😄), we’ll check if the alarm should trigger or not, based on the current settings.
  • If the alarm must be triggered, send it with the sendNotificationAlarm helper.

A quick note regarding the Redis configuration of the entries when we insert them: we take advantage of the entry’s TTL (the EX option), so we don’t have to check whether the time window is over. If the entry isn’t there anymore, the time window has expired!
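The per-alarm logic described in the list and the note could be sketched as follows. It assumes the promise-based redis v4 methods get, set(key, value, { EX }) and incr; the alarms: key prefix, the alarm object fields and the choice to still send alarms when Redis is unavailable are hypothetical, not necessarily what the original function does.

```javascript
// Hypothetical Redis key: one counter per app and per kind of alarm.
function alarmKey(alarm) {
  // Custom RegExp alarms have no error code, so their description is used instead
  const id = alarm.code ? alarm.code : `regexp:${alarm.description}`;
  return `alarms:${alarm.app}:${id}`;
}

// Fixed-window counter relying on the key's TTL (redis v4 client API).
async function shouldNotify(redis, key, { timespan, maxNotifications }) {
  const current = await redis.get(key); // null once the TTL has expired
  if (current === null) {
    await redis.set(key, '1', { EX: timespan }); // opens a new time window
    return true;
  }
  if (Number(current) >= maxNotifications) return false; // quota reached for this window
  await redis.incr(key); // INCR keeps the existing TTL
  return true;
}

// Processes the alarms array received as the event.
async function handleAlarms(alarms, { redis, settings, send }) {
  if (!settings.sendNotifications) return 0;
  let sent = 0;
  for (const alarm of alarms) {
    // Assumption: without Redis, every alarm is sent (fail-open)
    const allowed = redis ? await shouldNotify(redis, alarmKey(alarm), settings) : true;
    if (allowed) {
      await send(alarm);
      sent += 1;
    }
  }
  return sent;
}
```

Reading the counter and then setting or incrementing it is not atomic, so concurrent invocations can occasionally exceed the limit by a notification or two, which is usually acceptable for email throttling.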

What have we learned?

  • How to invoke a Lambda function from another Lambda function
  • How to connect to a Redis database with a Lambda function
  • How to send an email from a Lambda function

Final thoughts

I hope that you now have a better understanding of how to store Heroku logs in AWS CloudWatch along with an alert system! Starting from a task that looked simple at first sight, we have learned a lot and discovered at least 6 AWS services:

  • Lambda
  • API Gateway
  • SNS
  • SQS
  • IAM
  • CloudWatch

Everything we’ve built during this series is only a working base, and there is plenty of room for improvement. To name a few:

  • better handling of SQS events (and a dead-letter queue)
  • better handling of alert configuration (managed with a UI)
  • global monitoring of the system (with AWS X-Ray)

For a recap, check the Bonus Part: SpeedRun, where you’ll find all the final code and configuration of the Lambda functions, along with the other AWS resources to create.

Simon Briche

Tech enthusiast during the day, gamer at night, much more in between. CTO at a French agency. https://simonbriche.dev