Dev IRL: How to ingest Heroku Log Drains with Nodejs on AWS CloudWatch? Part 6: The alert system
The last part of our architecture is a notification mechanism that alerts us in case of an outage. There are many possible implementations, as you can imagine, but I’ll go for a simple one.
This story is the last of a 6-part series. You can find all the other parts below:
- Part 1: Architecture
- Part 2: Get the logs
- Part 3: Handle the drains
- Part 4: Sending events with SQS
- Part 5: Ingest the logs
- Part 6: The alert system <<< 📍You are here!
- Bonus Part: SpeedRun
The alert configuration
Remember how we can import custom modules, like the error-codes.js one we created in Part 3: Handle the drains? Great! We’ll create a similar one that will act as an alert pattern definition. Go to your heroku-drains Lambda function and create a new file alarms-config.js next to the index.mjs file:
This JSON lists all the common Heroku-specific codes and errors (the codes and errors properties), but you can add your own alert definitions in the regexp property. Just provide an expression (which can be a RegExp) and a description. For each alert, you can also provide an exclude_apps array (the rule will be tested against all apps BUT those in this array) or an include_apps array (the rule will be tested ONLY against the apps in this array).
One alert gets special treatment through the exclude_htr_paths property. By default, if a request lasts longer than 5s, the alert system raises a High Response time alert. If you don’t want specific routes to raise this alert, simply provide their paths as RegExps and they will be discarded during the alert detection phase.
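To make the shape of this configuration concrete, here is a minimal sketch of what an alarms-config.js could look like. Only the property names (codes, errors, regexp, exclude_apps, include_apps, exclude_htr_paths) come from the description above. The exact nesting, the handful of sample Heroku codes, the app names and the paths are assumptions chosen for illustration, not the complete list.

```javascript
// alarms-config.js (illustrative sketch, not the full list)
// Assumption: nesting and sample values are hypothetical;
// only the property names come from the article.
const alarmsConfig = {
  // Heroku platform error codes (H = HTTP/router errors)
  codes: [
    { expression: 'H10', description: 'App crashed' },
    { expression: 'H12', description: 'Request timeout', exclude_apps: ['my-staging-app'] },
    { expression: 'H14', description: 'No web processes running' },
  ],
  // Heroku runtime errors (R = runtime)
  errors: [
    { expression: 'R14', description: 'Memory quota exceeded' },
  ],
  // Your own patterns: an expression (string or RegExp) plus a description
  regexp: [
    { expression: /payment failed/i, description: 'Payment failure', include_apps: ['my-shop-app'] },
  ],
  // Paths that must never raise the High Response time alert
  exclude_htr_paths: [/^\/exports\//, /^\/admin\/reports/],
};

// In the Lambda function this object would be exported, e.g.:
// export default alarmsConfig;
```

With this layout, adding an alert for a single app is a one-line change in the regexp array followed by a deploy.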
All in all, this way of handling the alerts is:
- rather simple: you just have to update the JSON, deploy the update and the new configuration will be live
- flexible: you can customize the alerts for each app
- works out-of-the-box for new apps: since Heroku specific alerts are already there, you don’t have to do anything when a new app is added
Of course, it’s up to you to manage this configuration the way you want (with a UI for instance), but that’ll do for now.
The alert implementation
As you can imagine, we’ll have to update our heroku-drains Lambda function to detect the errors. I’ll show you the final code of this function and then we’ll see the alert system’s specific updates:
The first update is the import of the alarms-config.js module (line 3), instead of the dummy error-codes.js. This way, we have stored the global config in the function’s memory. Next, we’ve declared an appAlarms object (line 14) that will store in memory the specific alarm configuration for each app (since the config is customizable per app).
Then, 3 helper functions have been added just before the main event handler:
- getAppAlarms: stores the specific app configuration in the appAlarms object.
- getAlarm: finds if an alarm is raised by the current message.
- invokeAlarms: invokes the Lambda function dedicated to sending the alarm notifications.
Finally, the part that handles the POST method has been updated to use those new helpers:
- The app’s specific alarm config is stored in an appAlarms const with the getAppAlarms helper (line 355).
- An alarms array is declared to store the potential alarms (line 362).
- For each message, the getAlarm helper is called to store (or not) a new alarm in the alarms array (line 418).
- If alarms need to be sent, the invokeAlarms helper is called (line 444).
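The detection steps above can be sketched as follows. The helper names getAppAlarms and getAlarm come from the article, but their bodies, the trimmed-down config, the appliesTo helper and the shape of the returned alarm object are assumptions made for this example. It also assumes that Heroku router lines carry the error as code=H14, which is how router errors usually appear in the logs.

```javascript
// Hypothetical, trimmed-down config (the real one lives in alarms-config.js)
const alarmsConfig = {
  codes: [
    { expression: 'H14', description: 'No web processes running' },
    { expression: 'H12', description: 'Request timeout', exclude_apps: ['staging-app'] },
  ],
  regexp: [
    { expression: /payment failed/i, description: 'Payment failure', include_apps: ['shop-app'] },
  ],
};

// Per-app cache, kept in memory across warm invocations
const appAlarms = {};

// True when a rule applies to the given app (hypothetical helper)
const appliesTo = (rule, app) => {
  if (rule.include_apps) return rule.include_apps.includes(app);
  if (rule.exclude_apps) return !rule.exclude_apps.includes(app);
  return true;
};

// Builds (once) and returns the list of rules that apply to an app
function getAppAlarms(app) {
  if (!appAlarms[app]) {
    appAlarms[app] = [
      ...alarmsConfig.codes.map((rule) => ({ ...rule, type: 'code' })),
      ...alarmsConfig.regexp.map((rule) => ({ ...rule, type: 'regexp' })),
    ].filter((rule) => appliesTo(rule, app));
  }
  return appAlarms[app];
}

// Returns an alarm object if the message matches a rule, otherwise null
function getAlarm(app, message, rules) {
  const rule = rules.find((r) =>
    r.expression instanceof RegExp
      ? r.expression.test(message)
      : message.includes(`code=${r.expression}`)
  );
  if (!rule) return null;
  return {
    app,
    code: rule.type === 'code' ? rule.expression : null,
    description: rule.description,
    message,
  };
}
```

In the POST branch, the handler would call getAppAlarms once per request, run getAlarm on each message and push every non-null result into the alarms array.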
A new Lambda function to send notifications
The last thing we have to do in order to test our alarm system is to create the Lambda function dedicated to sending the notifications, i.e. the function whose name is provided in the FunctionName property of the invokeAlarms helper. ⚠️ This name is stored in a new environment variable named DRAINS_ALERTS_FUNCTION. So the first thing you have to do is go to your heroku-drains function’s Configuration > Environment variables tab and put the name you want in a new DRAINS_ALERTS_FUNCTION variable.
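As a rough illustration of how the environment variable feeds the invocation, the sketch below builds the parameters of a Lambda Invoke call. buildInvokeParams is a hypothetical helper name, and the asynchronous 'Event' invocation type is an assumption (it lets the drain handler return without waiting for the notifications). The AWS SDK call itself is only shown as a comment, since the SDK version depends on your setup.

```javascript
// Builds the parameters for a Lambda "Invoke" call.
// Assumption: asynchronous invocation ('Event'), so the drain handler
// does not wait for the notifications to be sent.
function buildInvokeParams(alarms, env = process.env) {
  const functionName = env.DRAINS_ALERTS_FUNCTION;
  if (!functionName) {
    throw new Error('DRAINS_ALERTS_FUNCTION is not set');
  }
  return {
    FunctionName: functionName,
    InvocationType: 'Event',
    // The payload is the alarms array, serialized as JSON bytes
    Payload: Buffer.from(JSON.stringify(alarms)),
  };
}

// invokeAlarms would then hand these parameters to the AWS SDK,
// for instance with SDK v3:
//   await lambdaClient.send(new InvokeCommand(buildInvokeParams(alarms)));
```

Failing fast when the variable is missing makes a configuration mistake visible in the heroku-drains logs instead of producing a silent invocation error.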
Go to your Lambda dashboard, create a new function with Node.js 16.x as your runtime and arm64 as your architecture, and name it after your DRAINS_ALERTS_FUNCTION variable’s content. Then rename index.js to index.mjs to make it an ES module.
From now on, we’ll reference this function as the heroku-drains-alarms Lambda function (but you can name it the way you want). We’ll write dummy code just to ensure that the invocation is successful:
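A dummy handler for this step can be as small as the sketch below. It is a stand-in written for this walkthrough, not the article's original listing. It assumes the function receives the alarms array directly as its event, and the returned object is purely illustrative.

```javascript
// index.mjs of the heroku-drains-alarms function (dummy version)
// In the Lambda function this would be declared as `export const handler = ...`
const handler = async (event) => {
  // Assumption: the payload sent by the caller is the alarms array itself
  const alarms = Array.isArray(event) ? event : [];
  console.log(`Received ${alarms.length} alarm(s):`, JSON.stringify(alarms));
  return { received: alarms.length };
};
```

Once deployed, any alarm forwarded by heroku-drains should show up as a log line in this function's Log group.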
Wait, did we say “invocation”? Does our heroku-drains Lambda function have the permissions to invoke the heroku-drains-alarms function? Of course not (yet)!
Go to the heroku-drains Lambda function Configuration > Permissions tab and click on its role. Edit the role’s policy and add this new statement:
{
  "Effect": "Allow",
  "Action": [
    "lambda:InvokeFunction"
  ],
  "Resource": [
    "arn:aws:lambda:eu-west-1:XXXXXXXXX:function:heroku-drains-alarms-test"
  ]
}
Again, double-check your resource ARN: I obfuscated my AWS account ID, and your function’s name may differ from mine.
As the heroku-drains-alarms function must send SNS notifications, you’ll need to create a new SNS topic to route your emails:
- Go to your SNS dashboard
- Create a new Topic
- Choose Standard
- Give a name to your Topic
- Click on Create subscription
- Select Email as Protocol and add your email (typically an alias to your dev team)
- Confirm the subscription through the email confirmation you’ve received
As usual, you must give your heroku-drains-alarms function the proper permissions to send notifications through your new SNS topic, so add the following statement to its policy (pay attention to the resource’s ARN):
{
  "Effect": "Allow",
  "Action": [
    "sns:Publish"
  ],
  "Resource": [
    "arn:aws:sns:eu-west-1:XXXXXXXXX:Logs-test"
  ]
}
Alright! The setup is complete, so we can test it by raising an error from our application. The easiest way to do it is to disable all the Heroku app’s dynos and visit it, as it’ll raise a No web processes running error (H14). If everything went fine, you should see a new log in your heroku-drains-alarms function Log group, with the associated alarm definition. If not, check your heroku-drains function Log group to ensure that no error occurred.
Don’t spam yourself!
It would be very easy to be spammed by our own alerts if we don’t control them. For instance, if you disable the Heroku dynos of an app with high traffic, you’ll be flooded with thousands of No web processes running errors. And you don’t want that.
So we must define 3 parameters:
- the maximum number of times we want to be notified by a specific kind of error
- the time window after which we should reset this number
- the ability to enable/disable all the notifications
Those parameters can be set in the Environment variables of our function. Go to the Configuration > Environment variables tab of your heroku-drains-alarms function and add the following ones:
- NOTIFICATIONS_TIMESPAN: defines the time window (in seconds), say 3600
- MAX_NOTIFICATIONS_BY_TIMESPAN: defines how many notifications of the same type can be sent during the time window, say 10
- SEND_NOTIFICATION: whether or not we want to send the notifications (true/false)
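Since environment variables always arrive as strings, it helps to parse them once at startup. The sketch below shows one way to do it. parseSettings is a hypothetical helper name, and the fallback values (3600 seconds, 10 notifications, sending disabled) are assumptions that simply mirror the examples above.

```javascript
// Reads the throttling settings from environment variables.
// Assumption: the fallback values are illustrative defaults.
function parseSettings(env = process.env) {
  const toPositiveInt = (value, fallback) => {
    const parsed = Number.parseInt(value, 10);
    return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
  };
  return {
    timespan: toPositiveInt(env.NOTIFICATIONS_TIMESPAN, 3600),
    maxNotifications: toPositiveInt(env.MAX_NOTIFICATIONS_BY_TIMESPAN, 10),
    // Only the literal string "true" (any case) enables sending
    sendNotification: String(env.SEND_NOTIFICATION).toLowerCase() === 'true',
  };
}
```

Defaulting sendNotification to false when the variable is missing or malformed keeps a misconfigured function quiet rather than noisy.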
The thing is, how do we remember how many notifications have been sent during a time window? We’ve already answered this question in Part 1: Architecture: with a Redis database! We can also add its connection string to the Environment variables, as REDISCLOUD_URL for instance.
Connection to Redis
To establish a connection to the Redis database we’ll need the redis NPM module. Remember how to import a module that comes from NPM? If not, the process has been explained during the heroku-drains Lambda function setup, in Part 2: Get the logs 😉
In this case, on a system with Node.js and npm installed (be sure to use the same Node.js version as your Lambda function), create a new folder named after your Lambda function. Then create a nodejs subfolder and run the following command in it with your terminal:
npm install redis --save
Compress the nodejs folder as a .zip file and create a new custom Layer by uploading it.
Once the Layer has been added to the Lambda function, you should be able to import the Redis client with:
import { createClient } from '/opt/nodejs/node_modules/redis/dist/index.js';
The last step is to create a helper function to make the connection to the Redis database. Again, the helper function will be declared outside the main event handler to use the same connection across the function invocations.
Quick tip: It could be a good idea to set a custom CloudWatch alarm that’ll be triggered when the REDIS ERROR string is encountered, to be warned if the connection fails.
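One possible shape for such a connection helper is sketched below. It is an assumption, not the article's original code. makeRedisConnector is a hypothetical name, and createClient is passed in as an argument so the snippet stays self-contained. In the Lambda function it would be the createClient imported from the redis module, whose v4 client exposes on('error', ...) and connect().

```javascript
// Returns a function that lazily connects to Redis and reuses the same
// connection on later calls (i.e. across warm Lambda invocations).
// `createClient` is injected here; in the Lambda function it would be the
// one imported from the redis module.
function makeRedisConnector(createClient, url) {
  let clientPromise = null;

  return function getRedisClient() {
    if (!clientPromise) {
      const client = createClient({ url });
      // Fixed prefix so a CloudWatch alarm can match "REDIS ERROR"
      client.on('error', (err) => console.error('REDIS ERROR', err));
      clientPromise = client
        .connect()
        .then(() => client)
        .catch((err) => {
          console.error('REDIS ERROR', err);
          clientPromise = null; // allow a retry on the next invocation
          return null; // callers must handle a missing client
        });
    }
    return clientPromise;
  };
}

// Usage in the Lambda function, outside the handler:
//   const getRedisClient = makeRedisConnector(createClient, process.env.REDISCLOUD_URL);
// and inside the handler:
//   const redisClient = await getRedisClient();
```

Resolving to null on failure, rather than throwing, lets the handler decide what to do when Redis is unreachable.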
Sending the notifications
Let’s see how our heroku-drains-alarms Lambda function will wrap it all together:
First, we import and configure the various clients (AWS and Redis) and parse our Environment variables. Then, we declare our sending helper: sendNotificationAlarm. Nothing too complex here, just pay attention to the TopicArn property, which must match yours and be stored in a new environment variable named SNS_TOPIC_ARN. The ARN should follow this pattern: arn:aws:sns:[YOUR_REGION]:[YOUR_AWS_ID]:[YOUR_TOPIC_NAME].
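To give an idea of what sendNotificationAlarm has to assemble, here is a sketch that builds the parameters of an SNS Publish call. buildNotificationParams is a hypothetical helper, and the alarm shape ({ app, code, description, message }) and the wording of the email are assumptions. TopicArn, Subject and Message are the actual Publish parameter names.

```javascript
// Builds the parameters for an SNS "Publish" call from one alarm.
// Assumption: the alarm shape ({ app, code, description, message }) is hypothetical.
function buildNotificationParams(alarm, env = process.env) {
  const subject = `[${alarm.app}] ${alarm.code || 'Custom alert'}: ${alarm.description}`
    .replace(/[^\x20-\x7E]/g, ' ') // no line breaks or control characters; ASCII only to stay on the safe side
    .slice(0, 99); // SNS subjects must stay under 100 characters
  return {
    TopicArn: env.SNS_TOPIC_ARN,
    Subject: subject,
    Message: [
      `App: ${alarm.app}`,
      `Alert: ${alarm.description}`,
      `Log line: ${alarm.message}`,
    ].join('\n'),
  };
}

// sendNotificationAlarm would then pass these parameters to the SNS client,
// for instance with SDK v3:
//   await snsClient.send(new PublishCommand(buildNotificationParams(alarm)));
```

Putting the app name and the error code in the subject makes the emails easy to filter in a shared team inbox.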
Finally, we create the Redis connection that will be shared across all the function’s invocations.
Now for the event handler. Actually, the event object is the alarms array that we created during the parsing phase of the heroku-drains Lambda function, and we’ll process each item as follows:
- Create a unique ID from the app’s name and the error code to use it as a key in the Redis database.
- If the alarm is a custom RegExp, we’ll create a unique ID based on the description, since there is no error code to retrieve.
- If a Redis client is available (never assume that the connection to a third party is fine 😄), we’ll check if the alarm should trigger or not, based on the current settings.
- If the alarm must be triggered, send it with the sendNotificationAlarm helper.
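The first two steps, building the Redis key, could look like the sketch below. buildAlarmKey, the alarm: prefix and the way the description is turned into a slug are all assumptions. The article only states that the key combines the app's name with the error code, or with the description for custom RegExp alarms.

```javascript
// Builds the Redis key used to count notifications for one kind of alarm.
// Assumption: key layout and slug format are illustrative.
function buildAlarmKey(alarm) {
  const suffix = alarm.code
    ? alarm.code
    : alarm.description
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, '-') // collapse anything else into dashes
        .replace(/^-|-$/g, ''); // trim leading/trailing dashes
  return `alarm:${alarm.app}:${suffix}`;
}
```

Because the app name is part of the key, the same error on two different apps is counted separately.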
A quick note here regarding how the entries are configured when we insert them into Redis: we take advantage of the TTL of the entry (the EX property) so we don’t have to check whether the time window is over or not. If the entry isn’t there anymore, it means that the time window has expired!
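The TTL-based throttling can be sketched as follows. shouldNotify is a hypothetical helper, and the settings object (timespan, maxNotifications) is an assumption mirroring the environment variables defined earlier. The client is only expected to expose get, set with an EX option, and incr, as the node-redis v4 client does.

```javascript
// Decides whether a notification may be sent for `key`, and records it.
// `client` is expected to expose get/set/incr like the node-redis v4 client.
async function shouldNotify(client, key, { timespan, maxNotifications }) {
  const current = await client.get(key);
  if (current === null) {
    // First alarm of the window: the entry expires by itself after `timespan` seconds
    await client.set(key, '1', { EX: timespan });
    return true;
  }
  if (Number(current) < maxNotifications) {
    // INCR keeps the remaining TTL, so the window is not extended
    await client.incr(key);
    return true;
  }
  // Quota reached: stay silent until the entry expires
  return false;
}
```

Note that the read-then-write sequence is not atomic, so two concurrent invocations could both pass the check. For an alerting quota that is usually acceptable, but an INCR-first variant would tighten it.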
What have we learned?
- How to invoke a Lambda function from another Lambda function
- How to connect to a Redis database with a Lambda function
- How to send an email from a Lambda function
Final thoughts
I hope that you now have a better understanding of how to store Heroku logs in AWS CloudWatch along with an alert system! Starting from what looked like a simple task, we have learned a lot and discovered at least 6 AWS services:
- Lambda
- API Gateway
- SNS
- SQS
- IAM
- CloudWatch
All we’ve seen during this series is merely a starting point, as there is a lot of room for improvement. To name a few:
- better handling of SQS events (and a dead letter queue)
- better handling of alert configuration (managed with a UI)
- global monitoring of the system (with AWS X-Ray)
For a recap, you can check the Bonus Part: SpeedRun, where you’ll find all the final code and configuration of the Lambda functions, along with the other AWS resources to create.