Dev IRL: How to ingest Heroku Log Drains with Nodejs on AWS CloudWatch? Part 5: Ingest the logs
A little explanation is needed here: to put logs in a Log stream you must provide a sequenceToken, i.e. a pointer to the last log so CloudWatch knows where to put the next one. You can retrieve this sequenceToken from the response of either the putLogEvents method or the describeLogStreams method of the CloudWatch API. If the sequenceToken you provide doesn’t match the current one, an InvalidSequenceTokenException is raised.
This story is the fifth of a six-part series. You can find all the other parts below:
- Part 1: Architecture
- Part 2: Get the logs
- Part 3: Handle the drains
- Part 4: Sending events with SQS
- Part 5: Ingest the logs <<< 📍You are here!
- Part 6: The alert system
- Bonus Part: SpeedRun
Not so simple
The issues are:
- in essence, you can’t store anything between Lambda invocations, so you can’t keep the sequenceToken returned by the putLogEvents method and use it for the next invocation
- even if you store this sequenceToken in a Redis database for the next invocation to access it, it could already be “expired”, because Lambda functions can be (and will be) invoked concurrently and a new log could be stored in between two invocations of the same Lambda function
- you could create a new Log stream for each invocation (thus avoiding the sequenceToken altogether), but searching for a specific log through all those Log streams would be painful
So you must get the current sequenceToken first and implement a retry mechanism in case an InvalidSequenceTokenException is raised.
The implementation
Go back to your heroku-drains-storage Lambda function’s Code tab. We left it with a simple log of the event’s Records property, and we must now format the events so they can be stored in a CloudWatch Log stream. Replace the code with the following:
Let’s take a closer look at this. We first declare our AWS helper for CloudWatch, then define a helper function that fetches information about the current Log stream (basically the same one we had in the heroku-drains Lambda function) in order to get the sequenceToken.
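As an illustration of that lookup, here is a minimal sketch and not necessarily the code used in this series. It assumes the aws-sdk v2 style of calling describeLogStreams(params).promise(); the function name getSequenceToken and its parameters are hypothetical.

```javascript
// Hypothetical helper: find one Log stream and return its current sequence token.
// `logs` is expected to behave like `new AWS.CloudWatchLogs()` from aws-sdk v2.
async function getSequenceToken(logs, logGroupName, logStreamName) {
  const { logStreams = [] } = await logs
    .describeLogStreams({ logGroupName, logStreamNamePrefix: logStreamName })
    .promise();

  // The prefix filter can match several streams, so keep only the exact name.
  const stream = logStreams.find((s) => s.logStreamName === logStreamName);
  if (!stream) {
    throw new Error(`Log stream not found: ${logStreamName}`);
  }

  // uploadSequenceToken is undefined until the stream has received its first event.
  return stream.uploadSequenceToken;
}
```

Passing the client as an argument keeps the helper easy to exercise with a stub instead of a real AWS account.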
We also define a helper function that actually puts the logs in the Log stream, and calls itself recursively when it fails with an InvalidSequenceTokenException. Please note the x-amzn-logs-format header added in the build step of the request. This is needed to have the logs parsed as JSON and to get a finer-grained search capability in CloudWatch.
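The recursive retry described above could look roughly like the sketch below. It is an assumption-laden illustration: the names putLogsWithRetry, refreshToken and MAX_ATTEMPTS are hypothetical, the client is assumed to follow the aws-sdk v2 putLogEvents(params).promise() style where a rejected call carries the exception name in err.code, and the x-amzn-logs-format header is deliberately left out.

```javascript
// Arbitrary cap (an assumption) so a persistent failure cannot recurse forever.
const MAX_ATTEMPTS = 5;

// `logs` behaves like `new AWS.CloudWatchLogs()` from aws-sdk v2.
// `refreshToken` is a hypothetical async callback returning the current sequenceToken.
async function putLogsWithRetry(logs, params, refreshToken, attempt = 1) {
  try {
    return await logs.putLogEvents(params).promise();
  } catch (err) {
    const staleToken = err.code === 'InvalidSequenceTokenException';
    // Anything other than a stale token, or too many attempts, is rethrown as-is.
    if (!staleToken || attempt >= MAX_ATTEMPTS) {
      throw err;
    }
    // Another invocation wrote to the stream first: fetch a fresh token and retry.
    const sequenceToken = await refreshToken();
    return putLogsWithRetry(logs, { ...params, sequenceToken }, refreshToken, attempt + 1);
  }
}
```

Capping the number of attempts is a design choice rather than a requirement: under heavy concurrency an uncapped recursion could keep losing the race until the Lambda times out.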
Finally, we define our main event handler. The first step is to build a batch of events to store. There is a limit of 10,000 events per batch so, depending on your needs, you may have to chunk this array. Then we sort the events by timestamp to store them in the correct order, get the current sequenceToken and store them.
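To make the batching step concrete, here is a small sketch of building, sorting and chunking the events. The shape of each SQS record body is an assumption (a JSON string holding a timestamp in milliseconds and a message), and the names toLogEvents and chunk are hypothetical.

```javascript
// Per-batch limit quoted in the paragraph above.
const MAX_EVENTS_PER_BATCH = 10000;

// Assumption: each record body is a JSON string such as
// {"timestamp": 1600000000000, "message": "..."}.
function toLogEvents(records) {
  return records
    .map((record) => JSON.parse(record.body))
    .map(({ timestamp, message }) => ({
      timestamp,
      // Log event messages are strings, so serialize anything else.
      message: typeof message === 'string' ? message : JSON.stringify(message),
    }))
    // Oldest first, so events are stored in chronological order.
    .sort((a, b) => a.timestamp - b.timestamp);
}

// Split an array into consecutive slices of at most `size` items.
function chunk(items, size = MAX_EVENTS_PER_BATCH) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}
```

Each chunk returned by chunk(toLogEvents(event.Records)) would then be stored with its own call, using the sequenceToken returned by the previous one.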
Check the logs
Alright! Now if you go back to the Log stream of your app and generate some logs, you should see them popping up one after the other, neatly parsed as JSON objects! 🎉
What have we learned?
- How the CloudWatch logging system works
- How to store logs in CloudWatch with a Lambda function
- How to store logs as a JSON object
Now let’s see, in the last part, how to be notified when an alert is raised: The alert system 🚀