Understand Lambda Logging and Unlock CloudWatch Logs

Zac Charles

I think it’s fair to say that everyone who uses Lambda is aware of CloudWatch. It’s well known that a call to your language’s version of console.log, print, or Console.WriteLine will result in text appearing in CloudWatch logs. This is usually where a lot of people’s understanding ends, though.

Don’t get me wrong, this is usually enough to successfully debug a Lambda function. However, I think a deeper understanding can help reason about your function’s logs more easily and unlock new possibilities.

In this post, I’ll explain how Groups, Streams, and Events fit together, describe some relevant parts of the CloudWatch API and how Lambda interacts with it, then explain how to make use of your logs after they’re sent to CloudWatch.

Logging, get it? (Photo by Ales Krivec on Unsplash)

Log Groups, Log Streams, and Log Events

Let’s take a step back and look at how CloudWatch organises logs. The lowest units in CloudWatch are events. Each event represents a single log record and is made up of a timestamp and a message.

Events are uploaded to CloudWatch in batches of up to 10,000 chronological ordered events using PutLogEvents.

In the screenshot below, each row displays an event. The message can contain multiple lines of text which you can reveal using the arrow on the left.

Example log events.

Log events always belong to a single log stream. A log stream is simply a named sequence of log events. That name can be anything, but Lambda has its own naming convention that we’ll take a look at. Log streams are created using the CreateLogStream API action.

Log streams have something called a sequence token which is an opaque string (meaning you shouldn’t care what the characters mean). A new sequence token is returned each time you call PutLogEvents. Each PutLogEvents call must include the sequence token returned by the previous call, except the very first call on a newly created stream.

The sequence token helps keep log entries in order. It also makes it practically impossible for a log stream to have more than one writer at a time. That’s because even though DescribeLogStreams returns the current sequence token, it may be invalid by the time you call PutLogEvents due to someone else making the call before you.

Example log streams.

Lastly, a log group is a named collection of log streams. Lambda also has its own naming convention for groups. It’s here at the log group level that you set the number of days to retain log events and can configure encryption.

Log groups are created using the CreateLogGroup action. They can have subscription filters and metric filters associated with them, but we’ll talk about those near the end of this post.

Putting it all together, we have something like this.

Lambda Logs

Now that we know more about groups, streams, and events, let’s take a look at how Lambda uses them.

If you want your function to log to CloudWatch, its IAM policy must allow the logs:CreateLogGroup, logs:CreateLogStream, and logs:PutLogEvents actions. You’ve probably seen a policy like this before:

Example minimal IAM policy for Lambda functions.

Lambda creates a new log stream per container. That makes sense when you think about the way sequence tokens work. With this one-to-one mapping, each container simply keeps track of its own sequence token.

Lambda’s log stream naming convention is date/[version]uuid.
Therefore, 2019/07/12/[$LATEST]312c2d81e2e64af58dbe557754f9aa13 will contain logs from a container that started on 2019/07/12 and run the latest version of the function. If you invoked version 1 instead, the name would have been 2019/07/12/[1]312c2d81e2e64af58dbe557754f9aa13.

Log streams must belong to a log group, so Lambda creates a new log group per function just before creating a log stream, if it doesn’t already exist.

The group naming convention is /aws/lambda/ followed by your function’s name. So Lambda would create /aws/lambda/zac for the zac function.

Log events come from your function’s standard output and standard error streams. These two streams are automatically redirected to another process that is responsible for asynchronously pushing batches of log events to CloudWatch.

Lambda calls CloudWatch APIs just like you would.

Unlocking CloudWatch Logs

Once your functions output is pushed to CloudWatch, there is actually quite a lot more you can do.

Searching

With parallel containers logging to multiple streams, it can be hard to find the logs you’re looking for, such as those of a particular customer or order. That’s especially true if you’ve never noticed the Search Log Group button (I don’t blame you if you haven’t).

You can search across log streams, who knew?

This button is above the list of a log group’s streams. It takes you to a page that shows all of the events in all of the streams in the group. From there, you can use the Filter events text box to search the entire log group and click Show in stream to see surrounding events.

Insights

CloudWatch Logs Insights enables you to search and analyse your log data using a query language that can aggregate, filter, and project log events across the entire log group. You can get the results as a table or graph.

One of the coolest things about Insights is that it understands TSV and JSON log events, letting you do even more advanced queries. I find Insights really useful when I don’t want to publish a custom metric or forgot to.

Metric Filters

Metric filters define terms and patterns to look for in log data as it is sent to CloudWatch. CloudWatch uses metric filters to turn log data into numerical metrics that you can graph or set alarms on.

One of the most common uses of metric filters with Lambda is extracting a metric for consumed memory from the REPORT log entries.

Subscription Filters

You can use subscription filters to get a real-time feed of log events delivered to other services such as a Kinesis Data Stream, Kinesis Firehose, or a Lambda function for custom processing, analysis, or loading to other systems.

Each log group can only have one subscription, but by using a Kinesis Data Stream you can have multiple consumers (especially with enhanced fan-out).

It’s common to use a subscription to copy logs from CloudWatch into a more powerful system such as the Elastic Stack.

I may write more about using subscriptions in a future post, so let me know if that sounds interesting.

Batch Export

In some cases, real-time streaming isn’t necessary and a big dump of log data is just what the doctor ordered. In that case, CloudWatch can export to S3.

This could be used for things such as archiving, batch analysis, or importing to other systems. Don’t forget to save costs by setting an S3 lifecycle policy.

Log data can take up to 12 hours to become available for export. If you need files in S3 sooner, perhaps consider Kinesis Firehose to S3 (see above).

Conclusion

CloudWatch is a large and often neglected service. It’s given just enough attention to survive then pushed aside for the more shiny services.

However, I think there’s real value is understanding it a bit more, so I hope you learned something useful from this post, even if it’s just the location of the log group search button.

Please let me know if you have any questions or would like to hear more about a particular topic.


For more like this, please follow me on Medium and Twitter.

Zac Charles

Written by

Senior Engineer at Just Eat

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade