How to implement log aggregation for AWS Lambda

Ship the logs for your Lambda functions to a log aggregation service such as Logz.io

During the execution of a Lambda function, whatever you write to stdout (for example, using console.log in Node.js) is captured by Lambda and sent to CloudWatch Logs asynchronously in the background, without adding any overhead to your function’s execution time.
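As a small illustration, here is a sketch of a logger that writes one JSON line per call to stdout. The field names (level, message) and the handler itself are hypothetical, chosen only for the example.

```javascript
// Format a log entry as a single JSON line.
// The field names here are illustrative, not a required schema.
function formatLog(level, message, fields = {}) {
  return JSON.stringify({ level, message, ...fields });
}

// Anything written to stdout ends up in CloudWatch Logs.
function log(level, message, fields) {
  console.log(formatLog(level, message, fields));
}

// Hypothetical handler showing the logger in use.
async function handler(event) {
  log('info', 'order received', { orderId: event.orderId });
  return { statusCode: 200 };
}

exports.handler = handler;
```

Writing JSON rather than free text makes the logs easier to parse once they reach your log aggregation service.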

You can find all the logs for your Lambda functions in CloudWatch Logs. There is a unique log group for each function. Each log group then consists of many log streams, one for each concurrently executing instance of the function.

You can send logs to CloudWatch Logs yourself via the PutLogEvents operation. Or you can send them to your preferred log aggregation service such as Splunk or Elasticsearch.

But, remember that everything has to be done during a function’s invocation. If you make additional network calls during the invocation, then you’ll pay for that additional execution time. Your users would also have to wait longer for the API to respond.

These extra network calls might only add 10–20ms per invocation. But in a microservices architecture, a single user action can involve several API calls. Those 10–20ms per API call can compound and add over 100ms to your user-facing latency, which is enough to reduce sales by 1% according to Amazon.

So, don’t do that!

Instead, process the logs from CloudWatch Logs after the fact.

In the CloudWatch Logs console, you can select a log group and choose to stream the data directly to Amazon’s hosted Elasticsearch service.

This is very useful if you’re using the hosted Elasticsearch service already. But if you’re still evaluating your options, then give this post a read before you decide on the AWS-hosted Elasticsearch.

You can also stream the logs to a Lambda function instead. There are even a number of Lambda function blueprints for pushing CloudWatch Logs to other log aggregation services already.

Clearly this is something a lot of AWS’s customers have asked for.

You can find blueprints for shipping CloudWatch Logs to Sumologic, Splunk and Loggly out of the box.
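Whichever service you ship to, the function receives its input in the same shape: CloudWatch Logs delivers a base64-encoded, gzip-compressed JSON payload under event.awslogs.data. Here is a rough sketch of the decoding step. The send callback is a hypothetical stand-in for whatever client your log aggregation service provides.

```javascript
const zlib = require('zlib');

// Decode the payload CloudWatch Logs passes to a subscribed function:
// base64 -> gunzip -> JSON.
function decodeLogsEvent(event) {
  const compressed = Buffer.from(event.awslogs.data, 'base64');
  return JSON.parse(zlib.gunzipSync(compressed).toString('utf8'));
}

// Flatten the payload into one record per log line,
// keeping the log group and stream each line came from.
function toRecords(payload) {
  // Control messages carry no application logs, so skip them.
  if (payload.messageType !== 'DATA_MESSAGE') return [];
  return payload.logEvents.map((e) => ({
    logGroup: payload.logGroup,
    logStream: payload.logStream,
    timestamp: e.timestamp,
    message: e.message,
  }));
}

// `send` is a hypothetical async function that forwards the records
// to your log aggregation service.
async function shipLogs(event, send) {
  const records = toRecords(decodeLogsEvent(event));
  if (records.length > 0) await send(records);
  return records.length;
}
```

Keeping the decoding separate from the sending means you can swap the destination without touching the parsing logic.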

You can use these blueprints to help you write a Lambda function that’ll ship CloudWatch Logs to your preferred log aggregation service. But here are a few more things to keep in mind.

Every new Lambda function gets a new log group in CloudWatch Logs. You want to avoid a manual process for subscribing log groups to your log shipping function.

Instead, enable CloudTrail, and then set up an event pattern in CloudWatch Events to invoke another Lambda function whenever a log group is created.

You can do this one-off setup in the CloudWatch console.

Match the CreateLogGroup API call in CloudWatch Logs and trigger a subscribe-log-group Lambda function. This function would subscribe the new log group to the log shipping function.
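To make that concrete, here is a sketch of the event pattern and the subscribe step. It assumes logsClient is an AWS SDK v2 CloudWatchLogs client (for example, new AWS.CloudWatchLogs()) passed in by the caller. The filter name and the function names are hypothetical.

```javascript
// Event pattern for the CloudWatch Events rule. CloudTrail records the
// CreateLogGroup call, and the rule forwards it to the subscribe function.
const createLogGroupPattern = {
  source: ['aws.logs'],
  'detail-type': ['AWS API Call via CloudTrail'],
  detail: {
    eventSource: ['logs.amazonaws.com'],
    eventName: ['CreateLogGroup'],
  },
};

// Build the PutSubscriptionFilter request for the new log group.
function buildSubscriptionParams(event, destinationArn) {
  return {
    logGroupName: event.detail.requestParameters.logGroupName,
    destinationArn,          // ARN of the log shipping function
    filterName: 'ship-logs', // hypothetical filter name
    filterPattern: '',       // an empty pattern matches every log event
  };
}

// `logsClient` is assumed to be an AWS SDK v2 CloudWatchLogs client.
async function subscribeLogGroup(event, logsClient, destinationArn) {
  const params = buildSubscriptionParams(event, destinationArn);
  await logsClient.putSubscriptionFilter(params).promise();
  return params.logGroupName;
}
```

Note that CloudWatch Logs also needs permission to invoke the log shipping function, otherwise the PutSubscriptionFilter call will be rejected.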

If you’re working with multiple AWS accounts, then you should avoid making the setup a manual process. With the Serverless framework, you can set up the event source for this subscribe-log-group function in serverless.yml.
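A serverless.yml fragment along these lines would wire up that event source. The function name and handler path are hypothetical, so adjust them to your own project layout.

```yaml
functions:
  subscribe-log-group:
    # hypothetical handler path
    handler: functions/subscribe.handler
    events:
      - cloudwatchEvent:
          event:
            source:
              - aws.logs
            detail-type:
              - AWS API Call via CloudTrail
            detail:
              eventSource:
                - logs.amazonaws.com
              eventName:
                - CreateLogGroup
```

Because the rule lives in the deployment template, every account you deploy to gets the same setup without any console work.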

Another thing to keep in mind is that you need to avoid subscribing the ship-logs function’s own log group to the function itself. Doing so creates an infinite invocation loop, and that’s a painful lesson that you want to avoid.
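One way to guard against this is a small check in the subscribe function before it subscribes anything. This sketch relies on Lambda naming each function’s log group /aws/lambda/<function name>. The helper names are hypothetical.

```javascript
// Lambda names each function's log group /aws/lambda/<function name>.
function logGroupFor(functionName) {
  return `/aws/lambda/${functionName}`;
}

// Returns false for the shipping function's own log group,
// so that it is never subscribed to itself.
function shouldSubscribe(logGroupName, shipLogsFunctionName) {
  return logGroupName !== logGroupFor(shipLogsFunctionName);
}
```

In the subscribe function, you would call shouldSubscribe with the new log group’s name and skip the subscription when it returns false.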

One more thing.

By default, when Lambda creates a new log group for your function, the retention policy is set to Never Expire. This is overkill, as the data storage cost can add up over time. It’s also unnecessary if you’re shipping the logs elsewhere already!

We can apply the same technique above and add another Lambda function to automatically update the retention policy to something more reasonable.

Here’s a Lambda function for auto-updating the log retention policy to 30 days.
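A minimal sketch of such a function, not necessarily the original implementation, might look like this. It assumes the function is triggered by the same CreateLogGroup event and that logsClient is an AWS SDK v2 CloudWatchLogs client passed in by the caller.

```javascript
const RETENTION_DAYS = 30;

// Build the PutRetentionPolicy request from the CreateLogGroup event.
function buildRetentionParams(event, retentionInDays = RETENTION_DAYS) {
  return {
    logGroupName: event.detail.requestParameters.logGroupName,
    retentionInDays,
  };
}

// `logsClient` is assumed to be an AWS SDK v2 CloudWatchLogs client.
async function setRetention(event, logsClient) {
  const params = buildRetentionParams(event);
  await logsClient.putRetentionPolicy(params).promise();
  return params;
}
```

CloudWatch Logs only accepts certain retention values (30 days is one of them), so check the allowed list before changing the constant.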

If you already have lots of existing log groups, then consider writing a one-off script to update them all. You can do this by paging through all log groups with the DescribeLogGroups API call.
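Such a script could be sketched as follows. Again, logsClient is assumed to be an AWS SDK v2 CloudWatchLogs client, and the function names are hypothetical.

```javascript
// Collect every log group name by following nextToken until it runs out.
async function listLogGroupNames(logsClient) {
  const names = [];
  let nextToken;
  do {
    const params = nextToken ? { nextToken } : {};
    const page = await logsClient.describeLogGroups(params).promise();
    for (const group of page.logGroups) names.push(group.logGroupName);
    nextToken = page.nextToken;
  } while (nextToken);
  return names;
}

// Apply the same retention policy to every existing log group.
// Updates run one at a time to stay clear of API rate limits.
async function updateAllRetention(logsClient, retentionInDays = 30) {
  const names = await listLogGroupNames(logsClient);
  for (const logGroupName of names) {
    await logsClient.putRetentionPolicy({ logGroupName, retentionInDays }).promise();
  }
  return names.length;
}
```

The same loop works for subscribing existing log groups to the log shipping function: swap the putRetentionPolicy call for putSubscriptionFilter.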

If you’re interested in applying these techniques yourself, I have put together a simple demo project for you. If you follow the instructions in the README and deploy the functions, then all the logs for your Lambda functions will be delivered to Logz.io.
