My CloudWatch Logs Are How Old?
I was scrolling through some old sandbox accounts last week doing a bit of spring cleaning when I stumbled across some ancient artifacts — CloudWatch logs from our first Lambda test deployments in 2016. ‘Unable to import module…’ and ‘Hello, Lambda!’ they wrote proudly into our log streams. Aside from my temporary amusement, these can’t possibly still be delivering any value. I did a little bit of digging, and it turns out the default logging retention for Lambda is…forever. Left to their own devices, these logs never expire.
Being a cost-conscious cloud consumer, I jumped from that realization straight to the CloudWatch billing page and quickly discovered that we may have been wasting whole tens of cents through the intervening years. As of this writing, CloudWatch Logs storage goes for a punishing $0.03 per GB per month. So how can we better tidy up after ourselves going forward while also saving a bit of loose change?
Option One: The Worst Choice. You can edit the retention setting for each log group from the CloudWatch menu in the AWS Console by clicking on its retention period and selecting a new one. We had a few dozen log groups in the list, so this didn’t seem like a good use of time.
Option Two: Script it! The CLI supports a couple of useful commands here — ‘aws logs describe-log-groups’ and ‘aws logs put-retention-policy’ can be combined with a bit of bash to set the log retention for everything in the account. This is a lot faster than clicking on each one but has the inherent problem that we will forget to run it ever again, getting us back to where we started in about 3 years. We could set it up as a cron job if there were any servers running in this account, but this is the Serverless sandbox so there aren’t any.
Option Three: Write a Lambda function tied to a scheduled CloudWatch Events trigger that scans for specific log group prefixes and updates their retention settings if they don’t conform. Now we’re talking!
I’m using the Serverless Framework and setting up a CloudWatch Events schedule so that the function runs every night in the wee hours of the morning. It could probably run weekly or monthly and be just fine as well. Next up, a little TypeScript to do the exact same thing our bash script did: grab the list of log groups (in this case matching a prefix) and iterate over the list, checking that each has a retention period set and setting one if it doesn’t. The Node.js SDK indicates that we could get a nextToken back if the list is paginated, so we also check whether there are more results to fetch.
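For the curious, the core loop might look roughly like the sketch below. To keep it self-contained I have written it against a small hypothetical LogsClient interface rather than the SDK itself. The field names (logGroupNamePrefix, nextToken, retentionInDays) mirror the SDK's DescribeLogGroups and PutRetentionPolicy parameters, but the interface and the enforceRetention name are my own, and a real handler would pass in a thin wrapper around the SDK's CloudWatch Logs client.

```typescript
// Hypothetical minimal shapes for the two CloudWatch Logs calls used here.
// Field names follow the SDK's DescribeLogGroups / PutRetentionPolicy params.
interface LogGroup {
  logGroupName?: string;
  retentionInDays?: number; // absent when the group never expires
}

interface LogsClient {
  describeLogGroups(params: {
    logGroupNamePrefix?: string;
    nextToken?: string;
  }): Promise<{ logGroups?: LogGroup[]; nextToken?: string }>;
  putRetentionPolicy(params: {
    logGroupName: string;
    retentionInDays: number;
  }): Promise<unknown>;
}

// Walks every page of log groups under `prefix` and sets a retention policy
// on any group that has none. Returns the names of the groups it updated.
async function enforceRetention(
  client: LogsClient,
  prefix: string,
  retentionInDays: number
): Promise<string[]> {
  const updated: string[] = [];
  let nextToken: string | undefined;

  do {
    const page = await client.describeLogGroups({
      logGroupNamePrefix: prefix,
      nextToken,
    });

    for (const group of page.logGroups ?? []) {
      // Groups that already have a retention period are left alone.
      if (group.logGroupName && group.retentionInDays === undefined) {
        await client.putRetentionPolicy({
          logGroupName: group.logGroupName,
          retentionInDays,
        });
        updated.push(group.logGroupName);
      }
    }

    // A nextToken on the response means there is another page to fetch.
    nextToken = page.nextToken;
  } while (nextToken);

  return updated;
}
```

Passing the client in as a parameter also makes the light testing easier, since a fake client with a couple of canned pages is enough to exercise the pagination. One thing to keep in mind: CloudWatch only accepts certain retention values (1, 3, 5, 7, 14, 30 days and so on), so an arbitrary number will be rejected by the real API.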
Sprinkle in a bit of light testing and some added Lambda plumbing and we have now successfully whiled away an afternoon saving a few pennies each month.
If you’re interested in taking a look at the various solutions, I have uploaded the bash example here and the Serverless Framework example here. One final silver lining: if you are using the Serverless Framework for your Lambda projects, they have enabled this as an out-of-the-box feature via the logRetentionInDays property under the provider configuration. I guess enough people stumbled upon old logs that this became a common feature request.