Exporting CloudWatch logs to S3 through Lambda before the retention period expires

yogesh beth · Published in Petabytz · 4 min read · Sep 14, 2019

Permissions required:

1. Create a Lambda role with the following permissions:
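
A minimal sketch of the role's identity policy, assuming the role only needs what the code in this post uses (creating export tasks, setting retention policies, and listing log groups) plus the standard basic-execution logging permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateExportTask",
                "logs:PutRetentionPolicy",
                "logs:DescribeLogGroups",
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}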

2. Add the following bucket policy to the target destination bucket to specifically grant CloudWatch Logs permission to call GetBucketAcl and PutObject on it:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "logs.us-east-2.amazonaws.com"
            },
            "Action": "s3:GetBucketAcl",
            "Resource": "arn:aws:s3:::bucket-name"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "logs.us-east-2.amazonaws.com"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::bucket-name/*",
            "Condition": {
                "StringEquals": {
                    "s3:x-amz-acl": "bucket-owner-full-control"
                }
            }
        }
    ]
}
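
You can attach this policy from the S3 console, or with the AWS CLI, assuming the policy above is saved as policy.json (bucket-name is a placeholder):

aws s3api put-bucket-policy --bucket bucket-name --policy file://policy.json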

3. Now switch to the Lambda console, choose the Author from scratch option, and create a new function with Python 3.7 as the runtime environment. Choose the existing role we created, which is lambda_basic_execution in this case.

4. You will land on the function configuration screen.

5. Choose CloudWatch Events as a trigger; you'll then be redirected to configure the trigger.

Choose the desired log group; you can add multiple log groups if required.

6. Choose a CloudWatch Events rule for running the cron. I wanted the cron to run at 12:01 AM every day, so I have configured it that way; you can change the cron expression based on your requirements. One thing to remember is that the cron expression for an AWS rule has six fields, unlike standard cron expressions, which have only five. An example is shown below.
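
For reference, a daily 12:01 AM (UTC) schedule in AWS's six-field format (Minutes Hours Day-of-month Month Day-of-week Year) looks like this:

cron(1 0 * * ? *)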

7. Add the following code to the chosen Lambda function. For the handler, put file-name.function-name, where file-name is the name of the Python file (for example, lambda_function.lambda_handler if the file is lambda_function.py).

import boto3
import math
import time
from datetime import datetime, timedelta

region = 'us-east-2'

def lambda_handler(event, context):
    log_file = boto3.client('logs', region_name=region)
    nDays = 1
    deletionDate = datetime.now() - timedelta(days=nDays)
    print(deletionDate)
    # Export one full day of logs: 00:00:00.000 to 23:59:59.999
    startOfDay = deletionDate.replace(hour=0, minute=0, second=0, microsecond=0)
    endOfDay = deletionDate.replace(hour=23, minute=59, second=59, microsecond=999999)
    group_name = ['/aws/lambda/logs-for-multiple-events']
    for x in group_name:
        # create_export_task takes its timestamps in milliseconds
        response = log_file.create_export_task(
            taskName='export_task',
            logGroupName=x,
            fromTime=math.floor(startOfDay.timestamp() * 1000),
            to=math.floor(endOfDay.timestamp() * 1000),
            destination='aws-inventory-handler',
            destinationPrefix='Exported-logs'
        )
        # Expire the events in CloudWatch now that they are exported
        response = log_file.put_retention_policy(
            logGroupName=x,
            retentionInDays=retention_days(nDays)
        )
        # Only one export task can be active per account at a time,
        # so wait before starting the next one
        time.sleep(10)

def retention_days(n):
    # Values accepted by put_retention_policy
    retentionInDays = [1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 3653]
    for retention_day in retentionInDays:
        if n < retention_day:
            return retention_day

The fromTime and to fields accept timestamps in milliseconds, whereas a normal timestamp is the number of seconds that have elapsed since 00:00:00 Thursday, 1 January 1970. So to convert it to milliseconds, I'm multiplying it by 1000.
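
A quick illustration, assuming datetime and math are imported as in the code above:

start = datetime(2019, 9, 13)                # midnight, local time
print(math.floor(start.timestamp() * 1000))  # 1568332800000 if local time is UTC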

Also, increase the default timeout setting to 13 seconds multiplied by the number of log groups in case you're adding more log groups to the group_name[] array. We pause the execution for a while because create_export_task has a limitation which says that:

Each account can only have one active (RUNNING or PENDING) export task at a time.

So after exporting the first log group, the code waits 10 seconds for that task to be marked as completed and then moves on to the next log group.
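
If the fixed sleep feels fragile, you can poll the task status instead. A sketch under that assumption (wait_for_export and the five-second delay are my own illustration, not part of the original code):

def wait_for_export(log_file, task_id, delay=5):
    # Poll describe_export_tasks until the task leaves PENDING/RUNNING
    while True:
        tasks = log_file.describe_export_tasks(taskId=task_id)
        code = tasks['exportTasks'][0]['status']['code']
        if code not in ('PENDING', 'RUNNING'):
            return code
        time.sleep(delay)

Calling wait_for_export(log_file, response['taskId']) right after create_export_task would replace the fixed sleep, since create_export_task returns the new task's taskId.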

You can also add a retention policy based on your use case from the console. Since it is a one-time task and a cron is not required for it, I have added it from the console itself. I have also added a function to calculate the retention period from the user-supplied nDays value.
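
Since the comparison in retention_days is strict (n < retention_day), it rounds up to the next allowed value rather than matching n exactly:

retention_days(1)   # returns 3, the first allowed value strictly greater than 1
retention_days(7)   # returns 14
retention_days(30)  # returns 60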

In case you want to iterate through all the log groups present in CloudWatch, add this function to fetch all the group names:

def group_names():
    # Paginate through every log group in the region
    log_file = boto3.client('logs', region_name=region)
    groupnames = []
    paginator = log_file.get_paginator('describe_log_groups')
    for response in paginator.paginate():
        for result in response['logGroups']:
            groupnames.append(result['logGroupName'])
    return groupnames

Then replace the group_name assignment with group_name = group_names() inside lambda_handler(event, context).

Do not forget to increase the Lambda timeout setting accordingly.
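
If you prefer doing this from the CLI rather than the console (the function name and the 130-second value are just examples):

aws lambda update-function-configuration --function-name log-exporter --timeout 130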

Save and run the function to test it and let the lambda spark. :)
