Copying AWS Elastic Beanstalk rotated logs into a separate S3 bucket using AWS Lambda

Ravindu Abayawardena
6 min read · Jun 19, 2019

If we deploy our application in a standard environment where we run our own Tomcat server, we can SSH into it and browse through the files easily, so we have control over log rotation by using a logging library like Apache Log4j or any other logging tool. We can configure all of its settings via an XML file. But in AWS Elastic Beanstalk (EB) this is limited.

AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as Apache, Nginx, Passenger, and IIS.

The servers and file systems are all built in, so they come with some restrictions. You can’t manage log files like in a standard environment. If you want to rotate logs, go to your EB environment -> Configuration -> S3 log storage and enable Rotate logs.

In AWS EB it’s pretty easy to retrieve log files though. Go to your EB environment -> Logs -> Request Logs -> Last 100 Lines/Full Logs

If you have a Tomcat server in your EB environment, you can find the logs in the log -> tomcat -> rotated folder. All your application-related logs are saved as catalina.out files. The problem with this method is that it downloads all types of logs, i.e. Tomcat manager logs, application context logs, access logs, etc. This also makes the zip files bigger.

So it’s clear that a better solution is needed to get to the log files easily. This is where Amazon S3 comes in. AWS EB stores all the log files in an S3 bucket named elasticbeanstalk-{region}-{account id}. Rotated logs are stored there hourly. Still, it’s very difficult to find files in that S3 bucket: it contains many files such as previous versions of the Elastic Beanstalk app, various types of log files, etc. Also, if you have enabled load balancing with more than one instance, the log files for each instance are saved under its own folder path.

Therefore it’s clear that we need to isolate the log files and put them in a separate place i.e. an S3 bucket for easy access.

We can use AWS Lambda to search the files and copy the necessary log files into another S3 bucket.

First, make sure that the S3 bucket actually has the logs. Try the following command in the AWS CLI, or from an EC2 instance you already have. You may need to configure your credentials first (aws configure).

aws s3 ls s3://{bucket-name} --recursive | grep catalina.out

If the above command prints some files, we can continue. If not, make sure that you have enabled Rotate logs under Configuration.

Now, let’s create a Lambda function. Go to the AWS Management Console and open Lambda. It’s under Compute.

Go to Create function. We are writing the code from scratch using Python. Lambda supports many languages, but we’ll use Python since it’s lightweight and this is only a simple code snippet. In the Execution role section we need to give our Lambda function read/write permission to the S3 buckets. We can do this during the function creation process or later. Let’s opt for the latter. Click Create Function.

1) Coding

Once the function has been created, you will see the Configuration page. There we can write our function, create its roles, allocate memory, set timeouts and more. In the Function code window we can write our Python code.

source_bucket_name and destination_bucket_name are environment variables that we need to define. Under Environment variables, define these two variables by giving a Key and a Value. Later we can access them in our code as follows.

source_bucket_name = os.environ['source_bucket_name']
destination_bucket_name = os.environ['destination_bucket_name']

I’m using Boto 3, the AWS SDK for Python, to work with S3. It’s used here to get the list of files in the S3 bucket, copy objects from one bucket to another, and delete them. Since there are many unwanted files under the logs prefix, I delete them as I go so that the for loop runs quicker the next time. The very first run, though, will take some time depending on the number of files in your S3 bucket, so make sure to increase the function timeout from 3 seconds to a couple of minutes.
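For reference, here is a minimal sketch of what such a handler could look like. It assumes the two environment variables defined above, filters on keys containing catalina.out, and keeps the original key when copying; the actual lambda_function.py in this post additionally restructures the destination key by date (see the note on folders at the end of the post).

import os
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    source_bucket = os.environ['source_bucket_name']
    destination_bucket = os.environ['destination_bucket_name']

    # The EB bucket also holds app versions and other log types,
    # so walk every object and filter by key name.
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=source_bucket):
        for obj in page.get('Contents', []):
            key = obj['Key']
            if 'catalina.out' not in key:
                continue
            # Copy the rotated log into the destination bucket (key kept as-is here).
            s3.copy_object(
                Bucket=destination_bucket,
                Key=key,
                CopySource={'Bucket': source_bucket, 'Key': key},
            )
            # Delete the processed object so the next run scans fewer keys.
            s3.delete_object(Bucket=source_bucket, Key=key)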

2) Permission

Remember the permissions part that we skipped earlier? Now it’s time to grant the necessary permissions to our Lambda function. By default it has permission to upload logs to Amazon CloudWatch Logs. This is an optional permission that helps us monitor our function. Since this Lambda function both reads from and writes to Amazon S3 buckets, we need to give it S3 access. Go to the Execution role section. ‘Use an existing role’ is already selected there. In the Existing role drop-down you can see the default role created for our function. Click it and you will be redirected to the AWS IAM Management Console, where you can see the default permission policy. To add S3 permission, go to Attach policies and select AmazonS3FullAccess from the policy list.
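If you prefer to script this step instead of clicking through the console, the same managed policy can be attached with Boto 3. The role name below is a placeholder for the default execution role that Lambda generated for your function; a tighter alternative is an inline policy scoped to just the two buckets.

import boto3

iam = boto3.client('iam')

# Attach the AmazonS3FullAccess managed policy to the Lambda execution role.
# Replace the role name with the one Lambda created for your function.
iam.attach_role_policy(
    RoleName='my-eb-log-copy-role',  # placeholder
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess',
)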

Now you are all set with the necessary permissions to run the Lambda function and log its execution. You can test the function now to see that it works.

3) Trigger

Now, what we planned was to automatically copy logs to another S3 bucket. We need an event trigger to achieve that. In the Designer section of our Lambda function, on the left side, we can see a list of triggers we can add. Select CloudWatch Events and go to Configure triggers.

Create a new rule by providing the required information, with the rule type set to Schedule expression. Here you can write a cron expression to trigger this Lambda function. I want to run it hourly, since the rotated logs are created hourly.

cron(0 * * * ? *)
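The console wizard creates the rule and wires it to the function for you. If you ever need to do the same from code, a rough Boto 3 equivalent would look something like this (the rule name, function name and ARN are placeholders):

import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

# Create (or update) an hourly schedule rule.
rule = events.put_rule(
    Name='copy-eb-logs-hourly',                # placeholder rule name
    ScheduleExpression='cron(0 * * * ? *)',
    State='ENABLED',
)

# Allow CloudWatch Events to invoke the function.
lambda_client.add_permission(
    FunctionName='copy-eb-logs',               # placeholder function name
    StatementId='allow-hourly-schedule',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule['RuleArn'],
)

# Point the rule at the Lambda function.
events.put_targets(
    Rule='copy-eb-logs-hourly',
    Targets=[{
        'Id': 'copy-eb-logs',
        'Arn': 'arn:aws:lambda:us-east-1:123456789012:function:copy-eb-logs',  # placeholder ARN
    }],
)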

Once you add it, ALL DONE. The finished function design should look like the one below.

Finished function design

This is it. Now all your logs will be in a separate S3 bucket, properly folder-structured by date.

My Elastic Beanstalk environment is load balanced with 2 instances, so you can see that 2 files are generated hourly, one for each instance.

Later on, using an AWS S3 lifecycle configuration, you can transition objects from the Standard storage class to the GLACIER storage class to minimize your cost.
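As a rough sketch, such a lifecycle rule could be set on the destination bucket with Boto 3 like this (the bucket name and the 30-day threshold are just examples):

import boto3

s3 = boto3.client('s3')

# Move everything in the log bucket to Glacier 30 days after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket='my-eb-log-archive',            # placeholder destination bucket
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'archive-old-logs',
            'Filter': {'Prefix': ''},      # apply to the whole bucket
            'Status': 'Enabled',
            'Transitions': [{'Days': 30, 'StorageClass': 'GLACIER'}],
        }],
    },
)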

A little bit of extra knowledge

You have seen that in this post I talked about folders in S3. But AWS S3 actually has a flat structure: it stores objects, and there is no hierarchical file system like you see in Windows or Linux. For the sake of simplicity, the AWS console displays these objects in a folder-like manner by treating ‘/’ in a key as a separator. That’s why we use ‘/’ when we build the file names (lambda_function.py: line 28).
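To illustrate (this is just an example key, not the exact one built in lambda_function.py), any ‘/’ in an object key is rendered as a folder level in the S3 console:

from datetime import datetime, timezone

# A key like 2019/06/19/catalina.out shows up in the console as
# nested "folders" 2019 -> 06 -> 19, even though S3 stores it flat.
now = datetime.now(timezone.utc)
destination_key = f"{now:%Y/%m/%d}/catalina.out"
print(destination_key)   # e.g. 2019/06/19/catalina.out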
