Automate Boring Stuff with AWS Batch Fargate & AWS EventBridge

Devashish Gupta · Published in codelogicx · Mar 9, 2023

In this Medium blog we're going to see how we can automate boring tasks with AWS Batch on Fargate & AWS EventBridge.

As an example, we're going to automate one task: taking an AWS RDS database dump and storing it in AWS S3 on a regular schedule. For this we're going to use the following technologies:

Technologies Used:

  1. AWS Batch
  2. Shell Scripting
  3. Dockerfile & Docker Image
  4. MySQL Dump Tool
  5. Zip Tool
  6. IAM Roles
  7. AWS S3
  8. AWS EventBridge
  9. AWS SNS

AWS Batch:

AWS Batch lets developers, scientists, and engineers efficiently run hundreds of thousands of batch and ML computing jobs while optimizing compute resources, so you can focus on analyzing results and solving problems.

For this task we'll be using a Docker image with a shell script that dumps the database, compresses the dump into a .zip file, and then copies it to an AWS S3 bucket.

If we divide the shell script into parts, it looks like this:

a) Dumping the database from RDS

b) Compressing it into a .zip file

c) Copying the .zip file to the S3 bucket

Shell Scripting:

For this I’ve written a script which you can get from here: https://github.com/dcgmechanics/aws-batch-rds-s3/blob/main/MainScript.sh
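If you just want the gist without opening the repo, a simplified sketch of what such a script can look like is shown below. The option letters, variable names and timestamp format here are illustrative, not the exact contents of MainScript.sh:

```bash
#!/bin/bash
# Illustrative sketch of a dump-and-upload script, not the actual MainScript.sh.
# Usage: ./MainScript.sh -h <db-host> -u <db-user> -p <db-pass> -d <db-name> -s <s3-bucket> -o <output-prefix>
set -euo pipefail

while getopts "h:u:p:d:s:o:" opt; do
  case "$opt" in
    h) DB_HOST="$OPTARG" ;;
    u) DB_USER="$OPTARG" ;;
    p) DB_PASS="$OPTARG" ;;
    d) DB_NAME="$OPTARG" ;;
    s) S3_BUCKET="$OPTARG" ;;
    o) OUT_PREFIX="$OPTARG" ;;
    *) echo "usage: $0 -h host -u user -p pass -d db -s bucket -o prefix" >&2; exit 1 ;;
  esac
done

DUMP_FILE="${OUT_PREFIX}-$(date +%Y-%m-%d-%H%M).sql"

# a) Dump the database from RDS
mysqldump -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASS" "$DB_NAME" > "$DUMP_FILE"

# b) Compress it into a .zip
zip "${DUMP_FILE}.zip" "$DUMP_FILE"

# c) Copy the .zip file to the S3 bucket
aws s3 cp "${DUMP_FILE}.zip" "s3://${S3_BUCKET}/"
```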

We'll package this script into a Docker image to use with AWS Batch. For this we're going to use an Ubuntu base image with tools like curl, zip, mysql-client and the AWS CLI, which the script needs to execute the task.

Dockerfile:

For this I’ve also written a Dockerfile which you can get from here: https://github.com/dcgmechanics/aws-batch-rds-s3/blob/main/Dockerfile

Remember, for this tutorial you can use either Docker Hub or ECR to store the image you built. You can customize the image as per your requirements.
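If you go the ECR route, building and pushing the image might look roughly like this (the repository name, account ID and region are placeholders, and the ECR repository is assumed to already exist):

```bash
# Build the image from the Dockerfile in the current directory
docker build -t rds-dump-job .

# Log Docker in to your private ECR registry (placeholder account ID and region)
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Tag and push the image to the ECR repository
docker tag rds-dump-job:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/rds-dump-job:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/rds-dump-job:latest
```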

Setting Up AWS Batch:

We can now move on to the AWS Batch integration, but the first question should be: why AWS Batch with Fargate, and not EC2 or Lambda? The simple answer is that the process only runs for a limited time. For example, if the dump process starts at 12 PM and takes 10 minutes to finish the backup to S3, running an EC2 instance for the whole day just for that isn't feasible. We also can't use AWS Lambda because of its 15-minute time limit: if the dump-and-copy process takes longer than 15 minutes, it will fail and the objective won't be met. That's where AWS Batch on Fargate comes in. It runs only for the time the job needs, and once the process is done there is no resource usage, so there are no extra expenses. So, let's get started with the AWS Batch setup.

As you can see, there are many features in AWS Batch; we'll mainly be using a few of them for our task. We're going to start with Compute environments, so click on Compute environments and then click on Create.

After clicking on Create you'll get a page similar to the one below.

This will be used to set up the Compute environment configuration.

For this we're going to use Fargate, so I've selected that. You can use any name you like, and below that there is a Service role, which will be created automatically if it doesn't already exist. Then click on Next.

Remember, we can attach permissions for other services to the AWSServiceRoleForBatch role to provide additional access, e.g. the S3 PutObject permission.

On this page you can enable Use Fargate Spot capacity to save up to 90% on compute cost, and in Maximum vCPUs you can enter the compute environment's maximum vCPU limit. After entering the values, click on Next.

In the next part we're going to set up the Network configuration.

The network configuration depends entirely on your use case; in my case the RDS instance is in the same VPC, so that's what I've selected here, along with its Subnets and Security group. After selecting the VPC, Subnets and Security group, click on Next.

After reviewing everything here, click on Create compute environment, and the compute environment is ready to use.
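For reference, a roughly equivalent compute environment can also be created from the AWS CLI. The name, vCPU limit, subnet and security group IDs below are placeholders, and you can use FARGATE_SPOT as the type inside --compute-resources if you enabled Spot capacity:

```bash
aws batch create-compute-environment \
  --compute-environment-name rds-dump-ce \
  --type MANAGED \
  --state ENABLED \
  --compute-resources '{
    "type": "FARGATE",
    "maxvCpus": 4,
    "subnets": ["subnet-0123abcd"],
    "securityGroupIds": ["sg-0123abcd"]
  }'
# No service role is passed here, so Batch falls back to the AWSServiceRoleForBatch service-linked role.
```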

Now let's move on to the next part, Job queues. A job queue holds the jobs that are submitted to it until they can be scheduled onto a connected compute environment, and it can optionally be combined with Scheduling policies. For this, click on Job queues in the left panel and then click on Create.

On this page we've chosen Fargate as the orchestration type. After that, fill in any name you like, then set the Priority of the Job queue; job queues with a higher integer priority value are given preference when competing for compute environments. You can also provide a Scheduling policy ARN if you're going to use Scheduling policies, which determine the order in which jobs are run; the default is a first-in, first-out strategy. We're skipping that part in this tutorial, since our use case runs a single job at a time. In the next box, choose the Connected compute environments. After choosing these values, click on Create job queue.

& It’s done.
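For reference, the CLI equivalent would look roughly like this (the queue and compute environment names are placeholders):

```bash
aws batch create-job-queue \
  --job-queue-name rds-dump-queue \
  --state ENABLED \
  --priority 1 \
  --compute-environment-order order=1,computeEnvironment=rds-dump-ce
```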

Now that our Job queue is also done, we can move on to Job definitions. Click on this option in the left bar & you'll get a page similar to the one below.

We can start setting up a Job definition by clicking on Create. After this you'll get a page similar to the one below.

As you can see, there are 4 steps. We'll start configuring the Job definition with the Orchestration type; since we'll be using Fargate, I've chosen that.

Next, we’ll move forward with General configuration,

Use any name you like; here you can also define the Execution timeout and Scheduling priority as per your use case. I'm leaving these fields blank for now.

Next, we have to set up the Fargate platform configuration.

First of all, we need to define the Fargate platform version to use for the jobs. After that we need to enable the Assign public IP toggle so the task can pull the Docker image. Then we need to define the Execution role, which depends on your use case; since we're going to use RDS and S3, additional policies will be attached to this role.

You can also create your own role and attach it here. In the next box we can define the Job attempts, which lets the job run again if it fails for any reason. I'm going to leave this box blank and click on Next.

Here we're going to define the Image URL, the Command syntax and the actual Command. I've used the following command:

./MainScript.sh -h Ref::host -u Ref::uname -p Ref::pass -d Ref::dbname -s Ref::s3name -o Ref::out

I hope you've gone through the script linked earlier. Here I've used the Ref:: syntax so these variables are filled in from the job's Parameters.

As you can see, I've used these values as parameters. In the next block we're going to set up the Environment configuration.

On this page we don't need to change anything; whatever we've done so far is fine.

On the next page we can define Linux and Logging configuration.

For the Linux part I'm not changing anything, since nothing else is needed to run the script. In the Logging part I've used the awslogs driver to store the job's logs. After choosing these, click on Next.

On this page you can review everything you've chosen for the Job definition in Steps 1, 2 & 3, and then click on Create job definition.

And it’s done as you can see in the following screenshot.
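If you'd rather script this step, a register-job-definition call along the lines of the following captures the same kind of settings; the image URI, role ARNs and resource sizes are placeholders standing in for whatever you configured above:

```bash
aws batch register-job-definition \
  --job-definition-name rds-dump-job \
  --type container \
  --platform-capabilities FARGATE \
  --container-properties '{
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/rds-dump-job:latest",
    "command": ["./MainScript.sh","-h","Ref::host","-u","Ref::uname","-p","Ref::pass",
                "-d","Ref::dbname","-s","Ref::s3name","-o","Ref::out"],
    "resourceRequirements": [
      {"type": "VCPU", "value": "1"},
      {"type": "MEMORY", "value": "2048"}
    ],
    "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    "jobRoleArn": "arn:aws:iam::123456789012:role/rds-dump-job-role",
    "networkConfiguration": {"assignPublicIp": "ENABLED"},
    "fargatePlatformConfiguration": {"platformVersion": "LATEST"},
    "logConfiguration": {"logDriver": "awslogs"}
  }'
```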

Since our Job definition is done, we can move forward with Submitting our first Job.

For this, click on Jobs in the left menu and then click on Submit new job.

Here we have to fill in the Name of the Job and choose the Job definition & Job queue on which the job will run.

On the next page we can override some of the values that were previously set while setting up the Job definition & Job queue.

We’re not going to change anything here as everything is fine. So, let’s move forward by clicking on Next.

Review the Job configuration and click on Create job.
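For reference, the same submission from the CLI would look roughly like this; the parameter values are placeholders that get substituted into the Ref:: variables of the container command:

```bash
aws batch submit-job \
  --job-name rds-dump-manual-test \
  --job-queue rds-dump-queue \
  --job-definition rds-dump-job \
  --parameters host=mydb.example.us-east-1.rds.amazonaws.com,uname=admin,pass=MySecretPassword,dbname=mydatabase,s3name=my-dump-bucket,out=mydb-dump
```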

& It's done. Now wait for the Job to complete. After some time you should see a screen similar to the one below.

This shows that the Job executed successfully, and you should now have an RDS database dump file in S3.

As we can see, the file is present in the S3 bucket. We can also check the Job execution logs in case anything goes wrong.

For this, on the Job details page, click on Logging tab & then click on Retrieve logs.

It will ask you to allow CloudWatch access; just type OK, then click Authorize, & you're good to go.
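If you prefer the CLI, the same logs can be read with aws logs; jobs that use the awslogs driver with default settings log to the /aws/batch/job log group (the stream name below is a placeholder, copy the real one from the job details page):

```bash
# List the most recent log streams in the default AWS Batch log group
aws logs describe-log-streams \
  --log-group-name /aws/batch/job \
  --order-by LastEventTime --descending --max-items 5

# Fetch the events of a specific job's log stream (placeholder stream name)
aws logs get-log-events \
  --log-group-name /aws/batch/job \
  --log-stream-name rds-dump-job/default/0123456789abcdef0123456789abcdef
```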

So, as you can see, our AWS Batch part is done; the container & script are running great. Now we can work on scheduling the Job.

Setting Up AWS EventBridge:

For this we're going to use AWS EventBridge. Click on Create rule.

Enter a Name and Description for the rule and choose Schedule as the Rule type, because this rule will trigger the AWS Batch Job on a regular basis at the scheduled time.

After filling these values, click on Next.

On the next page you can set up a Cron expression to define when the Event will be triggered.

Here I’m using this value as an example. You can use any time as per your requirements.
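As an illustration, EventBridge cron expressions have six fields and are evaluated in UTC; creating a rule that fires every day at 12:00 UTC from the CLI would look something like this (the rule name and schedule are example values only):

```bash
aws events put-rule \
  --name rds-dump-daily \
  --schedule-expression "cron(0 12 * * ? *)" \
  --description "Trigger the RDS dump Batch job every day at 12:00 UTC"
```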

After filling Cron expression click on Next.

On the next page you can choose the Target which will be triggered by this Event.

Here you can configure more than one target, but for now we're only going to target our AWS Batch job.

Choose the service name, fill in the required fields and choose the options as shown in the screenshot. Here, EventBridge will automatically create a Role for you, which it will use to submit the Batch job.

Then click on Create rule and it’s done.
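The console wires up the target and role for you; for reference, the CLI equivalent of attaching the Batch job queue as a target looks roughly like this (all ARNs and names are placeholders, and the role must allow events.amazonaws.com to call batch:SubmitJob):

```bash
aws events put-targets \
  --rule rds-dump-daily \
  --targets '[{
    "Id": "rds-dump-batch-target",
    "Arn": "arn:aws:batch:us-east-1:123456789012:job-queue/rds-dump-queue",
    "RoleArn": "arn:aws:iam::123456789012:role/eventbridge-batch-submit-role",
    "BatchParameters": {
      "JobDefinition": "rds-dump-job",
      "JobName": "rds-dump-scheduled"
    }
  }]'
```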

Now you need to wait for the Event to get triggered to check if it’s working fine or not.

You can use CloudTrail to check for the trigger: find the SubmitJob event in CloudTrail.

It will show whether the task executed successfully or not; if there was any error, it will appear in the request and response in the body section of the Event, which is really helpful for troubleshooting.
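A quick way to pull up that event from the CLI (the lookup attribute simply filters CloudTrail on the event name):

```bash
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=SubmitJob \
  --max-results 5
```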

You can also check the AWS Batch Jobs tab to find the EventBridge triggered Job,

As you can see, in my case it's showing in the Batch console and the status is Succeeded. We can also check the S3 bucket to see whether the file has been uploaded successfully or not.

We can see the file in the S3 bucket, so AWS EventBridge and AWS Batch are working great together. Our goal is achieved.

To make this process better, we can set up an SNS topic for the S3 event so that every time the dump file gets uploaded to S3 we get notified by SMS or Email, whichever suits you.

Setting Up S3 Event Notification:

For this we can use an AWS S3 feature found under Properties > Event notifications.

Click on Create event notification, and fill in the details like the Name, the Suffix and the Event types we want to be notified about.

Here I've chosen .zip as the suffix, because the uploaded files will be in .zip format; also note that we're using the cp command to copy the dump file from the container to S3, which uses a multipart upload.

Since we'll be using an SNS topic for the notification, we need to set up the SNS topic with a permission that allows S3 to publish to it.

For this go to AWS SNS page and create a Topic.

Choose the Type and Name of the Topic,

After that, click on Create topic. Then select the topic and click on Edit.

There you need to edit the Access policy,

and write the JSON text from the following link into it.

Link: https://docs.aws.amazon.com/AmazonS3/latest/userguide/ways-to-add-notification-config-to-bucket.html#step1-create-sns-topic-for-notification
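For reference, the policy on that page allows the s3.amazonaws.com service principal to publish to your topic, scoped to your bucket and account. A CLI sketch of creating the topic and attaching such a policy might look like this; every ARN, the bucket name and the account ID are placeholders you must replace, exactly as the linked documentation says:

```bash
# Create a standard SNS topic for the upload notifications
aws sns create-topic --name rds-dump-uploads

# Attach an access policy that lets S3 publish to the topic
# (replace the topic ARN, bucket name and account ID with your own values)
aws sns set-topic-attributes \
  --topic-arn arn:aws:sns:us-east-1:123456789012:rds-dump-uploads \
  --attribute-name Policy \
  --attribute-value '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "s3.amazonaws.com"},
      "Action": "SNS:Publish",
      "Resource": "arn:aws:sns:us-east-1:123456789012:rds-dump-uploads",
      "Condition": {
        "ArnLike": {"aws:SourceArn": "arn:aws:s3:*:*:my-dump-bucket"},
        "StringEquals": {"aws:SourceAccount": "123456789012"}
      }
    }]
  }'
```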

Remember to change the SNS-topic-ARN, bucket-name and bucket-owner-account-id, and then click on Save changes. Then go back to the S3 Event notification page, choose the SNS topic, and click on Save changes.
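The event notification itself can also be applied from the CLI; here is a sketch using the same placeholder bucket and topic as above, filtering on the .zip suffix:

```bash
aws s3api put-bucket-notification-configuration \
  --bucket my-dump-bucket \
  --notification-configuration '{
    "TopicConfigurations": [{
      "Id": "notify-on-zip-upload",
      "TopicArn": "arn:aws:sns:us-east-1:123456789012:rds-dump-uploads",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".zip"}]}}
    }]
  }'
```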

For this SNS topic you can create any type of Subscription to get notified about the .zip file upload event.
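An Email subscription, for example, can be created like this (SNS then sends a confirmation mail that you need to accept before notifications are delivered; the topic ARN and address are placeholders):

```bash
aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:123456789012:rds-dump-uploads \
  --protocol email \
  --notification-endpoint you@example.com
```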

I've tested with an Email subscription and it works fine.

I know there are a lot of AWS services used in this tutorial, but once everything is set up it works like a charm. You can use this method to automate many tasks and batch processes that only run for a short period of time, and save cost as well by using AWS Fargate for the compute.

Hope you've followed along this far and faced no issues while setting this up. I've built this from scratch, so I believe there may be more room for improvement in this setup, which I'll investigate as I use it more and update here as well. I'm open to any suggestions and feedback regarding this setup.

Thank you so much for reading & trying this tutorial!!

remember, #SharingIsCaring ;)
