There are multiple ways to run periodic jobs in AWS. In this post, I’m going to share a few options that I have successfully used in the past. None of these require long-term infrastructure, and you will only pay for the resources you use during the job runs. The goal is to give you enough options to choose the one that best fits your needs.
CloudWatch Events + Lambda
To do this, create a CloudWatch Rule and select “Schedule” as the Event Source. You can either use a cron expression or provide a fixed rate (such as every 5 minutes).
Next, select “Lambda Function” as the Target. You should be able to choose your Lambda from the drop-down and even provide custom JSON as input to the Lambda function call.
I usually name my CloudWatch rules based on their schedule (e.g., EVERY_15_MIN or MIDNIGHT_CST) and attach multiple targets to them. This makes it easy to look up all jobs that run on a particular schedule.
ECS Scheduled Tasks
This option requires that the code be packaged as a Docker container and does not have the time limitations of Lambda invocations. Otherwise, it’s pretty similar to the CloudWatch Events + Lambda option.
Once the container image is built and pushed to a registry, create a Task Definition in Amazon Elastic Container Service (ECS) using Fargate as the launch type. The Task Definition is where you provide the parameters for the task: image, environment, command, and so on.
You could also use Amazon Elastic Compute Cloud (EC2) as the launch type if you already have a running ECS cluster. This would be ideal if you have capacity on the cluster to run the job. But if this is not an option, Fargate will let you run containers without managing servers, and you will only pay for the resources (CPU and memory) that the job uses.
To schedule the job, you can use CloudWatch as we did above (just choose “ECS task” as the target instead of Lambda). You could also go into the “default” cluster in ECS (which is created for you when you first start using ECS) and schedule it from the “Scheduled Tasks” tab. This second option still creates a CloudWatch rule under the hood; it is just easier to set up from the ECS console.
CloudWatch Events + Lambda + EC2
This is a great option if you can’t (or don’t want to) package your code as a Lambda or Docker container and would like to use an EC2 instance that starts up and shuts down when the job is done.
EC2 has a feature called “user-data”: a user-provided script that runs on launch and can kick off the job. The trick is shutting the instance down once the job is complete, so the last thing the script should do after finishing the job is terminate the instance.
To recap, use a CloudWatch Rule to kick off a Lambda. This will launch an EC2 instance with “user-data,” which will contain a Base64-encoded script that runs on the instance when it’s started. This script will run the job and terminate the instance once it’s done.
Here’s a Python Lambda to do this, which uses spot instances to further save on EC2 costs…
And here is how you can terminate the instance from a Linux shell script…
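Sketched here as a function you can call as the job’s last step: the instance looks up its own ID and region from the instance metadata service, then asks EC2 to terminate it. This assumes the AWS CLI is installed and the instance role allows ec2:TerminateInstances.

```shell
#!/bin/bash
# terminate_self: terminates the instance this script runs on.
terminate_self() {
  local instance_id region
  # The instance metadata service provides this instance's ID and region
  instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
  region=$(curl -s http://169.254.169.254/latest/meta-data/placement/region)
  aws ec2 terminate-instances --instance-ids "$instance_id" --region "$region"
}
```

The user-data script can then simply call `terminate_self` after the job has finished.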
AWS Batch
For more complicated jobs, consider AWS Batch. It lets you define multi-stage pipelines where each stage depends on the completion of the previous one.
Within each stage, jobs can be executed in parallel on multiple nodes. AWS Batch takes care of scheduling jobs, allocating necessary CPU and memory for each job, re-running failed jobs, and kicking off the next stage when all jobs in a previous stage are successful. It is essentially a simple workflow engine for batch jobs.
However, it does require that the code be packaged as a Docker image, since AWS Batch uses ECS clusters to run the jobs. Once again, you can use CloudWatch as the triggering mechanism to create job instances in AWS Batch directly, or you can go through a Lambda function that applies logic to split the work into multiple jobs.
For example, if you are trying to process files in Amazon Simple Storage Service (S3), your Lambda function could look at the total number of files, split them into chunks, and create a job instance in AWS Batch for each chunk.
I should note that traditional Extract, Transform, Load (ETL) tools usually ship with their own job scheduler. If you use one of these tools (either on-prem or in AWS), it still makes sense to use its built-in scheduler, since it understands the tool’s native jobs and their dependencies.
Here we have looked at four ways to run periodic jobs using AWS native services that can replace a traditional cron server in on-prem deployments. I hope this is helpful when you are thinking about moving your cron jobs to AWS.