Scrontab: The Serverless Crontab
Back in the day, when I had slightly less grey hair and we ran infrastructure on more traditional tin, we always used cron jobs to run recurring maintenance tasks and batch jobs.
With the advent of serverless technologies like AWS Lambda and CloudWatch scheduled events, we now have a few more options for running our regularly recurring tasks.
Use case 1: Database partition pruning
I’m currently working on a project where we have an operational MySQL database which stores online order information. We store all future orders, which can have delivery dates up to 30 days into the future, and we keep all past orders for 30 days.
To make it easy to manage our delivery date time window, we partition all our database tables on delivery date.
To manage our partitions we have a Python application, deployed as a Lambda function, which drops any partitions for delivery dates older than 30 days and creates new partitions for delivery dates 31 days into the future.
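To make the pruning logic concrete, here is a minimal sketch of how the statements might be generated. It is not our production code: it assumes one RANGE partition per day on TO_DAYS(delivery_date), a hypothetical naming scheme of pYYYYMMDD, and a hypothetical table called orders. The list of existing partition names would come from information_schema.PARTITIONS in practice.

```python
from datetime import date, datetime, timedelta

RETENTION_DAYS = 30  # past orders are kept this long (from the text)
LOOKAHEAD_DAYS = 31  # partitions are created this far ahead (from the text)


def partition_name(day: date) -> str:
    """Hypothetical naming scheme: p20240131 holds delivery_date 2024-01-31."""
    return f"p{day:%Y%m%d}"


def partition_day(name: str) -> date:
    """Inverse of partition_name()."""
    return datetime.strptime(name, "p%Y%m%d").date()


def maintenance_statements(table: str, existing: list, today: date) -> list:
    """Build the ALTER TABLE statements for one maintenance run.

    Assumes daily RANGE partitions on TO_DAYS(delivery_date), so each
    partition's upper bound is the start of the following day.
    """
    cutoff = today - timedelta(days=RETENTION_DAYS)
    target = today + timedelta(days=LOOKAHEAD_DAYS)
    days = sorted(partition_day(name) for name in existing)
    statements = []

    # Drop every partition whose delivery date is older than the cutoff.
    for day in days:
        if day < cutoff:
            statements.append(
                f"ALTER TABLE {table} DROP PARTITION {partition_name(day)}"
            )

    # RANGE partitions can only be appended after the newest one, so fill
    # in each missing day from there up to the look-ahead target.
    next_day = days[-1] + timedelta(days=1) if days else today
    while next_day <= target:
        upper = next_day + timedelta(days=1)
        statements.append(
            f"ALTER TABLE {table} ADD PARTITION (PARTITION "
            f"{partition_name(next_day)} VALUES LESS THAN "
            f"(TO_DAYS('{upper.isoformat()}')))"
        )
        next_day = upper
    return statements
```

Keeping statement generation separate from execution means the date arithmetic can be unit tested without a database; the Lambda handler would then simply run each statement over a MySQL connection.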
To run this application, we use a CloudWatch scheduled event to trigger the Lambda function every day at 2pm.
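For illustration, the schedule could be wired up with boto3 roughly as follows. This is a sketch rather than our actual deployment code: the rule name, target Id and ARN are placeholders, and it assumes that "2pm" means 14:00 UTC, since CloudWatch schedule expressions are evaluated in UTC.

```python
# CloudWatch cron fields: minutes hours day-of-month month day-of-week year.
# Either day-of-month or day-of-week must be "?".
DAILY_2PM_UTC = "cron(0 14 * * ? *)"


def schedule_daily(events_client, rule_name: str, lambda_arn: str) -> str:
    """Create (or update) a daily rule and point it at the Lambda function.

    events_client is expected to be boto3.client("events"); passing it in
    keeps the function easy to test with a stub.
    """
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=DAILY_2PM_UTC,
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        # "partition-pruner" is a placeholder target Id.
        Targets=[{"Id": "partition-pruner", "Arn": lambda_arn}],
    )
    return rule["RuleArn"]


# Usage (placeholder names and ARN):
#   import boto3
#   schedule_daily(
#       boto3.client("events"),
#       "prune-partitions-daily",
#       "arn:aws:lambda:eu-west-1:123456789012:function:prune-partitions",
#   )
```

The function also needs a resource-based permission that allows events.amazonaws.com to invoke it, otherwise the rule fires but the invocation is rejected.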
Because dropping and creating partitions in MySQL is pretty much instantaneous, we’re happy to run this code directly from a Lambda function, which at the time of writing has a maximum execution time of 5 minutes. In this case our code takes well under 30 seconds.
But what if you’ve got a considerably longer task to run like a meaty batch job?
Use case 2: Product data ingestion
We have another app in our infrastructure that needs to ingest a 5 GB XML file of product data into a PostgreSQL database once a week. There’s no way we’re doing this in 5 minutes. However, we still make use of Lambda and CloudWatch scheduled events to help us out.
In this scenario, rather than the Lambda function doing the work, all it does is spin up an EC2 instance. The EC2 instance then does the batch data ingestion and terminates itself upon completion.
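A launcher along these lines might look like the sketch below. It shows one way to get the "terminates itself" behaviour, not necessarily the exact mechanism we use: the instance is started with its shutdown behaviour set to terminate, and the boot script powers the machine off once the job finishes. The environment variable names, the default instance type and the job path are all hypothetical.

```python
import os

# Boot script run once at first start. The job path is a placeholder.
USER_DATA = """#!/bin/bash
/opt/ingest/run_ingest.sh
# Power off whether the job succeeded or not; with the shutdown behaviour
# set to 'terminate' below, this also terminates the instance.
shutdown -h now
"""


def launch_params(ami_id: str, instance_type: str, instance_profile: str) -> dict:
    """Arguments for EC2 run_instances describing a one-shot worker."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "UserData": USER_DATA,  # boto3 base64-encodes this for us
        "InstanceInitiatedShutdownBehavior": "terminate",
        "IamInstanceProfile": {"Name": instance_profile},
    }


def handler(event, context, ec2_client=None):
    """Lambda entry point: start the worker and return its instance id."""
    if ec2_client is None:
        import boto3  # available by default in the Lambda Python runtime

        ec2_client = boto3.client("ec2")
    params = launch_params(
        os.environ["INGEST_AMI_ID"],  # hypothetical variable names
        os.environ.get("INGEST_INSTANCE_TYPE", "m5.large"),
        os.environ["INGEST_INSTANCE_PROFILE"],
    )
    response = ec2_client.run_instances(**params)
    return response["Instances"][0]["InstanceId"]
```

The Lambda function returns within a second or two of requesting the instance, so the execution time limit no longer matters; the long-running work happens entirely on the EC2 side.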
The beauty here is that we don’t need a server that’s up all the time just to do a job once a week via crontab; our costs are kept to a minimum, as the EC2 instance runs only as long as it needs to and the CloudWatch and Lambda costs are negligible.
This is a pretty simple technique for running the recurring tasks and jobs that we would traditionally have run on a cron server. It’s also a really good use case for serverless technology.