Low-Cost Workers: Python Celery + AWS SQS + AWS EC2 Spot

Rohit Singh · The Startup · Jun 4, 2020

Python Celery is an extremely useful and versatile job queue runner. It has very good documentation and numerous examples scattered across GitHub for various use-cases, including but not limited to mass-mailing, video transcoding, image resizing, and external webhook triggering.

Celery and other vegetables, neatly cut and laid down perfectly
Photo by Dose Juice on Unsplash

Celery also has a number of integrations with frameworks like Django and Pyramid, as well as support for different transports like RabbitMQ, Redis, and AWS SQS.

AWS EC2 Spot provides instances that are priced up to 90% less than on-demand instances, giving you the same configuration at a steep discount. These instances run on surplus compute capacity that is sometimes available across various zones and regions. The catch? AWS can terminate spot instances at any time it needs to reclaim that capacity.

So this leaves limited use-cases for spot instances: workloads that are stateless or fault-tolerant. They’re also well suited to big data workloads that can handle interruptions and resume from their last checkpoint.

Python Celery is by itself transactional in structure: whenever a job is pushed onto the queue, it’s picked up by exactly one worker, and only when that worker reports success or failure is the task considered complete. If the task is set up to retry on failure, it’ll be picked up again until it succeeds or the retry count is exhausted.

By leveraging these properties of celery, there are multiple types of tasks that can utilise EC2 spot instances to run huge workloads at a fraction of the original cost. Let’s try this combination of super-money-saving-job-queue-fun.

We’re going to use SQS as the queue broker, but you can use any other supported broker too. To begin with, as for any Python Celery project, my requirements.txt file looks like this:
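At its simplest it only needs Celery with the SQS transport extras; the pinned version below is an assumption, so use whichever current release suits you:

```
# requirements.txt — Celery plus the SQS transport extras (boto3, pycurl via kombu)
celery[sqs]==4.4.2
```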

I’ve set up Celery to pick up the necessary configuration from a celeryconfig.py file in the project.
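A minimal sketch of that file follows; the queue name, region, and timeout values here are assumptions you should adapt to your own setup:

```python
# celeryconfig.py — minimal Celery settings for the SQS transport
import os

# with a bare "sqs://" URL, boto3 picks up credentials from the environment
# (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY) or from an instance profile
broker_url = "sqs://"

broker_transport_options = {
    "region": os.environ.get("AWS_REGION", "us-east-1"),
    "visibility_timeout": 3600,   # seconds a message stays hidden while a worker holds it
    "polling_interval": 10,       # seconds between SQS polls
}

# route all tasks to one pre-created SQS queue
task_default_queue = "celery-sqs-spot"
```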

My tasks file (the one that Celery will run as a worker) looks like:
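Something along these lines — the add task is just a stand-in for whatever real work you need done:

```python
# tasks.py — the module the Celery worker loads
from celery import Celery

app = Celery("tasks")
app.config_from_object("celeryconfig")

@app.task(bind=True, max_retries=3, default_retry_delay=30)
def add(self, x, y):
    # stand-in for real work such as mailing, transcoding, or resizing
    return x + y
```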

To make the Celery worker easy to run, automatically, on any spot instance that’s created, I’m going to use Docker and docker-compose. Here’s my docker-compose.yml file (note that we don’t need a Dockerfile, since our application is simple enough to be executed directly by compose):
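A sketch of what that compose file can look like — the base image and the inline pip install are assumptions; a pre-built image of your own would work just as well:

```yaml
# docker-compose.yml — runs the worker straight from the mounted app folder
version: "3"
services:
  celery-worker:
    image: python:3.8   # full image so pycurl (needed by the SQS transport) can build
    working_dir: /app
    volumes:
      - ./app:/app
    environment:
      # passed through from the host environment (set in the user-data script below)
      - AWS_ACCESS_KEY_ID
      - AWS_SECRET_ACCESS_KEY
      - AWS_REGION
    command: >
      sh -c "pip install -r requirements.txt &&
             celery -A tasks worker --loglevel=info"
    restart: unless-stopped
```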

I’ve placed tasks.py, celeryconfig.py, and requirements.txt in one folder called app, and docker-compose.yml sits outside this folder at the root of my project.
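So the resulting layout looks roughly like this:

```
.
├── docker-compose.yml
└── app/
    ├── celeryconfig.py
    ├── requirements.txt
    └── tasks.py
```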

Now all that’s needed is to ensure that whenever a spot instance starts, it automatically sets up the required software, runs the project above, and starts consuming tasks off the SQS queue. To do that, let’s leverage “Launch Templates” in AWS EC2. We’ll start off by creating a new launch template. You can give it any name that makes sense for your use-case.

In the “Launch Template Contents” > AMI section, I’ve used ami-0b44050b2d893d5f7 to run an Ubuntu 18.04 HVM image on EC2. All the other options can be kept at their defaults, except the script we want to run in “User data”. To set up the startup script, scroll down to the end, expand the “Advanced details” section, and scroll down further. Any script you enter in the User data section is run as root once the instance is initialised, so we’ll use this opportunity to set up Docker and run our Celery worker using docker-compose.

Here’s my sample script for setting up Docker and cloning the repo where the above Celery project is located:
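A sketch of what that script can look like — the repository URL is a placeholder for wherever you’ve pushed the project, and the credential values need to be filled in before saving the template:

```bash
#!/bin/bash
# Launch template "User data" — runs as root on first boot of each spot instance
set -e

# install Docker, docker-compose, and git on Ubuntu 18.04
apt-get update
apt-get install -y docker.io docker-compose git

# clone the project (replace with your own repository URL)
git clone https://github.com/<your-user>/celery-sqs-spot.git /opt/celery-sqs-spot
cd /opt/celery-sqs-spot

# credentials consumed by docker-compose; prefer IAM roles in real deployments
export AWS_ACCESS_KEY_ID="<your-access-key>"
export AWS_SECRET_ACCESS_KEY="<your-secret-key>"
export AWS_REGION="us-east-1"

# start the worker in the background
docker-compose up -d
```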

Do note that the above script is run as root user.
Before saving the template, you’ll have to make sure of the following:
* Create a simple SQS queue in your default region
* Create a user in IAM and give it permissions to access the SQS queue
* Populate the env vars in the script above before proceeding. These env vars are picked up directly by docker-compose, which passes them to the Celery service.

Once you’ve saved your launch template, you’re ready to request your spot instances!

To do that, go to “Spot requests” in the EC2 console and click “Request spot instances”. Here I’ve preferred to select the following options on the spot request page:

I’m selecting “Flexible workloads”, since it doesn’t matter what size the instance is in my case.

My launch template above is called “celery-sqs-spot”, and as you can see, I’ve changed the default instance type to a t3a.micro, as that should be enough to run our worker. You can configure other details as required in terms of network, VPC, and availability zone. In the next section, “Tell us how much capacity you need”, I’ve selected 1, as this is a sample project and won’t require more than one instance. Play around with the other settings, and then you can finally hit Launch.

And that’s it! Your super-low-cost Celery worker will start and keep running until your spot request expires. Even if your spot instance gets killed before it can finish a job, another will take its place and resume working (provided you’ve selected to maintain capacity in the request).

To test the above, you can push a job onto the queue like this:
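For instance, a tiny script run from the app folder (so that tasks and celeryconfig are importable) can enqueue a job:

```python
# enqueue_test.py — push a test job onto the SQS queue from any machine
from tasks import add

# .delay() serialises the call and publishes it to the broker;
# whichever spot worker is running will pick it up and execute it
result = add.delay(4, 4)
print("queued task:", result.id)
```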

Make sure you’ve set up all the necessary details in the celeryconfig.py file wherever you’re running this script and adding the task from. You can find this code in this repo.

A few other things can be done to improve the example above and better handle the ephemeral nature of spot instances:
* Use EC2 instance metadata to listen for the spot termination notice, which arrives 1–2 minutes before actual instance termination and can give you time to wrap up your work (see the sketch after this list). Celery does have provisions for shutdown events.
* Utilise IAM roles, either directly on the template or on the spot instance, and avoid putting access credentials in the user-data script.
* Add authentication mechanism for pulling private git repos on instance boot.
* Autoscale capacity by adding more spot instances based on SQS queue length.
* Check out spot.io, formerly Spotinst, for their products on cost reduction of cloud-native workloads using EC2 spot instances.
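For the first point, here is a rough sketch of a sidecar script that polls the instance metadata endpoint and asks the worker for a warm shutdown once an interruption notice shows up; the requests dependency and the pkill pattern are assumptions, and IMDSv2-only instances would need a metadata token first:

```python
# spot_watcher.py — watch for a spot interruption notice and warm-shutdown Celery
import subprocess
import time

import requests

# returns 404 until an interruption is scheduled, then a JSON notice
METADATA_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def wait_for_interruption(poll_seconds=5):
    while True:
        try:
            resp = requests.get(METADATA_URL, timeout=2)
            if resp.status_code == 200:
                return resp.json()
        except requests.RequestException:
            pass  # metadata service hiccup; try again on the next poll
        time.sleep(poll_seconds)


if __name__ == "__main__":
    notice = wait_for_interruption()
    print("spot interruption scheduled:", notice)
    # SIGTERM triggers a Celery warm shutdown: finish in-flight tasks, accept no new ones
    subprocess.run(["pkill", "-TERM", "-f", "celery.*worker"])
```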

Hope that this small guide was useful. Keep experimenting.
