Setting up a Queueing System on AWS

Published in Berkeley Codebase · 13 min read · Oct 29, 2017

A system that uses an AWS SQS FIFO Queue to store jobs and an AWS EB Webserver as a worker.

by Saurav Kadavath, Project Manager

This was originally a guide I wrote for our team while serving as project manager of the YiTuuX project at Codebase. When I first set up all of this tech on my own, I realized how tedious it was to sift through pages of StackOverflow and ServerFault, so I decided to document my process for my own future use and for the reference of my fellow team members.

YiTuuX is a company that creates AI-driven medical applications. However, they needed scalable infrastructure in order to deliver their services to their clients. We decided that the best way to help YiTuuX solve some of their problems was to develop a cloud-based application that included a user-facing webserver that could offload major workloads onto separate instances using technologies such as Django, SQS, EC2, S3, and other AWS services. We created the application from scratch, and by the end of the semester, we had a complete application where users were able to upload retinal scans and receive diagnoses on a particular disease called diabetic retinopathy. In the future, this application will be used by doctors to run YiTuuX’s algorithms on real patient data.

For the project, I had to create a queueing system to handle large amounts of incoming HTTP requests. I decided to use a bunch of instances of AWS’s brand-new FIFO SQS implementation plus an Elastic Beanstalk environment acting as a worker to keep track of and process requests, and used an Elastic Beanstalk Web Server environment running Django to handle web requests from clients and feed them to the SQS queues. As a novice developer, I found the documentation difficult to navigate, so I decided to make this guide to help others in a similar situation. I’m writing this after going through the entire process, so it’s definitely possible that I missed steps or did something inefficiently. If that’s the case, or if you spot any other errors, shoot me a message at sauravkadavath@berkeley.edu.

CONTENTS:

  • Setting up your local computer for developing on Django / AWS
  • Setting up an AWS Elastic Beanstalk (EB) web server running Python 2.7 and deploying a sample Django app to it from the AWS Elastic Beanstalk Command Line Interface (CLI)
  • Creating a FIFO queue with Amazon SQS and communicating with it
  • Setting up an AWS Elastic Beanstalk Web Server environment as a worker, with a Python daemon run under supervisord

After this, it should be easy to pull data from your SQS queue with your EB worker.

SETTING UP OUR DEVELOPMENT ENVIRONMENT

We’re going to be deploying to AWS Elastic Beanstalk, which is essentially a fancy web server from Amazon that scales automatically, among other things.

Since each EB instance is essentially a blank self-contained computer, we want to make sure any code we write that uses external dependencies can still run on any EB instance. In order to do this we’ll use something called virtual environments. A virtual environment lets you isolate a project’s dependencies from your global Python install, so you can record exactly what your project needs and reproduce it on another machine. We’ll be using an environment manager called virtualenv. Additionally, make sure you have Python 2.7 installed on your local machine. Installation instructions can be found here.

NOTE: If you have Anaconda installed, uninstall it and all of its dependent files, and if you are on Windows, clear all of the environment variables related to it. After a lot of headaches, I figured out that virtualenv and Anaconda don’t like to play together, and I couldn’t get them to cooperate. If you have a fix, feel free to email me.

Assuming we don’t have Anaconda on our machine and we have Python 2.7 installed, we move forward:

  1. Install pip and virtualenv:
  • Download get-pip.py and run python get-pip.py
  • In your terminal, run pip install virtualenv
  2. Now, let’s make a virtual environment for our project. Run virtualenv <path_to_environment_folder>. You can choose where to save all of the environment dependency files. For example, I chose to run virtualenv C:/eb-env, and all of my virtual environment files were saved in C:/eb-env. If we take a look inside the folder that we chose as our environment folder, we can find a fresh install of tools like python and pip. For the following steps, I'll use C:/eb-env when referring to the virtual environment path.
  3. Activate your virtual environment. Inside the environment that you made, there is a script called activate. We'll need to run this with the source command. For example, I ran: source C:/eb-env/Scripts/activate. For Mac or Linux users, the script is probably going to live inside a bin/ folder in your virtual environment folder. Running this script changes the python and pip commands in your terminal to temporarily use the files in the virtual environment, not your global install. Thus, whenever you want to use the virtual environment, you have to run source C:/eb-env/Scripts/activate each time you open a new terminal.
  4. pip install django. Remember, this is the pip running from the virtual environment.
  5. Set up a Django dummy project that we can push to Elastic Beanstalk. Navigate to the folder in which you want your project to live and run django-admin startproject eb_django_proj_1 (Django requires project names to be valid Python identifiers, so hyphens aren't allowed). This is just a 'default' project that does nothing. In order to run it, cd eb_django_proj_1, then python manage.py runserver, and then go to http://127.0.0.1:8000. I found these Django tutorials super helpful.
  6. To verify that Django has been installed, type in pip freeze. You should see that Django is installed, along with a small number of its dependencies. The main thing is that you should not see the same massive list of dependencies that you would see if you typed in pip freeze from your global environment. To exit out of your virtual environment, type in deactivate.
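
One extra sanity check (not from the original guide, just a convenience): you can ask Python itself which environment it is running from. With the virtualenv active, sys.prefix points at your environment folder rather than your global install.

# With the virtualenv active, sys.prefix should point at the environment
# folder (e.g. C:/eb-env); after `deactivate`, it points at the global install.
>>> import sys
>>> print(sys.prefix)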

SETTING UP AN AWS ELASTIC BEANSTALK WEB SERVER

So now, we basically have all the tools on our computer to develop self-contained Django apps. Now, let’s work on moving our work into AWS. If you’re the one making the AWS resources yourself, keep reading. Otherwise, if you are an IAM User, ask your AWS system administrator to give you an access key and access secret, and skip to step 3.

  1. Make an Elastic Beanstalk application for Python. When doing this, you must make sure that there is a key pair associated with the instance. To do this, make sure you have a key pair selected in the EC2 Key Pair dropdown menu.

If you don’t have a key pair yet, navigate to the AWS dashboard (click the cube on the top left and choose EC2), click on Key Pairs, create a new one, and keep the .pem file somewhere safe.

Keep stepping through the menus, and this will set up a default EB environment for Python. Wait for the system to show a green checkmark, and visit the application to make sure you have the default AWS Sample Application running properly.

2. In order to deploy our app from our local machine to AWS, we’re going to need an access key and secret. Go to the top right of your Elastic Beanstalk dashboard and click on your name, and from the dropdown, click “My Security Credentials”. Here, you can set up an access key/secret for yourself. Be sure to keep these safe, as you only get to see them once.
Note: If you have other people that you want to be able to deploy to your EB environments, you can create IAM Users for each person, and give them their own access keys and secrets.

3. On your local machine, make sure that you are in your global environment (not the virtualenv), and run pip install awsebcli. This is the CLI that lets us talk to our EB instances. After you do this, you should be able to find the AWS CLI configuration file, located at ~/.aws/config on Linux and OS X systems or C:\Users\USERNAME\.aws\config on Windows systems. Go into this file and paste the following code in:

[profile eb-cli]
aws_access_key_id = <YOUR_AWS_ACCESS_KEY>
aws_secret_access_key = <YOUR_AWS_ACCESS_KEY_SECRET>

[default]
aws_access_key_id = <YOUR_AWS_ACCESS_KEY>
aws_secret_access_key = <YOUR_AWS_ACCESS_KEY_SECRET>
region=us-west-2

Save this stuff, and open a new terminal for the following steps.

4. Navigate to where your Django project is located. Run eb init. This should give you a series of questions to answer. Your EB apps and environments that you set up already should come up, so you can choose those. For example, I got something like this (I have multiple environments for my project right now):

C:/Saurav/djangotuts$ eb init
Select a default region
1) us-east-1 : US East (N. Virginia)
2) us-west-1 : US West (N. California)
3) us-west-2 : US West (Oregon)
4) eu-west-1 : EU (Ireland)
5) eu-central-1 : EU (Frankfurt)
6) ap-south-1 : Asia Pacific (Mumbai)
7) ap-southeast-1 : Asia Pacific (Singapore)
8) ap-southeast-2 : Asia Pacific (Sydney)
9) ap-northeast-1 : Asia Pacific (Tokyo)
10) ap-northeast-2 : Asia Pacific (Seoul)
11) sa-east-1 : South America (Sao Paulo)
12) cn-north-1 : China (Beijing)
13) us-east-2 : US East (Ohio)
14) ca-central-1 : Canada (Central)
15) eu-west-2 : EU (London)
(default is 3): 3
Select an application to use
1) django-tutorial
2) [ Create new Application ] (default is 2): 1
Select the default environment.
You can change this later by typing "eb use [environment_name]".
1) my-worker-env
2) my-env (default is 1): 1
Cannot setup CodeCommit because there is no Source Control setup, continuing with initialization

5. Now, if you run eb open, you’ll open the same default website as before. Now, let’s push our own site.

6. First, remember the reason that we created a virtualenv at the beginning: to make sure we could keep track of all the requirements of the project. We’ll write all of those requirements into a file that EB can read when we push it. Run the following:

source C:/eb-env/Scripts/activate
cd <YOUR_PROJ_DIRECTORY>
pip freeze > requirements.txt

7. Now, we need to add a simple configuration file for our app so EB can actually run it properly. Make a directory called .ebextensions in your project folder, and create a file called django.config in it. Paste the following contents into the file:

option_settings:
  aws:elasticbeanstalk:container:python:
    WSGIPath: PATH_TO_WSGI.PY/wsgi.py

Make sure to replace the path with the correct path to your wsgi.py file. For the dummy project above, that would probably be eb_django_proj_1/wsgi.py.

8. Now, run

eb deploy
eb open

You should see your pushed website live on elastic beanstalk!

SETTING UP AWS SQS AND COMMUNICATING WITH IT USING BOTO3

This is by far the easiest section. I found navigating the SQS and Boto3 docs really easy.

  1. Head over to the AWS dashboard and select SQS. From there, you’ll be able to make new queues. For this guide, I made a queue called test-queue1.fifo. Note that this is a FIFO queue. FIFO queues are only available in certain regions (outlined in the docs). When you first make a queue, only your AWS account will have permissions for it. To give other accounts permissions, you can add users from the SQS queue dashboard.

2. Now that we have our queue set up and waiting for messages on the cloud, let’s try sending it some messages. The way that we’re going to be doing this is using Boto3 on the Python interactive shell. Note that doing this will be equivalent to executing the same commands in any Python file — for example in Django. First, we need to install Boto3: pip install boto3. Note that if you need to push code that relies on Boto3 onto EB, you'll need to run this install in your virtual environment and make sure that your requirements.txt is updated properly.

3. We need to configure Boto3 because we need to make sure that it knows who we are when we begin talking to AWS SQS. There are several ways to do this, and they’re all explained very well HERE and HERE. I personally like setting environment variables for Boto3 to work off of at the top of my main Python file:

import os
os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_AWS_ACCESS_KEY_ID"
os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_AWS_SECRET_ACCESS_KEY"

Note that these are the same keys that we used before. (They are tied to our account, and are recognized across all AWS services.)

4. Now, we can start attempting to communicate with SQS. These examples are pulled straight off of the Boto3 Documentation — which is really good. Open up a python shell, and here is some code to start us off:

# Configuration
>>> import os
>>> os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_AWS_ACCESS_KEY_ID"
>>> os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_AWS_SECRET_ACCESS_KEY"
# Get the service resource
>>> import boto3
>>> sqs = boto3.resource('sqs')
# Get the queue. This returns an SQS.Queue instance
>>> queue = sqs.get_queue_by_name(QueueName='test-queue1.fifo')
# You can now access identifiers and attributes
>>> print(queue.url)
>>> print(queue.attributes.get('DelaySeconds'))
# Create new messages
# FIFO queues require a MessageGroupId (and a MessageDeduplicationId,
# unless content-based deduplication is enabled on the queue)
>>> queue.send_message(MessageBody='Hello, World! (0)', MessageGroupId='0')
>>> queue.send_message(MessageBody='Hello, World! (1)', MessageGroupId='0')
>>> queue.send_message(MessageBody='Hello, World! (2)', MessageGroupId='0')
# Receive messages from the queue (this returns a list)
>>> messages = queue.receive_messages(MaxNumberOfMessages=1)
>>> print(messages[0].body)
Hello, World! (0)
# Delete the message from the queue
>>> messages[0].delete()

More information can be found here.
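
To tie this back to the web server from earlier, here is a rough sketch of what a Django view that drops incoming work onto the queue might look like. The view name, the payload field, and the use of MessageDeduplicationId are my own illustrative assumptions, not something from the original project.

# views.py (hypothetical) -- a Django view that enqueues a job on our FIFO queue
import os
import uuid

import boto3
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt

os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_AWS_ACCESS_KEY_ID"
os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_AWS_SECRET_ACCESS_KEY"

@csrf_exempt
def submit_job(request):
    sqs = boto3.resource('sqs')
    queue = sqs.get_queue_by_name(QueueName='test-queue1.fifo')
    response = queue.send_message(
        MessageBody=request.POST.get('payload', ''),
        MessageGroupId='0',
        # FIFO queues need a deduplication ID unless content-based
        # deduplication is enabled on the queue
        MessageDeduplicationId=str(uuid.uuid4()),
    )
    return JsonResponse({'message_id': response.get('MessageId')})

Remember that boto3 would need to be installed in the virtualenv and listed in requirements.txt for something like this to work once deployed to EB.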

SETTING UP AN AWS ELASTIC BEANSTALK WORKER

This was probably the most confusing part of this whole ordeal (maybe because of the lack of documentation for some things on AWS, maybe because of my ignorance :P). What we’re going to attempt to do here is set up an AWS Elastic Beanstalk Web Server Environment running Python (a separate environment from the one we set up in the second section). The goal will be to run a Python daemon in the background using supervisord. We won’t go into actually making this daemon poll our SQS queue, but at the end, it should be pretty apparent how to make this work (see the boto3 setup and guide above).

If the end goal involves making the Environment dynamically change the number of EC2 instances running based on the size of the SQS queue, we can do that using AWS CloudWatch Alarms. Instructions can be found here.

Alright, let’s get into setting up a simple worker.

  1. Spin up a new EB environment. For now, we’re going to make a single EC2 instance inside it (as opposed to an auto-scaling group). Make sure you select the right key pair so that we can ssh into it.
  2. Let’s make the simple Python process that we want to run forever. Here’s mine (I called the file sqsd_v2.py and saved it in some directory on my local machine):
import time  # time is a built-in package

the_time = 0
print("Starting sqsd_v2.py")
while(1):
    print("Time = " + str(the_time))
    time.sleep(5)  # Sleep for 5 seconds
    the_time = the_time + 5

3. AWS EB comes pre-installed with a program called supervisord. What it lets us do here is make sure that our Python script keeps running (i.e. gets restarted) even if something were to cause it to unexpectedly crash. supervisord needs a configuration file to run, so make a file called supervisord.conf. Put this content into that file. This is the default file provided on the supervisord website; I modified a few lines under the [program:sqsd_v2] heading for our needs. Here are the highlights (a sketch of the modified section follows this list):

  • The name of the program that we are running is sqsd_v2. You can tell from the title of the heading.
  • The actual command we are telling supervisord to execute is python -u sqsd_v2.py. The -u flag forces stdout and stderr to be unbuffered, which is necessary for us to see the print() output of our script as it runs.
  • We have also specified log files: stdout_logfile=~/logs/stdout_logs.log and stderr_logfile=~/logs/stderr_logs.log. These are the files to which our program's print() output and any errors will be written.
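
For reference, the modified section might look roughly like this. I am reconstructing it from the highlights above, so treat the exact option values as an assumption rather than a copy of the actual file:

[program:sqsd_v2]
; -u gives unbuffered output; autorestart brings the script back up if it crashes
command=python -u sqsd_v2.py
autostart=true
autorestart=true
stdout_logfile=~/logs/stdout_logs.log
stderr_logfile=~/logs/stderr_logs.log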

4. Remember the log files from the last step? We actually need to make them for supervisor to be able to use them. So, make two empty files called stdout_logs.log and stderr_logs.log on your machine. We will now push all of these files to the EB machine.

5. Let’s first make the folder for the logs on the EB machine. SSH into it with eb ssh and once you're in, mkdir logs. ls to verify that it has been created and type in exit to exit out of the machine.

6. Now, let’s push our files up onto the machine. Run the following commands from where you have the four files we just made:

scp -i path\to\your\keypair.pem sqsd_v2.py ec2-user@IP_ADDRESS:~/
scp -i path\to\your\keypair.pem supervisord.conf ec2-user@IP_ADDRESS:~/
scp -i path\to\your\keypair.pem stdout_logs.log ec2-user@IP_ADDRESS:~/logs/
scp -i path\to\your\keypair.pem stderr_logs.log ec2-user@IP_ADDRESS:~/logs/

Protip: You can copy/paste the key components of this command from the output of eb ssh.

Another Protip: If you don’t want any ‘default’ content in your log files, one thing you can do instead of scping them is just use the touch command to create the log file in the appropriate place in the EC2 instance.
Try to eb ssh into the instance. If you ls, you should be able to find all of the files that we transferred.

Now, let’s try running the program. eb ssh into your instance. Type in the command to start supervisord:

supervisord --configuration="supervisord.conf" --nodaemon 

You should see that the process sqsd_v2 is starting. After 10-15 seconds or so, press Ctrl+C to stop supervisord and check the log file to make sure the program wrote some output: vim logs/stdout_logs.log. You should see the print() output from the script. Some notes about what we did:

  • We ran supervisord with the --nodaemon flag. In production, you would run this without this flag (making the program run in the background) so that the program does not exit when you eb ssh out. You would use supervisorctl to start, stop, and restart processes. More info and advanced usage can be found at the supervisor docs.
  • When running without the --nodaemon option, we can set the log output for supervisord itself with the --logfile=FILE option.
  • supervisord is a complicated beast - docs are here.
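
To close the loop on the earlier sections: turning this dummy daemon into a real SQS worker is mostly a matter of swapping the time-printing loop for the boto3 calls we used before. Here is a minimal sketch; process_job() is a placeholder for whatever your application actually does, and the credentials and queue name are the same assumptions as above.

# sqsd_v2.py -- sketch of a worker that polls the FIFO queue set up earlier
import os

import boto3

os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_AWS_ACCESS_KEY_ID"
os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_AWS_SECRET_ACCESS_KEY"


def process_job(body):
    # Placeholder: run your actual workload here
    print("Processing: " + body)


def main():
    sqs = boto3.resource('sqs')
    queue = sqs.get_queue_by_name(QueueName='test-queue1.fifo')
    print("Starting sqsd_v2.py")
    while True:
        # Long-poll: wait up to 20 seconds for a message before returning
        messages = queue.receive_messages(MaxNumberOfMessages=1, WaitTimeSeconds=20)
        for message in messages:
            process_job(message.body)
            # Delete only after the job succeeds, so SQS can redeliver on failure
            message.delete()


if __name__ == "__main__":
    main()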

Congrats! We’ve set up a lot of stuff:

  • Our dev environment
  • A Django-powered AWS Elastic Beanstalk Web Server Environment
  • An AWS SQS FIFO Queue
  • Boto3 so that Python can communicate with SQS
  • A very simple worker on an AWS Elastic Beanstalk Web Server Environment using Supervisor

Next steps:

  • Making the worker communicate with the SQS queue to process whatever tasks you may have on there
  • Perhaps storing results in an AWS S3 Bucket? (A small sketch of that follows below.)
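
For the S3 idea, the upload itself is only a couple of boto3 calls. A hypothetical sketch, with the bucket name and object key as placeholders:

# Hypothetical: upload a worker's result to an S3 bucket with boto3
import boto3

s3 = boto3.resource('s3')
s3.Bucket('my-results-bucket').put_object(
    Key='results/job-123.json',        # placeholder object key
    Body='{"diagnosis": "example"}',   # placeholder result payload
)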

Thanks for reading! To learn more about Codebase, check out our website and follow us on Facebook!
