Episode 4 — Testing your Python/Django app’s performance
Load testing your web application
Last time we learned how to containerize our PDF generator Python/Django application. This time around it's time to think about performance: how can we test how our application behaves under load?
When working on the application we use a development machine. Development machines usually have high-end specs, like multi-core processors and plenty of memory, and we tend to test the app by using it ourselves, which by no means represents how it will perform once multiple users try to access it on hardware that will most likely not match our current setup. That's where load testing comes in.
There are many tools available to run load tests, but we'll be using a Python-based open source tool called Locust. Locust allows us to define our tests' behavior using Python, and it can run in distributed mode if we want to simulate even bigger loads. Let's get going.
Since in the last episode we created a compose file for our application, the simplest way to load test with Locust will be to add a Locust container to our compose file. We'll go one step further and use a nice Compose feature: multi-file service definitions. With Docker Compose you can have services defined in multiple files, and even some services defined multiple times. When running, you specify the list of compose files to use and they are parsed one by one; if a service is defined again in a later file, its values are overridden, giving precedence to the last file processed. So let's create a new compose file, based on the one provided by Locust.
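As a sketch, a `docker-compose.locust.yml` along these lines would match the description that follows (the `locustio/locust` image is the official one on Docker Hub; the exact mount paths are assumptions based on the text):

```yaml
services:
  locust:
    image: locustio/locust
    ports:
      - "8089:8089"
    volumes:
      - ./locust:/locust
    command: -f /locust/locustfile.py -H http://web:8000
```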
The file defines a new service named “locust” using a prebuilt Docker Hub image. It exposes port 8089 to the same port on the local host, and mounts a folder named locust to the container's root folder of the same name. We also define some extra flags for the default command of the image (which, as you might guess, is `locust`). The `-f` flag specifies the path to the file with the test definitions that lives inside the folder we mounted, and the `-H` flag specifies the host (including the base path) for the tests. Notice we're using the compose service name as host and pointing it to the appropriate port.
As you can see, the service expects us to mount a folder with the test definitions file. So a folder named `locust` must be created, and a file inside it named `locustfile.py` should define our tests. We need to create such a file, but if you have followed along with the other episodes, you might remember that we don't really have an endpoint to test; all tests were done using the Django Admin. So we'll have to pause first to create an API endpoint.
Creating a PDF from a Django REST Framework endpoint
The first step to use Django REST Framework is to add it to our requirements file:
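For reference, the DRF package on PyPI is called `djangorestframework`, so the new line in the requirements file would be along these lines (whether and how to pin the version is up to you):

```
djangorestframework
```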
If you have never used Django REST Framework (DRF), I recommend you first check out their quick start guide. As the quick start guide mentions, we'll need to update the settings file:
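The original snippet isn't reproduced here, but a sketch of the relevant settings changes, matching the description that follows, might look like this (the exact parsing of the environment variable and the app names are assumptions):

```python
import os

# Read allowed hosts from ALLOWED_HOST_LIST, defaulting to "*" (any host)
ALLOWED_HOSTS = os.environ.get("ALLOWED_HOST_LIST", "*").split(",")

INSTALLED_APPS = [
    # ... the existing Django apps stay here ...
    "rest_framework",
    "rest_framework.authtoken",
    "page_request",
]

REST_FRAMEWORK = {
    "DEFAULT_AUTHENTICATION_CLASSES": [
        "rest_framework.authentication.TokenAuthentication",
        "rest_framework.authentication.SessionAuthentication",
    ],
}
```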
All tests we've done on the app so far were from the host machine's browser using `localhost`. Since Locust will be using the URL `http://web:8000` to reach the app, we need to add an allow list, otherwise requests will be blocked. The list is read from an environment variable `ALLOWED_HOST_LIST` or, if not defined, defaults to an asterisk, which means any host. The second change updates the installed apps to include DRF and DRF's token authentication. The last update is a block of DRF-specific configurations that at the moment only sets the allowed authentication classes.
The next change will update the model that we’re using to track the page requests, we’ll add a foreign key pointing to the user that owns the request:
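The change amounts to something like the fragment below; the existing fields aren't shown here, and the `on_delete` behavior is an assumption — only the owner foreign key is new:

```python
# page_request/models.py (fragment)
from django.conf import settings
from django.db import models


class PageRequest(models.Model):
    # ... the existing fields stay as they were ...

    # New: every page request now belongs to the user that created it
    owner = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
    )
```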
Next we need to add a serializer class that will handle how our `PageRequest` models will be transformed to REST-friendly JSON:
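A sketch of such a serializer, consistent with the explanation that follows, could look like this (the concrete field names other than `owner` are placeholders, since the full model isn't shown here):

```python
# page_request/serializers.py (sketch)
from rest_framework import serializers

from .models import PageRequest


class PageRequestSerializer(serializers.ModelSerializer):
    # The client never sends the owner; it defaults to the logged-in user
    owner = serializers.HiddenField(default=serializers.CurrentUserDefault())

    class Meta:
        model = PageRequest
        fields = ["id", "url", "status", "owner"]
        # Read-only fields must also appear in the fields list above
        read_only_fields = ["id", "status"]
```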
DRF does a lot of work for us, so we only define, in a `Meta` class, what model the serializer uses and what fields to include. Since the model already defines what values are acceptable for a field, you don't have to repeat that here, but in some instances you might need to; for example, in this serializer we specify that the owner is a hidden field that defaults to the current user. Finally we set which of those fields are read-only; notice that read-only fields must also be listed in the full fields list.
Once we have a serializer we can create a DRF ViewSet, which is a special type of class-based view that combines the logic for a set of related views.
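A minimal sketch of such a view set, matching the description that follows (the serializer import name is an assumption):

```python
# page_request/views.py (sketch)
from rest_framework import permissions, viewsets

from .models import PageRequest
from .serializers import PageRequestSerializer


class PageRequestViewSet(viewsets.ModelViewSet):
    queryset = PageRequest.objects.all()
    serializer_class = PageRequestSerializer
    permission_classes = [permissions.IsAuthenticated]

    def get_queryset(self):
        # Only expose the page requests owned by the logged-in user
        return self.queryset.filter(owner=self.request.user)
```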
Once again DRF simplifies the work: by using a ModelViewSet we only have to define a `queryset`, which is a Django representation of a query; from there DRF knows which model we'll be using for this view. Then we specify a serializer to use and a permission class that allows any authenticated user to access the endpoint. Finally we define a `get_queryset` function that makes the query used by this endpoint user specific, i.e. it will only return page requests whose owner is the logged-in user. In general, view sets only work well for simple models; if you want to create nested objects things become much more complicated, since you'll need to use nested serializers and add code for the write operations.
We have a new view that can handle `GET`, `POST`, `HEAD` and `OPTIONS` methods, but no way to reach them, so let's update the URLs file:
In this file we have added a DRF DefaultRouter to which we register our new view set. We then add the router URLs to the `urlpatterns` object, using the base path `api/` for all of them.
With all these changes in place, I recommend you start fresh and make and apply migrations on a clean database. The commands to do so are:
cd pdfsvc
docker compose down -v
rm -R pdfsvc/page_request/migrations
mkdir pdfsvc/page_request/migrations
touch pdfsvc/page_request/migrations/__init__.py
docker compose up db -d
docker compose up -d
docker exec -it pdfsvc-web-1 bash
python manage.py makemigrations
python manage.py migrate
python manage.py createsuperuser
We start by moving to the app folder, then bring the services down, including the existing volumes (`-v` flag). We proceed to delete the migrations folder with all its contents, recreate it, and create an empty `__init__.py` file inside it. We start the database only, in detached mode (`-d` flag), to allow it to be ready before we bring up the app. We then start all services again in detached mode and run a shell inside the web container, where we make the migrations, run them, and create a superuser.
At this point, if we run the app we'll be able to browse to `http://localhost:8000/api/` and see a list of available endpoints. If we click on the only available endpoint we'll notice it complains about authentication. Since we included session authentication in the DRF settings, we can log in (from the admin interface) and test the endpoint directly from the web page.
One last thing: since we updated the model to include an owner, we won't be able to create new page requests from the admin interface, because no owner will be provided. So we have to update the admin code a little bit.
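A sketch of that admin change (the admin class name is an assumption):

```python
# page_request/admin.py (sketch)
from django.contrib import admin

from .models import PageRequest


@admin.register(PageRequest)
class PageRequestAdmin(admin.ModelAdmin):
    def save_model(self, request, obj, form, change):
        # Fill in the owner from the logged-in admin user before saving
        obj.owner = request.user
        super().save_model(request, obj, form, change)
```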
The `save_model` function overrides the default and sets the owner to the request user before saving.
Creating a Locust test
We can finally get back to our original task, which was testing our application under load. Let’s define a test:
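The original `locust/locustfile.py` isn't reproduced here; a sketch consistent with the explanation that follows might look like this (the token endpoint path, the credentials, and the request payload are all assumptions):

```python
# locust/locustfile.py (sketch)
from locust import HttpUser, between, task


class PDFUser(HttpUser):
    # Each simulated user waits 1-2 seconds between tasks
    wait_time = between(1, 2)

    def on_start(self):
        # Runs once per simulated user, before any task: authenticate
        # and keep the DRF token for later requests.
        response = self.client.post(
            "/api-token-auth/",
            json={"username": "locust", "password": "locust-password"},
        )
        self.auth_headers = {"Authorization": f"Token {response.json()['token']}"}

    @task
    def create_page_request(self):
        # Paths are relative; the host comes from the -H flag in the compose file
        self.client.post(
            "/api/page-requests/",
            json={"url": "https://www.example.com"},
            headers=self.auth_headers,
        )
```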
If you have ever used the `requests` module, it should be pretty straightforward what the above code does and how to write other tests. Anyhow, if you want a better understanding of how it works, you should read Writing a locustfile from the Locust official docs.
The core of the file is the `PDFUser` class that inherits from `HttpUser` and includes a `client` attribute, an instance of `HttpSession` used to make requests. The client is initialized with the host parameter provided when calling the `locust` command from the container, so we don't need to include full paths; a relative path will work.
The class can include multiple functions that will be used for the tests; all of them should have the `task` decorator. Some special functions exist, like `on_start`, which, as the name suggests, will run before any of the tasks. It is the best place to handle authentication, and if you look at the code above that's exactly what we do, storing the authentication token to be used in the task functions. Finally, notice that a class attribute `wait_time` is defined to be between 1 and 2 seconds.
Since our small application doesn't have any other endpoints, we only need one task to test it. Tasks can test full action paths (i.e. call a series of endpoints with waits between them in the task function), call one endpoint per task, or mix and match. The idea is that your tasks should mimic real-world usage of the application. For example, in a web store one task could list products, list products from a random category, and view a couple of random product pages; another task could make all the calls to add products to the cart and complete the checkout; one more could abandon the cart; and so on.
We’re ready to load test the application. But before we do so we need to create a user. You might have already noticed that we have hard-coded credentials in the test file. So we must create a user matching them with the commands:
cd pdfsvc
docker compose -f docker-compose.yml -f docker-compose.locust.yml up
docker exec -it pdfsvc-web-1 bash
python manage.py createsuperuser
We start by moving to the app directory, then starting up the services using both compose files so that the Locust container is also started. You can add the `-d` flag and start the services in the background, or open another terminal window, and then get a shell in the web container to create a superuser. Alternatively, if you already have another user, you can modify the locust file with those credentials.
We're finally ready to load test the application. Point your browser to `http://localhost:8089/` and you will be greeted by the Locust test definition screen. Let's start by testing a single user to get a performance baseline. What Locust does is create an instance of your test class for every “user” you're simulating; the spawn rate controls how fast new users are spawned until you reach the total number of users you want to test.
If you look at the charts below, you'll notice that even with one user we have response times of 6 seconds 😱. To get a feel for how this would behave with more users, let's set 50 concurrent users. As you will see, things get much worse: at around 50 users we have response times in the 30-second range 🙀. Now notice how we have two lines in the graph? One is the average and the other the 95th percentile. Averages can be deceiving, since outliers can make the average “move”, in this case towards a better response time. The 95th percentile gets rid of the outliers and shows an output closer to reality.
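To see why the two lines can differ so much, here is a small self-contained Python example with made-up numbers (not from the actual test run): most responses are slow, and a couple of fast outliers pull the average down.

```python
import math

# Synthetic response times in milliseconds: mostly ~30 s, two fast outliers
response_times = [30000, 31000, 29000, 30500, 29500, 30000, 31500, 28500, 500, 600]

average = sum(response_times) / len(response_times)

# 95th percentile, nearest-rank method: the value below which ~95% of samples fall
ranked = sorted(response_times)
p95 = ranked[math.ceil(0.95 * len(ranked)) - 1]

print(average)  # 24110.0 — the fast outliers make the average look better
print(p95)      # 31500 — closer to what most users actually experienced
```

The average suggests “about 24 seconds”, but nearly every user waited around 30 seconds, which is exactly what the 95th percentile line reports.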
As always, you can check all the code on GitHub here:
Our simple PDF app is clearly not scaling well, so we still have more to do before deploying this code. On the next episode we’ll take a look at Celery and how to offload heavy processes to background tasks to make response times more consistent and the app more performant.
If you got here without reading the previous episode, here’s the link
Or continue reading the next episode: