Performing load tests with Python + Locust.io

Thiago Ferreira
5 min read · Jun 21, 2022


In the context of high-performance, high-availability systems, a load test is a type of performance test in which we check how a given part of the system handles many simultaneous users.

When should I use load tests?

Load tests will come in handy in many ways:

  • When making performance improvements, we can check if the improvements are truly effective.
  • When making decisions about a particular tech stack, load tests can be used to compare multiple approaches to the same problem and determine which works best.
  • When estimating how many simultaneous users a given system can support.
  • Find bottlenecks, that is, components of the system that will take too long to respond and affect the overall response time and user experience.

Locust.io

Locust.io is an open-source tool for designing and running load tests. User behavior is defined in pure Python code, which makes it easy to customize and extend for many use cases.

Setting up the load test

You can run this sample project on your local machine or deploy it to a cloud provider of your choice. Running it on a cloud provider gives more accurate results, since the environment is closer to a real production setup. To get started, we’ll need these two repositories:

  • locust-stress-test: Implements the tests using the Locust.io tool. This is where we define the user behavior, such as which endpoints each user hits and how often.
  • django-todo-list: A simple API built in Python/Django that will be the target of our load tests.

Locust code will look like this:
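A minimal locustfile matching that behavior might look like the sketch below (class name and request payload are my assumptions for illustration; the exact code lives in the locust-stress-test repository):

```python
from locust import HttpUser, task, between

class TodoListUser(HttpUser):
    # Each simulated user waits between 1 and 3 seconds between requests
    wait_time = between(1, 3)

    @task(5)  # listing tasks is weighted 5x heavier than creating them
    def list_tasks(self):
        self.client.get("/tasks/")

    @task(1)
    def create_task(self):
        self.client.post("/tasks/", json={"description": "write the report"})
```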

Here we are defining the user behavior:

  • Get a list of tasks using GET /tasks/
  • Create new tasks using POST /tasks/
  • Each user will wait between 1 and 3 seconds between requests.
  • Requests to list tasks are 5x more frequent than requests to create new tasks. Since on most websites users read far more than they write, this simulation is close to real-world behavior.

Setting up the environment

It’s preferable to run the load test on a cloud provider (AWS, GCP, DigitalOcean, etc.) to get results closer to reality, since this keeps our local machine from interfering with the measurements. To do so, I chose the following resources for this example:

Running the test

The instructions to install and run the tests are detailed in the repository locust-stress-test, but in summary, we need to run:

poetry run locust --host=http://localhost:8000

After that, open the web interface at the address displayed in your terminal, set the test parameters, then click “Start Swarming”:
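Alternatively, if you don’t need the web UI, Locust can run entirely from the terminal in headless mode (these are standard Locust CLI flags; the user count and duration below are just example values):

```shell
# Simulate 200 users, spawning 10 per second, for 5 minutes,
# printing the stats summary directly to the terminal
poetry run locust --headless -u 200 -r 10 --run-time 5m \
  --host=http://localhost:8000
```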

Results and conclusions

Find below the complete report provided by Locust.io. Here are some observations:

  • Up to around 160 simultaneous users, the system sustained about 60 requests per second, and the response time was acceptable for this context (95% of requests finished in under 400ms).
  • What counts as an acceptable response time varies with context. As a rule of thumb, we aim for under 300ms for APIs in general, but this threshold depends on each system.
  • Past 160 simultaneous users, note that throughput stays flat at the same 60rps while the response time increases sharply: the server is saturated, so each additional user only makes responses slower.
  • In summary, with the available computing resources, we could serve around 160 simultaneous users at 60rps with response times under 400ms. To push these numbers higher, we would need more computing resources or performance adjustments in the code. We could also add cache layers such as Memcached or Redis where they make sense.
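As a rough sanity check (my own back-of-the-envelope addition, not part of the original report), Little’s Law ties these numbers together: concurrent users ≈ throughput × (think time + response time). Using the ~2s average wait configured in Locust and the ~0.4s response time near saturation:

```python
# Little's Law: N = X * (Z + R)
# N = concurrent users, X = throughput (requests/second),
# Z = average think time, R = average response time
throughput = 60        # rps observed at the plateau
think_time = 2.0       # mean of the 1-3 s wait_time configured in Locust
response_time = 0.4    # approximate response time near saturation
supported_users = throughput * (think_time + response_time)
print(supported_users)  # 144.0 -- in the same ballpark as the observed ~160
```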

Check out the full report:

Understanding CPU and memory usage

Application layer

In the application layer, we can see CPU usage growing until it reaches 100%. In a way, it’s a good sign that CPU usage ramps up all the way: it means our application server is well configured and is not leaving resources idle.

On the other hand, since our CPU usage is getting to 100%, this also means we need more computing resources.

Database layer

We can also see in the database stats that we have a high CPU usage. This shows that the database computing resources need to be increased. To do so, we have a few approaches:

  • Increase the database’s computing resources (+CPU, +memory).
  • And/or create read-only replicas, so that reads and writes are served by separate instances.
  • And/or add a cache layer (Redis, Memcached, etc.) where possible, to reduce database load.
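The cache layer in the last bullet typically follows the cache-aside pattern. Here is a toy, pure-Python sketch of the idea (illustrative only; a real system would talk to Redis or Memcached through a client library):

```python
import time

_cache = {}  # in-memory stand-in for Redis/Memcached

def fetch_tasks_from_db():
    time.sleep(0.05)  # simulate a slow database query
    return ["task-1", "task-2"]

def get_tasks(ttl=60):
    entry = _cache.get("tasks")
    if entry is not None and time.time() - entry[1] < ttl:
        return entry[0]              # cache hit: skip the database entirely
    tasks = fetch_tasks_from_db()    # cache miss: query, then store
    _cache["tasks"] = (tasks, time.time())
    return tasks
```

Every cache hit removes one query from the database, which is exactly the pressure we saw in the CPU charts.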

One step at a time

When performing load tests and improving performance, it is important to make small changes, one at a time. This way we can verify how much a given change affected the results. In the screenshot below, we can see the exact point where I changed a configuration and the response time improved a lot:

Combining small changes with monitoring (results and computing resources), we can understand how our system behaves under stress and adjust accordingly.

Additionally, this is a great application of the principle of falsifiability to technology: it helps us make better, evidence-based decisions about our infrastructure.
