Memory Profiling and Load Testing a Python Application

Yuyi Kimura · Published in Dev Whisper · Jun 19, 2024

When working on software, it’s crucial to ensure your application is both efficient and capable of handling heavy traffic. This is where memory profiling and load testing come into play: they allow us to monitor and optimize the performance of our applications.

This guide covers profiling and testing with tools you can run on your own computer, which is very useful for detecting memory leaks, bugs, or under-performing code before pushing to production.

These techniques are not a replacement for cloud monitoring and observability tools; those cover far more ground and provide much deeper insight into your application (but they are also expensive).

Load Testing

Load testing is a technique that simulates real-world conditions to understand the behavior of an application under heavy load. It allows engineers to foresee and fix potential performance issues before they affect real users.

We’ll be using locust to load test a simple Flask application, showing how its powerful, scriptable nature lets us simulate real-world scenarios.

We will need to install some dependencies:

pip install flask Faker

Now, let’s code a simple Flask application (app.py):

import random
from faker import Faker
from flask import Flask, jsonify

fake = Faker()
app = Flask(__name__)


def generate_users():
    """
    Generate a list of random users.
    """
    users = []
    for i in range(100):
        users.append(
            {
                "id": i,
                "name": fake.name(),
                "age": random.randint(18, 99),
                "email": fake.email(),
            }
        )

    return users


def generate_user(user_id: int):
    """
    Generate a random user by ID.
    """
    return {
        "id": user_id,
        "name": fake.name(),
        "age": random.randint(18, 99),
        "email": fake.email(),
    }


@app.route("/users", methods=["GET"])
def list_users():
    """
    List users endpoint.
    """
    return jsonify({"status": 200, "users": generate_users()})


@app.route("/users/<int:user_id>", methods=["GET"])
def get_user(user_id: int):
    """
    Get user by ID endpoint.
    """
    return jsonify({"status": 200, "user": generate_user(user_id)})


if __name__ == "__main__":
    app.run(debug=False)

We can run this application by simply executing: python app.py.
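A quick caveat: Flask’s built-in development server handles requests in a single process and is not meant for heavy traffic, so it can become the very bottleneck you end up measuring. If you want to rule that out, you could serve the same app through a WSGI server such as gunicorn instead (an optional sketch, not part of the original setup):

pip install gunicorn

# Four worker processes bound to the same port the rest of this guide assumes
gunicorn -w 4 -b 127.0.0.1:5000 app:app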

You can perform a GET /users request to visualize the list of users:

# Using cURL
curl localhost:5000/users

# Or simply visit http://localhost:5000/users in your browser

Now that we have an application to put under load test, let’s get started with locust. First, let’s install it:

pip install locust

Once installed, we’ll write a locustfile.py that defines the behavior of our simulated users. In the script, we specify the endpoints Locust should hit and how long each simulated user waits between tasks.

import random
from locust import HttpUser, task, between


class SimulatedUser(HttpUser):
    wait_time = between(1, 5)

    @task(1)
    def view_users(self):
        self.client.get("/users")

    @task(4)
    def view_user_by_id(self):
        user_id = random.randint(1, 100)
        self.client.get(f"/users/{user_id}")

Here is a detailed description of the key concepts involved:

  • HttpUser: A class that represents a simulated user, whose behavior should mimic that of a real-world user. In the script, we define a class SimulatedUser that inherits from HttpUser.
  • Task: The task decorator marks the methods an HttpUser executes while simulating a user. Its integer argument is a weight: with @task(4), view_user_by_id is picked four times as often as view_users. Tasks can also validate the responses they receive, as shown in the sketch after this list.
  • wait_time: A property of the HttpUser class that specifies how long a simulated user waits between executing tasks.
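Locust’s catch_response argument lets a task decide for itself what counts as a failure, which is useful when an endpoint returns a response that is technically successful but semantically wrong. A minimal sketch (the validation logic here is our own addition, not part of the original script):

from locust import HttpUser, task, between


class ValidatingUser(HttpUser):
    wait_time = between(1, 5)

    @task
    def view_users(self):
        # catch_response gives us a context manager so we can mark
        # the request as a success or failure ourselves
        with self.client.get("/users", catch_response=True) as response:
            if response.status_code != 200:
                response.failure(f"Unexpected status: {response.status_code}")
            else:
                response.success()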

We can now run our Locust script by simply executing:

locust -f locustfile.py

This will spin up a web interface at http://localhost:8089 in which you can specify your testing arguments as follows:

Locust web interface

In this interface, we define three main arguments (the same values can also be passed as command-line flags, as shown after this list):

  • Number of users: The total number of users you want to simulate at peak concurrency.
  • Ramp up: The number of new users added to the load per second (in our example, 10 new users join each second until the peak of 1000 is reached). This is quite useful if you want to see at which point your application starts to underperform.
  • Host: The host of your server application that each user will target while executing its tasks. Since our Flask application is running locally on port 5000, the value here is http://localhost:5000.
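If you prefer scripted runs over the web UI, Locust can run headless with those same arguments supplied as flags. A sketch with illustrative values:

locust -f locustfile.py --headless \
    --users 1000 --spawn-rate 10 \
    --run-time 5m --host http://localhost:5000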

Click the Start button and Locust will begin the load test. The first tab that you will notice is Statistics, in which you can view a table with all the requests executed, their response time, and some other stats.

Let’s move to the Charts tab for a friendlier representation of your application’s performance. Here you will find three charts:

  • Total Requests per Second: This is basically the number of requests your application is handling per second.
  • Number of Users: It simply indicates the number of users over time.
  • Response Times: Shows how long your application takes to return a response.

You will see two lines in the Response Times chart:

  • Average Response Time: As the name states, the mean time your application takes to respond.
  • 95th percentile: The response time below which 95% of all recorded response times fall. For example, if the 95th percentile is 50ms, then 95% of the requests were completed in less than 50ms, while the remaining 5% took longer.

It’s important that your 95th percentile isn’t significantly higher than the average response time, since a large gap may indicate performance issues affecting a meaningful subset of users.
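To make the distinction concrete, here is a small standalone snippet (our own illustration, independent of Locust) computing both metrics from a list of response times. Note how the 95th percentile exposes the slow tail that a single outlier creates:

import statistics

# Made-up response times in milliseconds, with one slow outlier
response_times = [12, 15, 14, 13, 18, 16, 14, 250, 15, 13]

average = statistics.mean(response_times)
# quantiles(n=20) yields 19 cut points; the 19th one is the 95th percentile
p95 = statistics.quantiles(response_times, n=20)[18]

print(f"average: {average:.1f} ms, p95: {p95:.1f} ms")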

There are other tabs in which you can monitor the Failures or Exceptions that may have occurred during the tests. This is very important, since it lets you debug failures that only happen under heavy load.

Locust also allows you to download the report in HTML format, so you can review it later and compare it after you have improved your application.
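In headless mode, the same report can be written straight to disk by adding the --html flag to the invocation shown earlier:

locust -f locustfile.py --headless --users 1000 --spawn-rate 10 \
    --run-time 5m --host http://localhost:5000 --html report.html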

Memory Profiling

Memory profiling consists in analyzing a program’s memory consumption to identify and solve issues related to unusual memory usage. It helps you understand how your application allocates resources, and it’s very useful when sizing your cloud instances correctly.

There are multiple tools for memory profiling out there, but for today’s post we’ll be using memray, because of its ease of use and the variety of reports it can generate.

Let’s install it using pip:

pip install memray 

Using our Flask application built above, let’s now perform memory profiling:

memray run -o result.bin app.py

Once the application is running, you can perform the load test defined above to simulate a real-world scenario while memray tracks the memory usage.

After running the load test for a while, simply stop memray (Ctrl + C the process) and it will generate a result.bin file with all the tracked data.
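As an aside, memray also ships a live mode that renders allocations in a terminal UI while the application runs, which can be handy for a quick look before generating full reports:

memray run --live app.py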

Now we can generate a report using the following command:

memray flamegraph result.bin

This will generate a flame graph in HTML format for you to visualize in your browser. Flame graphs are a visualization of program resource usage that help identify which parts of your code are consuming the most memory during execution.

Flame graph generated by memray

Each block on the x-axis represents a function in the code, with the width of the block representing the amount of memory that function allocates. The y-axis represents the stack depth. memray actually defaults to the Icicles view, which simply inverts the y-axis so the root is at the top and the deepest functions are at the bottom. You can choose either view based on your preference.

However, the critical detail to note is the width of the function blocks. The wider a block, the more memory has been allocated in that function. Therefore, if a block appears unusually wide, it could indicate a potential memory leak affecting your application.

Another very important graph within memray is the memory size over time, which you can see by clicking the header with the orange and blue lines. In this graph you can see the Resident Size and the Heap Size and how much memory is allocated over time.

Memory size over time graph

This graph is crucial because, under normal circumstances, you would expect these lines to stay within a certain range. If your application includes memory-bound endpoints, you might anticipate some spikes. However, depending on how you’ve defined your load test, these lines should normalize once the memory-bound endpoints have completed their tasks. Consequently, if a line grows continually over time without ever decreasing, it’s a strong sign of a memory leak.

Bonus: Simulating a Memory Leak

As a bonus section, let’s simulate a memory leak so we can see how both tools can help us identify it.

First, let’s update our Python code so we introduce a memory leak endpoint (something you should never do in production):

# Intentional memory leak: a growing list
leaked_data = []


@app.route("/leak")
def leak_memory():
    """
    Memory leak endpoint.
    """
    leaked_data.append([i for i in range(10000)])
    return jsonify({"status": "Memory leaked!"})

Every time a request hits this endpoint, a list of 10,000 integers is created and appended to a global list named leaked_data. Because leaked_data is global and never cleaned up between requests, the memory it holds is never freed. This means every request to /leak consumes more memory, simulating a memory leak.
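For contrast, one common way to avoid this pattern is to bound the structure so old entries are evicted automatically. A sketch of a possible fix inside the same app.py (the /no-leak route and the maxlen value are our own illustrative choices):

from collections import deque

# Bounded buffer: once 100 entries exist, the oldest is dropped,
# so memory stays flat no matter how many requests arrive
bounded_data = deque(maxlen=100)


@app.route("/no-leak")
def no_leak():
    bounded_data.append([i for i in range(10000)])
    return jsonify({"status": "Memory bounded!"})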

Let’s update our locustfile.py to include this endpoint call:

import random
from locust import HttpUser, task, between


class SimulatedUser(HttpUser):
    wait_time = between(1, 3)

    @task(1)
    def view_users(self):
        self.client.get("/users")

    @task(3)
    def memory_leak(self):
        self.client.get("/leak")

    @task(4)
    def view_user_by_id(self):
        user_id = random.randint(1, 100)
        self.client.get(f"/users/{user_id}")

Now, we can simply run the application using memray:

memray run -o result.bin app.py

And the load test using locust:

locust -f locustfile.py

Once you’ve set up your load test parameters and allowed it to run for some time, you should terminate the memray process. This will generate a result.bin file containing the tracked data. You can then produce a flame graph using this data:

memray flamegraph result.bin

Now let’s review the statistics generated by both Locust and memray:

Charts from Locust web interface

As the number of requests increases, we observe that our response times remain largely consistent, barring a single spike. This was expected, since the memory leak we introduced does not significantly affect the CPU performance of the application. Judged by response times alone, we could conclude that our application has successfully passed its load test.

Flame graph (Icicles) of our memory leaking application

Here, we can observe more alarming graphs. As explained earlier, the wider the function blocks, the higher the memory allocation. Therefore, it’s easy to identify that the leaked_data.append(...) block is responsible for the unusual memory consumption.

Memory size over time of our memory leaking application

Examining the memory size over time, we notice that it grows but never shrinks. This indicates a memory leak, as memory is never being released. In our example, the issue arises from the use of a global variable.
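If you want to confirm a suspected leak from inside the process itself, the standard library’s tracemalloc module can compare two snapshots and show exactly which lines kept allocating in between. A minimal sketch (where you host this code, e.g. a temporary debug hook in app.py, and the 60-second window are our own assumptions):

import time
import tracemalloc

tracemalloc.start()

before = tracemalloc.take_snapshot()
time.sleep(60)  # let the load test hammer /leak in the meantime
after = tracemalloc.take_snapshot()

# The lines whose allocations grew the most between the two snapshots
for stat in after.compare_to(before, "lineno")[:5]:
    print(stat)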

As you can see, memory profiling lets you detect issues within your application quickly and efficiently, eliminating the need for long hours of debugging and excessive memory-usage logging.

Conclusion

Ensuring that your application is both efficient and capable of handling high traffic is very important in software development. Through the use of memory profiling and load testing, developers can proactively identify and resolve performance bottlenecks and memory leaks before they affect end users.

The goal of this guide is to equip you with the tools needed to understand how your application behaves under various loads and to detect unusual memory consumption before it impacts your users.

Feel free to experiment with these tools and adapt them to your specific needs. Happy profiling and testing!
