Python & Django optimization tools and process

Darya Plotnikova
May 29 · 7 min read

Django is a good instrument (not the best, but good enough) to create web apps as fast as possible, it includes a lot of useful batteries which allow to start up production-ready MVP in a days. But if you are going to evolve your app you need to pay attention to it’s performance. I’d like to introduce 3 tools which help to measure performance metrics and make optimization process more obvious. Here you can find demo project which is already configured to use this debugging tools.

Which metrics to pay attention on

As we are talking about web apps important metric is time to get response from browser. It consists of several steps:

  1. Sending request from browser to server.
  2. Preparing request object in python code and receiving it in our controller class of function.
  3. Process request (extract data, go to database/third-party services, possibly somehow calculate it, prepare response).
  4. Sending response back to the server.

Let’s list metrics which can make that steps more effective (decrease amount of time step is taken):

  • Amount of data sent from browser to server and back — affects 1st and 4th steps
  • Amount of data taken from database/third-party services — affects 3rd and indirectly 4th
  • Database/third-party queries count to receive/save data — affects 3rd
  • Calculation with data received from database/third-party services (business logic) — affects 3rd
  • RAM usage — affects 3rd, 2nd, 4th.

What tools do we have to measure them? Which metric does affect total timing in request-response cycle mostly? Which metrics are not so important for our app? Let’s try to figure it out.

Tools overview

We will consider 3 instruments which I recommend to use constantly, using them you will write faster code.

Django-silk

This is my favorite, it helps to check which SQL queries was executed, how many time did it take to prepare response (2–4 steps), how many time was spent on db queries, also it’s possible to use CProfile with any part of your code and visualize results.

Let’s try it out. You can play around with it in demo project. It contains simple Django project with 2 models — Author and their Publication, data fixture and Postman collection to provide quick access to existed API.

Clone demo and go through several steps:

  1. Create virtual environment and activate it.
  2. Install required libraries pip install -r requirements.txt.
  3. Setup database python manage.py migrate.
  4. Fill database with data python manage.py loaddata data.json
  5. Create superuser to have access to admin part of site python manage.py createsuperuser
  6. Run app python manage.py runserver.
  7. Open Postman and import collection.json file from repository .
  8. Go to http://127.0.0.1:8000/api/ in Postman using “Publications list” request in imported collection.

There is a list of publications created by different authors, data was prepared by fixture. As you can see than django-silk is already turned on, so it writes meta information about all HTTP queries received by server. Let’s see what did happen, open in browser http://127.0.0.1:8000/silk/.

This is meta info recorded by django-silk, click on block with 200 GET request and you’ll be redirected to this certain request info — amount of db queries, headers, body parameters.

Click SQL in top bar tabs, and you will see list of all database queries performed on 3rd step of request-response cycle. Each record has timing info and amount of JOINs performed in query. If you click any you will see details — which SQL was performed, where was it called from python code.

Let’s go to authors whose publications were visited mostly. Open Postman and go to “Top authors by visits (DB)” and send request. Here we use database aggregation to count publication’s visits for each author. After that go to “Top authors by visits (PY)” and send request again to get the same result but counting visits in python code.

Now open silk page http://127.0.0.1:8000/silk/requests/, you see 2 new requests returned the same response, but internally they are different — the first one did only 2 queries, the second 5 (try to figure out why they differ, answers are at the end of post, #1). This is how django-silk makes unefficient queryset usage obvious.

Another approach to use django-silk is to call CProfile to check if there are no blocks which takes a lot of time to perform some app logic.
Look at the time 3rd request took, it’s almost 300 times longer than the 2nd! Let’s try to realize why, click on it and go to Profiling tab.

Here you can see all profiled blocks, which were marked with silk_profiler decorator or context manager, click “top-authors-py”.

On profiling page you can see place where decorator/context manager was called, profile graph and cProfile table. Look at the graph, it’s the most obvious.

This is just cProfile visualization. It colors blocks according to the amount of time they took — red are the slowest functions and classes, yellow, green — the fastest.

As you can see responding to authors by visits we call vague function do_long_action() which took 66% execution time.

This function was added to demonstrate how easily profiling visualization of django-silk helps to find slow functions and classes, using it you can quickly locate and fix them.

Delete do_long_action() call and make a request again. What will happen? Will there another performance problems exist? Try to improve “Top authors by visits (PY)” view.

Django-silk has another useful features, more examples you can find on it’s GitHub.

nplusone

Simple but powerful tool to find N+1 queries during development — when you don’t prefetch data from related models but access them, usually by dot.

nplusone already installed in demo project, let’s try it. Go to Postman and make a “Publications with authors” request. There are id, title and author name for each publication in response. What may be wrong with such a simple request? If you look at your console output where Django is running you find Potential n+1 query message:

nplusone tells us that working with Publication model we didn’t prefetch authors info, try to find it and fix (answer at the end of post, #2). That’s all.
More details about prefetching data described in prefetch_related() and select_related() topics in Django docs.

Django debug toolbar

This helps you to see a lot of information set up in project like middlewares, installed apps, registered signals etc, but also it includes some request specific data for example list of rendered templates, headers, request data, sql performed in request-response cycle and so on.

This tool provides really useful information but it works only with HTML-rendered response because toolbar can not be rendered for JSON as it needs some place on HTML page to place its tags. So if you go to any URL of demo project in your browser you will see nothing except plain JSON response. To see how this toolbar looks like you can open admin site http://127.0.0.1/admin/ and there you will see panels, clicking to any of them you might explore details about your project.

Another handy approach with than library is debugsqlshell command, I usually use it to figure out how function or class produces SQL.

Just try to run python manage.py debugsqlshell in demo project and execute this lines of code in interpreter shell:

>>> from app.views import get_authors_db
>>> get_authors_db()

You will see direct result, list of authors at the last line of output. But also before that you can find some SQL, this is how get_authors_db() function got data from database.

This management command is really convenient to test some functions or classes without necessity to run whole request-response cycle.

Cons & Pros

We figured 3 tools which help developers understand how their code really acts in context of working with database and with python code somehow. Below you can see comparison table, I tried to summarize everything what we’ve talked about.

Of course this table is not exhaustive, but it gives some view to choose tool which is more compatible to your needs. Anyway at least 2 instruments (silk and debug-toolbar) are huge debugging systems with their own batteries, so feel free to dive deeper and discover how you might use it to track performance problems and improve your project.

Share in comments which tools do you know and use daily to debug and measure your python and Django code.

Answers to questions at post

#1
The reason why db-request did only 2 requests to database is that we used prefetch_related() to populate publications of authors just in one extra query, done after query to Author’s table. Unlike that we didn’t request authors related table in get_authors_python() so when we construct output Django goes to publication for each author in separate queries.

#2
In Publications with authors when we construct output, we call Publication.author field, so Django goes to author attribute for each publication which was not previously populated and does one extra query to Author’s table. As in #1 solution is just add prefetch_related to publications queryset.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data…