Python Profiling

Antonio Molner
Apr 15, 2019

How to profile a Python application to figure out what functions you should optimize.

What is profiling?

The first step toward a fast implementation of a program is profiling to find out where your code spends its time. The key point is to only optimize functions that are actually slow or are called extensively. Profilers can collect several types of information: timing, function calls, interruptions, cache faults… In our case, we are going to focus our attention on timing and function calls.
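To see why a profiler is needed, contrast it with simple wall-clock timing. A sketch using only the standard library (the `slow_sum` function is a made-up stand-in): timing tells you how long a block takes, but not where the time goes inside it.

```python
import time

def slow_sum(n):
    # deliberately unoptimized loop so there is something to measure
    total = 0
    for i in range(n):
        total += i * i
    return total

start = time.perf_counter()
result = slow_sum(1_000_000)
elapsed = time.perf_counter() - start
print(f"slow_sum took {elapsed:.3f}s")  # total time only; no per-function breakdown
```

A profiler answers the follow-up question this leaves open: which functions inside the measured block are responsible for the time.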

Deterministic vs Statistical

There are two types of profiling.

  1. Deterministic Profiling: All events are monitored. It provides accurate information but has a large impact on performance (overhead); the code runs slower under profiling, which often makes it impractical for production systems. This type of profiling is suitable for small functions.
  2. Statistical Profiling: The execution state is sampled at regular intervals to compute indicators. This method is less accurate, but it also has much lower overhead.

Built-In Python Profilers

Python comes with two modules for deterministic profiling: cProfile and profile. Both are different implementations of the same interface. The former is a C extension with relatively small overhead, and the latter is a pure Python module. As the official documentation says, the module profile would be suitable when we want to extend the profiler in some way. Otherwise, cProfile is preferred for long-running programs. Unfortunately, there is no built-in module for statistical profiling, but we will see some external packages for it.

How to use cProfile

For the sake of example, we are going to write a small Python script like this:

dummy example code

The function computation is a naive implementation of the sum of two lists. Here, we use two lists of 1,000,000 elements. The rest of the code is just preparation for the next section.
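The original snippet appears as an image, so the exact code is not available here. The following is a minimal sketch consistent with the profiler output in the next section; the function names come from that output, but the structure and bodies are assumptions.

```python
# A sketch consistent with the recorded profile: computation is called
# 51 times in total (50 from function2, 1 from function3).
def computation(a, b):
    # naive element-wise sum of two lists
    result = []
    for x, y in zip(a, b):
        result.append(x + y)
    return result

def function2(a, b):
    for _ in range(50):  # would account for 50 of the 51 recorded calls
        computation(a, b)

def function3(a, b):
    computation(a, b)  # the remaining call

def function1(a, b):
    function2(a, b)
    function3(a, b)

def main():
    # the post uses lists of 1,000,000 elements; smaller here so the
    # sketch runs quickly
    a = list(range(100_000))
    b = list(range(100_000))
    function1(a, b)

if __name__ == "__main__":
    main()
```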

Let’s find out where our program spends the most time (pretty obvious, right?).

➜ python3 -m cProfile script.py
         58 function calls in 9.419 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    9.419    9.419 part1.py:1(<module>)
       51    9.419    0.185    9.419    0.185 part1.py:1(computation)
        1    0.000    0.000    9.419    9.419 part1.py:10(function1)
        1    0.000    0.000    9.243    9.243 part1.py:15(function2)
        1    0.000    0.000    0.176    0.176 part1.py:20(function3)
        1    0.000    0.000    9.419    9.419 part1.py:24(main)

As we can see, the profiler has recorded information for each function of our code. How can we interpret the results?

  1. ncalls: As the name suggests, the number of calls. We should try to optimize functions that are called many times or consume too much time per call.
  2. tottime: The total time spent in the function itself, excluding sub-calls. This is the metric to look at most closely. We can see that the function computation is called 51 times, and each call takes 0.185s.
  3. cumtime: Cumulative time, including sub-calls. That is why function1 and function2 have a cumulative time similar to the total time of computation.
  4. percall: There are two "per call" metrics: the first is total time per call, and the second is cumulative time per call. Again, we should focus on the total-time metric.

We can also sort the functions by some criteria. For example:

➜ python3 -m cProfile -s tottime script.py
         58 function calls in 9.172 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       51    9.104    0.179    9.104    0.179 part1.py:1(computation)
        1    0.000    0.000    8.927    8.927 part1.py:15(function2)
        1    0.000    0.000    9.104    9.104 part1.py:10(function1)
        1    0.000    0.000    9.104    9.104 part1.py:1(<module>)
        1    0.000    0.000    0.178    0.178 part1.py:20(function3)
        1    0.000    0.000    9.104    9.104 part1.py:24(main)

The full list of ordering keys can be found in the official documentation for pstats.Stats.sort_stats.

We can use cProfile as a module rather than as a script to run more complex analyses. The most basic use of this module is the following:

import cProfile
cProfile.run('main()')

We can pass any Python expression as a string, and the profiler will record information about its execution. For more details, I suggest reading the official documentation.
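For finer control than run() offers, cProfile also exposes a Profile object that can be enabled and disabled around arbitrary code, and its results can be rendered with pstats. A minimal sketch (the work function here is a made-up stand-in for your own entry point):

```python
import cProfile
import io
import pstats

def work():
    # stand-in workload; replace with your own code
    return sum(i * i for i in range(200_000))

profiler = cProfile.Profile()
profiler.enable()
result = work()
profiler.disable()

# Render the collected stats sorted by internal time, top 5 rows.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("tottime").print_stats(5)
print(stream.getvalue())
```

This pattern is handy when you only want to profile one region of a larger program rather than the whole script.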

Generating a call graph

“A picture is worth a thousand words”.

Sometimes, especially with a large program, we may want to see how our functions call each other. With gprof2dot we can generate a dot graph from the output of our profiler. The tool supports many well-known profilers, including cProfile/profile. After following the official installation guidelines, we can generate a graph of our program:

  1. First of all, we need to tell the profiler to dump the collected data into a file:

python3 -m cProfile -o output.pstats script.py

  2. Secondly, we run gprof2dot to generate a PNG image:

gprof2dot -f pstats output.pstats | dot -Tpng -o output.png

This is the result:

gprof2dot output

As we can see, the graph shows both metrics, the total and cumulative time, and the number of calls.
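If you only want to inspect the dumped .pstats file, without generating a graph, the standard-library pstats module can load it directly. A small sketch (the profiled expression is just a placeholder):

```python
import cProfile
import pstats

# Profile a toy expression and dump the stats to a file,
# equivalent to what the -o flag does on the command line.
cProfile.run("sum(x * x for x in range(100_000))", "output.pstats")

# Load the dump and print the 10 most expensive entries by cumulative time.
stats = pstats.Stats("output.pstats")
stats.sort_stats("cumulative").print_stats(10)
```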

Statistical Profiling: Py-Spy and pyinstrument

There are many statistical profilers for Python, but I would like to talk about two of the most famous: Py-Spy and pyinstrument.

Py-Spy is a statistical (sampling) profiler that lets you visualize the time each function consumes during execution. An important feature is that you can attach the profiler without restarting the program or modifying the code, and it has low overhead. This makes the tool highly suitable for production code.

To install it, just type:

pip install py-spy

Using this tool is straightforward; you just have to type this:

sudo py-spy -- python3 script.py

As I mentioned before, we can attach the profiler at runtime. In that case, we would use the PID of the process:

sudo py-spy --pid PID

After that, we will see something like this:

As we can see, the profiler samples periodically and updates the information as the program runs.

The second tool I would like to talk about is pyinstrument. It is similar to cProfile in that we cannot attach it to a running program, but its output is far more readable. It can also generate an HTML report.

To install it, just type:

pip install pyinstrument

Now, we are going to generate an HTML report for our code:

pyinstrument -r html script.py

We can open the generated HTML file with our default browser:

pyinstrument output

As you can see, the output of this tool is easy to interpret and distraction-free, because its main purpose is to show the slowest functions of your code, that's all. When we have tons of functions, this is a huge advantage.

Other tools

As I said before, there are many other tools for profiling. Feel free to use whichever suits you best. Here are some of them:

  • vprof: Visual profiler
  • Pyflame: A Ptracing Profiler
  • plop: Low-Overhead Profiler
  • StackImpact: Production-Grade Performance Profiler: CPU, memory allocations, blocking calls, exceptions, metrics, and more.
  • What-Studio Profiling: An interactive continuous profiler.

Real Example

So far we have been dealing with a dummy example. Now we are going to use a more complex one. I created a repository on GitHub where I will put all the code for this post series. The code for this example is the following:

a more complex example

This is a basic example of an optimization problem using Local Search. The idea is to find the vector of feature importances for a 1-NN classifier that maximizes reduction while maintaining high accuracy.
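The repository code itself appears as an image, so as a generic illustration only (not the author's implementation), a local search over a weight vector usually follows this shape: mutate one weight at a time and keep the change only if the score improves. The function and parameter names below are made up.

```python
import random

def local_search(score, n_features, iterations=100, seed=0):
    """Greedy local search: perturb one weight per step, keep improvements."""
    rng = random.Random(seed)
    weights = [1.0] * n_features
    best = score(weights)
    for _ in range(iterations):
        i = rng.randrange(n_features)
        old = weights[i]
        # small Gaussian perturbation, clipped so weights stay non-negative
        weights[i] = max(0.0, old + rng.gauss(0, 0.3))
        new = score(weights)
        if new > best:
            best = new          # keep the improving move
        else:
            weights[i] = old    # revert the move
    return weights, best
```

In the post's setting, score would combine 1-NN accuracy with a reduction term; here it can be any callable that maps a weight vector to a number. The score function is exactly the kind of hot spot the profiler below is meant to reveal, since it is called once per iteration.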

Let’s find out the slowest function of our code. This code is not for production, and the maximum execution time on my computer is 5–10s. Also, the length is not excessive, so I will try cProfile first.

cProfile output

The list is incomplete. One of the biggest drawbacks of cProfile is that it also prints the internal functions of the libraries you use, making the output really messy. In most cases, we cannot optimize code from external libraries anyway.

For that reason, if you use external libraries extensively, I would recommend using another tool such as pyinstrument. That said, let’s try again:

pyinstrument -r html local_search.py

Now we have the output we were looking for, without messy lines, just the slowest functions of our code.

Summary

Thank you for reading this far. Here are the key points of this article:

  • Profiling is the first step to optimize your code.
  • Deterministic profiling implies higher accuracy and higher overhead.
  • It is recommended to visualize the function calls graphically.
  • There are several profilers for Python and you have to choose the correct tool for each situation.

The profiling tools you should use vary with the code you are working on. As a rule of thumb: if you are writing a single algorithm or a small program, e.g. K-Means, use a simple profiler like cProfile or even a fine-grained tool like line_profiler. In contrast, when you are optimizing a whole program, you may want to use a statistical profiler to avoid the overhead.
