How to Speed up Python Code with CPU Profiling

Ellen Schellekens · Published in Ixor · Jul 20, 2022

[Image: DALL·E Mega assisted impression of fast Python code]

Optimising your code’s running time has only advantages. For instance, if your code runs in the cloud on a pay-per-use solution such as AWS Lambda, you will see an immediate cost reduction when you decrease the running time. And if your code runs in a time-sensitive environment, or on a device with limited resources, optimising the running time is crucial.

But in order to do so, you must first analyse which parts of the code take the most time. You can spend as much time as you want perfecting a specific piece of code, but if that code only accounted for a small part of the total running time to begin with, you will barely notice any improvement. This is also known as Amdahl’s law.
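To get a feel for why this matters, here is a quick back-of-the-envelope sketch (an illustration we added, not from a specific profiler):

# Amdahl's law: overall speedup when a fraction p of the total
# runtime is accelerated by a factor s
def amdahl_speedup(p, s):
    return 1 / ((1 - p) + p / s)

# Making code that accounts for 10% of the runtime a million times
# faster still yields only ~1.11x overall
print(amdahl_speedup(p=0.10, s=1_000_000))  # ~1.1111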

But how do you make this analysis? That is where profilers come in! In this article, we’ll discuss a few different types of Python profilers, namely deterministic profilers, probabilistic profilers and line profilers.

Deterministic Profiling

Deterministic profiling gives a general overview by keeping track of every call that happens in the run, and how long it takes. This way, it can provide a detailed overview of everything that happens. The downside, however, is that there is considerable overhead, and the result is so detailed it might be hard to make head or tail of it.

You can do deterministic profiling with a built-in Python library: cProfile. The library can be used in a number of ways: you can directly run a piece of code, or use it as a context manager or as a class. Another approach, and the one we will use here, is the command line interface. Since the direct output is very cluttered, we will write the results to a pstats file. The following command does this:

python -m cProfile -o output_filename.pstats path/to/script arg1 arg2
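Besides the CLI, here is a minimal sketch of the other two usages mentioned above (the context-manager form requires Python 3.8+, and the profiled code is just a placeholder):

import cProfile
import pstats

# Profile a block of code with the context manager (Python 3.8+)
with cProfile.Profile() as profiler:
    result = sum(i * i for i in range(1_000_000))  # code to profile

profiler.dump_stats("output_filename.pstats")

# Load and inspect a pstats file, e.g. one produced by the command above
stats = pstats.Stats("output_filename.pstats")
stats.sort_stats("cumulative").print_stats(10)  # top 10 by cumulative time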

There are various libraries that convert the pstats file to a graph; we use a library called gprof2dot:

gprof2dot -f pstats myProfileFile | dot -Tpng -o image_output.png

The result looks something like this:

[Image: call graph generated by gprof2dot]

This might seem quite overwhelming, but big parts of the graph are initialisation or low-level functions. By focusing on the root node (one of the red ones), you can follow the graph to find the biggest shares of the time distribution. In this case, we see that running the AI model takes up 83% of the time, which is to be expected. Furthermore, the postprocessing of our model takes almost 14% of the time, while preprocessing only takes 1.3%! It’s clear we should take a closer look at the postprocessing for speed optimisations.

Probabilistic Profiling

Deterministic profiling is all well and good, but what if your code base is too big, and the profiling introduces too much overhead? Fear not, my friend! This is where probabilistic profiling comes in. Instead of keeping track of every single call, this type of profiling checks the call stack at regular intervals to see which function is currently executing. This way, it builds up an overview of the total time distribution. This kind of ‘checking at regular intervals’ is also known as Monte Carlo sampling.
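To make that idea concrete, here is a toy sketch of such a sampler (an illustration of the principle only, not how real sampling profilers are implemented; slow_function is a placeholder):

import collections
import sys
import threading
import time

def sampler(target_thread_id, counts, stop_event, interval=0.005):
    # Periodically look up the target thread's current frame and count
    # which function it is executing at that moment
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)

def slow_function():
    return sum(i * i for i in range(2_000_000))

counts = collections.Counter()
stop = threading.Event()
t = threading.Thread(target=sampler, args=(threading.main_thread().ident, counts, stop))
t.start()

for _ in range(5):
    slow_function()

stop.set()
t.join()

# The sample counts approximate the share of time spent in each function
total = sum(counts.values()) or 1
for name, n in counts.most_common(5):
    print(f"{name}: {n / total:.0%} of samples")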

The built-in profiling package cProfile does not offer probabilistic profiling at the time of writing, but there are numerous other packages that do. We used the package PyInstrument. It allows you to export the results to different formats, among which an interactive HTML page. In terms of user-friendliness, this is definitely a big improvement compared to the previous approach! If you use the HTML output format, the result will look something like this:

[Image: PyInstrument interactive HTML report]
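For reference, a minimal sketch of how such a report can be produced, assuming a recent version of PyInstrument is installed (pip install pyinstrument); the profiled code is a placeholder. From the command line:

pyinstrument -r html -o profile.html path/to/script arg1 arg2

Or from Python:

from pyinstrument import Profiler

profiler = Profiler()
profiler.start()
result = sum(i * i for i in range(1_000_000))  # code to profile
profiler.stop()

profiler.print()                      # call tree in the terminal
profiler.write_html("profile.html")  # interactive HTML page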

The main conclusions are the same as with the deterministic profiling, but the exact time estimates vary quite a bit. For example, the probabilistic profiler estimates the time spent in the model itself at 62%, compared to the 83% estimate of the deterministic profiler. But for the purpose of identifying slow parts of the code, the exact percentages don’t matter.

Line Profiling

In some cases, the above measures already give enough pointers to where the slow parts of the code are. But in other cases, you might have narrowed it down to a specific function that is quite long or complex. In that case, you can use a line profiler to measure how much time is spent on each line of the function.

To do this, we use the line_profiler package, available at https://github.com/pyutils/line_profiler. To use it, just add the decorator @profile above the function(s) you want to profile. For each line, the package will print out the number of hits (times the line is executed), the total time, the average time per hit, the time percentage and the contents of the line:

[Image: line_profiler output]
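For reference, a minimal sketch of how this works (my_function and slow_script.py are hypothetical names). The kernprof script that ships with line_profiler injects the profile decorator at runtime, so no import is needed:

@profile  # injected by kernprof, do not import
def my_function(lines):
    cleaned = [line.strip().lower() for line in lines]
    return [line for line in cleaned if line]

Run the script through kernprof to collect and print the line timings:

kernprof -l -v path/to/slow_script.py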

In this case, it’s quite clear that the compiling of the regexes took way too much time, and by combining multiple regexes into one more complex regex, we could save a considerable amount of time.
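As a rough illustration of that kind of fix (with hypothetical patterns, not the actual ones from our code):

import re

# Before: several separate patterns, each compiled and searched in turn
patterns = [re.compile(r"\bfoo\b"), re.compile(r"\bbar\b"), re.compile(r"\bbaz\b")]

def matches_any_slow(text):
    return any(p.search(text) for p in patterns)

# After: one combined pattern using alternation, compiled once
combined = re.compile(r"\b(?:foo|bar|baz)\b")

def matches_any_fast(text):
    return combined.search(text) is not None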

Conclusion

With the tools discussed in this article, you can identify the slow parts of your Python code in order to accelerate them. You can follow these steps:

  1. Analyse the whole module to identify bottleneck functions. You can use either a deterministic profiler (cProfile) or a probabilistic profiler (PyInstrument)
  2. Use line profilers when necessary to identify specific problematic lines
  3. Now you know which parts to speed up!

The advantage of deterministic profilers is that they are precise and count all function calls. But compared to probabilistic profilers, they have way more overhead and are often less user-friendly to work with.

At IxorThink, the machine learning practice of Ixor, we are constantly trying to improve our methods to create state-of-the-art solutions. As a software company, we can provide stable products from proof-of-concept to deployment. Feel free to contact us for more information.
