5 Python profiling tools for performance analysis

Saurav Paul
6 min read · Jul 25, 2023


Python, being an interpreted language, offers fast development and is often described as a language with "batteries included", but the one area where it falls short is performance, both in CPU resource consumption and in raw execution speed.

In fact, Python ranks second from the bottom in studies of programming-language energy consumption.

At my job I was once tasked with improving the efficiency of a Python application running as a daemon service on Linux. Whenever you are asked to improve the performance of an existing application, the first step is profiling. I will discuss some of the profiling tools that I have found really helpful during development and performance benchmarking. The following are five such tools you can use for Python code.

1) timeit module
The timeit module in Python measures the execution time of small bits of Python code. It has both a command-line interface and a callable one. In short, it tells you how long a piece of Python code takes to run, which is useful for comparing the performance of different algorithms and for optimizing code. It helps you decide which of several approaches to a small function is faster.
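
For a quick taste of the command-line interface, you can time a one-liner without writing any script (the statement here is purely illustrative):

# -s runs the setup once; the trailing statement is what gets timed
python3 -m timeit -s "numbers = list(range(1000))" "sorted(numbers)"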

The callable interface requires you to pass a setup string of Python code, which is run once, and a statement string, which is run N times to gauge the running time.

The example below times bubble sort over 1,000 runs and prints the total time taken (divide by the run count for the average).

import timeit

if __name__ == "__main__":
    # The setup code runs once. Note that bubble_sort sorts the list in
    # place, so runs after the first operate on already-sorted input.
    setup = """
import random
numbers = [random.randint(0, 10000) for _ in range(100)]

def bubble_sort(numbers):
    for i in range(len(numbers) - 1):
        for j in range(len(numbers) - i - 1):
            if numbers[j] > numbers[j + 1]:
                numbers[j], numbers[j + 1] = numbers[j + 1], numbers[j]
"""
    stmt = "bubble_sort(numbers)"
    print(timeit.timeit(stmt, setup, number=1000))

Running the above snippet gives output like the following:

python3 test.py
0.19682375000411412

You can run a similar test for quick sort (sketched below) and observe that the average time is much lower. The drawback of the timeit module is the tedious construction of the 'setup' and 'stmt' strings of Python code to exercise the method under test. For real production code, there may be too many imports and fixtures required to test the method in question.
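
As a minimal sketch, the built-in sorted() (Timsort, also O(n log n)) can stand in for a hand-rolled quick sort in the same harness; swap in the quicksort from the next section to compare like for like:

import timeit

setup = """
import random
numbers = [random.randint(0, 10000) for _ in range(100)]
"""
# Same harness as above; the built-in sort stands in for quick sort
print(timeit.timeit("sorted(numbers)", setup, number=1000))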

2) cProfile

The cProfile module in Python is a powerful deterministic profiler that can be used to identify bottlenecks in code and to guide optimization.

It measures how long different parts of a Python program take to execute. This is useful for finding which parts of a program take the most time and for optimizing those parts.

The cProfile module can be used in various ways, either within the code or from the command line on a whole program. cProfile.run() takes a string of Python code and profiles its execution.

For example, the following code will profile the quicksort() function:

import cProfile

def quicksort(numbers, low, high):
    if low < high:
        pivot = partition(numbers, low, high)
        quicksort(numbers, low, pivot - 1)
        quicksort(numbers, pivot + 1, high)

def partition(numbers, low, high):
    pivot = numbers[high]
    i = low - 1
    for j in range(low, high):
        if numbers[j] <= pivot:
            i = i + 1
            numbers[i], numbers[j] = numbers[j], numbers[i]
    numbers[i + 1], numbers[high] = numbers[high], numbers[i + 1]
    return i + 1

if __name__ == "__main__":
    cProfile.run('quicksort([10, 80, 30, 90, 40], 0, 4)')

This code profiles the quicksort() call and prints a report of the results: each function appears with its call count (ncalls), the time spent in the function itself (tottime) and the cumulative time including callees (cumtime).

The cProfile module can also capture the profiling data as a stats object that you can inspect or dump. Another way to use it is via the Profile object (a context manager since Python 3.8), so the __main__ block of the example above can be modified as below:

if __name__ == "__main__":
    with cProfile.Profile() as pr:
        # ... code you want to profile ...
        quicksort([10, 80, 30, 90, 40], 0, 4)
    pr.print_stats()

To run cProfile from the command line on your Python program (if it is a single file), use:

python -m cProfile -o program.prof my_program.py

The issue with cProfile is that it is too chatty for production-grade code: because it traces every single call, the report fills up with standard-library and interpreter internals that you have no control over. And to make sense of such a huge stats file, you will need the pstats module or a visualization tool such as snakeviz.
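
Here is a minimal sketch of taming the dump with the standard-library pstats module, assuming program.prof was produced by the command above:

import pstats

# Load the dump written by `python -m cProfile -o program.prof my_program.py`
stats = pstats.Stats("program.prof")
# Drop long paths, sort by cumulative time, show only the top 10 entries
stats.strip_dirs().sort_stats("cumulative").print_stats(10)

Alternatively, pip install snakeviz and run snakeviz program.prof to explore the same file interactively in the browser.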

3) PyInstrument

PyInstrument is an external package that offers statistical profiling. Instead of tracing every call, it takes samples of the running program's call stack at regular intervals and estimates the time spent in each function from those samples. PyInstrument is easy to set up and use, and its sampling approach keeps overhead low, at the cost of some estimation error.

It can be used directly from the command line by invoking your script with pyinstrument, and it prints a colored summary showing where most of the time was spent.
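
For example (my_program.py being whatever script you want to profile):

# Print a colored call-tree summary to the terminal
pyinstrument my_program.py
# Or render an interactive HTML report instead
pyinstrument -r html -o profile.html my_program.py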

The choice between cProfile and PyInstrument depends on your specific needs. If you need exact call counts and per-call timings, cProfile (despite its tracing overhead) is a good choice. If you want an easy-to-use, low-overhead tool for quick performance insights and can tolerate some estimation error, PyInstrument is a handy option. Its callable API is similar to cProfile's:

from pyinstrument import Profiler

profiler = Profiler()
profiler.start()
# ... code you want to profile ...
profiler.stop()
profiler.print()  # print the sampled call tree to the terminal

4) py-spy

py-spy is a sampling profiler that lets you visualize what your Python program is spending time on without restarting the program or modifying the code in any way. It has extremely low overhead: it is written in Rust for speed and does not run in the same process as the profiled Python program. This means py-spy is safe to use against production Python code.

py-spy works from the command line and takes either the PID of a running program or the command line of the Python program you want to run. It has three subcommands: record, top and dump (the latter two are shown below). py-spy samples all the threads of a multi-threaded Python process out of the box, and its -s (--subprocesses) option makes it follow child processes as well.
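
For example, against a hypothetical process with PID 12345:

# Live, top-like view of the functions currently consuming time
py-spy top --pid 12345
# One-off dump of the current call stack of every thread
py-spy dump --pid 12345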

The record subcommand outputs an SVG file containing a flame graph of where most of the time was spent while sampling.

py-spy record -o profile.svg --pid 12345
# OR
py-spy record -o profile.svg -- python myprogram.py

The above command records samples until you interrupt it (or until the program exits) and writes the resulting flame graph to profile.svg.

5) Pyflame

Pyflame is the only Python profiler based on the Linux ptrace system call, and it generates flame graphs. The ptrace implementation lets it take snapshots of the Python call stack without explicit instrumentation, meaning you can profile a program without modifying its source code. It also fully supports profiling multi-threaded Python programs.

Pyflame is suitable for profiling a live process in production: it is written in C++ and has less overhead than the built-in cProfile module, and it provides richer profiling data. Note that it is Linux-only, and the project is no longer actively maintained.

It can be used together with the flamegraph.pl script to generate an SVG flame chart, just like py-spy:

# Generate flame graph for pid 12345; assumes flamegraph.pl is in your $PATH.
pyflame -p 12345 | flamegraph.pl > myprofile.svg

Some notable mentions are memory-profiler, for monitoring memory usage line by line (a quick sketch follows below), and the line-profiler plugin for profiling from within the PyCharm IDE. Last but not least, the Linux top command is always there.
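
Here is a minimal sketch of memory-profiler; the function name and allocation size are made up for illustration:

from memory_profiler import profile

@profile  # prints a line-by-line memory report when the function returns
def grow_list():
    big = [0] * 10_000_000  # hypothetical allocation to make the report interesting
    del big

if __name__ == "__main__":
    grow_list()

Running the script normally (python script.py) is enough; the decorator prints the report itself.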

These are the tools I wanted to share; I hope they prove helpful in your performance-improvement journey.
