Profiling: Key to Python Code Optimization

mayank khulbe
The Good Food Economy
Jan 16, 2023


Today, Python has become a key language for high-level as well as general-purpose programming. From building high-end applications to solving data science business problems, the use of Python as a programming language has expanded over the past couple of years. This involves writing simple or complicated Python scripts that may take very different amounts of time to execute.

The problem comes when a Python script takes more than the expected time to execute. What do we do then? Do we just sit and make a fuss about the problem?

The answer is a big NOOOOO!!!!

Instead, a few questions should flow into your mind when your Python script takes a long time to execute.

  1. Where is the bottleneck in the script that makes it take such a long time to execute?
  2. Is there any particular line of code that is slow to execute?
  3. Is there any line of code with unusually high memory utilization?

Now, once you know the problem you need to solve, there is one thing to always remember….

Python has got you covered!!

Python has rich support for PROFILING. In short, profiling helps you understand how much time or memory each line of code consumes. Once we have this information, we can use various visualizations to analyze the results.

In this blog, we will explore the following:

  1. Various packages/modules in Python to profile your code, including cProfile (and its cProfileV wrapper), line-profiler, and memory-profiler.
  2. Some interesting visualization tools that we can use with the above packages to better understand the profiling results.

cProfile

Let’s start by understanding a few lines of code that we will profile using cProfile.

import random

class initialise:

    def __init__(self, num1, num2):
        """
        This function initialises the parameters required
        """
        self.num1 = num1
        self.num2 = num2

    def generate(self):
        """
        This function generates a random integer between
        the given 2 numbers.

        Input: Integer
        Output: Integer
        """
        a = random.randint(self.num1, self.num2)
        return a

In the above code (saved as constants_v2.py), I have created a class initialise to initialise 2 numbers. The generate function within the class returns a random integer between those 2 numbers.

I have created another file, main.py, with the below lines of code:

# let's first import some libraries
import numpy as np
from constants_v2 import initialise

# initialise the numbers
numbers = initialise(10000000, 10000000)
numbers1 = initialise(1, 10)

# call the generate function of the initialise class
# to get the random integer between the above 2 numbers
num = numbers.generate()

print("The random number generated is {}".format(num))

# now let's create a function to do some time-consuming calculation
def calculate(num):
    cal_list = []
    for i in range(1, num):
        result = (i * (i ** 1))
        cal_list.append(result)
    return cal_list

cal_list1 = calculate(num=num)

# create a list of the same length to multiply with the actual list
mul_list = []
for _ in range(0, len(cal_list1)):
    mul_list.append(numbers1.generate())

# convert into arrays and multiply
arr1 = np.array(cal_list1)
arr2 = np.array(mul_list)

final_list = arr1 * arr2

Once the code is ready, let’s create a simple profile of it using cProfile. Note that cProfile itself ships with Python’s standard library, so nothing needs to be installed for the commands below; cProfileV is an optional third-party wrapper that you can install with the below pip command.

pip install CProfileV

Now, run the below command in the terminal.

python -m cProfile -s tottime main.py

The above command tells Python to run the cProfile module on the given file and sort the output (-s) by tottime. It gives the below output (pasting a snippet of the initial part of the complete output).

Profiling output of the initial Python script

In short, the above output shows that the complete time taken to execute the script is 18.386 seconds, within which the calculate function takes the most time to execute, and so on.

Here is the description of each parameter of the output.

  1. ncalls : The number of calls made to the function.
  2. tottime : The total time spent in the given function itself. Note that the time spent in calls to sub-functions is excluded.
  3. percall : tottime divided by ncalls.
  4. cumtime : Unlike tottime, this includes the time spent in the function and in all the sub-functions it calls. It is accurate even for recursive functions.
  5. The percall following cumtime is cumtime divided by the number of primitive calls, i.e. calls that were not induced via recursion.
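
If you prefer to stay inside Python rather than use the command line, the same statistics are available programmatically through the standard cProfile and pstats modules. Here is a minimal sketch; the profiled loop is just an illustrative stand-in for your own code:

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()

# any code you want to measure goes here; this loop is just a stand-in
squares = [i * i for i in range(1_000_000)]

profiler.disable()

# sort by tottime and print the 5 most expensive entries
stats = pstats.Stats(profiler)
stats.sort_stats("tottime").print_stats(5)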

Now, since we know the bottleneck of our Python script, let’s try to optimize the main script by using a list comprehension rather than a ‘for’ loop within a function. Let’s name the script prof_func_v4.py. Its code is displayed below:

# let's first import some libraries
import numpy as np
from constants_v2 import initialise

# initialise the numbers
numbers = initialise(10000000, 10000000)
numbers1 = initialise(1, 10)

# call the generate function of the initialise class
# to get the random integer between the above 2 numbers
num = numbers.generate()

print("The random number generated is {}".format(num))

# let's create the same list using a list comprehension
cal_list = [(i * (i ** 1)) for i in range(1, num)]

# create a list of the same length to multiply with the actual list
mul_list = [numbers1.generate() for _ in range(0, len(cal_list))]

# convert into arrays and multiply
arr1 = np.array(cal_list)
arr2 = np.array(mul_list)

final_list = arr1 * arr2

Now, let’s run our cProfile command on the optimized script to get the output.

python -m cProfile -s tottime prof_func_v4.py

Let’s inspect the output of this command.

Profiling output of the optimized Python code

Here, we can see that once we replace the for loop with a list comprehension, our script takes 15.834 seconds, which is ~2 seconds less than the for-loop version.
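
If you want to sanity-check a micro-optimization like this without a full profiling run, the standard timeit module is a quick option. A small sketch, with an arbitrary range of 1,000,000 standing in for our generated num (your absolute timings will differ):

import timeit

loop_version = """
result = []
for i in range(1, 1_000_000):
    result.append(i * (i ** 1))
"""
comp_version = "result = [i * (i ** 1) for i in range(1, 1_000_000)]"

# run each snippet 10 times and print the total wall-clock time
print("for loop:          ", timeit.timeit(loop_version, number=10))
print("list comprehension:", timeit.timeit(comp_version, number=10))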

Note: The output file can be saved with multiple extensions, like .out, .pstats, .profile, etc., to suit different visualization tools. For now, I am using the .pstats extension. You can save the output to a binary file using the below command.

python -m cProfile -o list_comp.pstats -s tottime prof_func_v4.py
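
As a side note, once the statistics are in a .pstats file, you can also reload them in plain Python with the standard pstats module; a minimal sketch:

import pstats

# load the binary stats file produced with -o list_comp.pstats
stats = pstats.Stats("list_comp.pstats")

# strip long paths, sort by cumulative time, show the top 10 entries
stats.strip_dirs().sort_stats("cumtime").print_stats(10)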

Visualizing the profiling results

Once we have the profiling statistics stored in our “.pstats” file, Python has some cool visualizations available to help analyze the results.

Let’s jump straight to the visualizations.

SnakeViz

SnakeViz is a browser-based graphical viewer for the output of Python’s cProfile module. It provides two kinds of visualizations.

1. Icicle visualization

Icicle visualization using SnakeViz for the initial script
Icicle visualization using SnakeViz for the optimized script

2. Sunburst visualization

Sunburst visualization using SnakeViz for the initial script
Sunburst visualization using SnakeViz for the optimized script
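
To reproduce these views yourself, install SnakeViz with pip and point it at the .pstats file we saved earlier; it will open the interactive view in your browser:

pip install snakeviz
snakeviz list_comp.pstats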

One interesting thing about SnakeViz is its “Call Stack” option, which gives you the sequence of all the calls made as you drill down into the graph. Below is a small gif to give an idea of the complete interface of the visualization.

SnakeViz visualization

Gprof2Dot

Gprof2Dot is another Python package that provides a hierarchical visualization for easy analysis of the profiling results. Assuming you have gprof2dot, graphviz, and pydot installed, we will jump straight to the visualization.

First, we will have to create the same “.pstats” output file as earlier. For now, we can reuse the file saved for the above visualization. Use the below command to create the visualization and save the image as a .png in the current directory.

gprof2dot -f pstats list_comp.pstats | dot -Tpng -o list_comp.png

After this, we can simply view the image by running the below command on the saved PNG file (open works on macOS; on Linux, use xdg-open instead).

open list_comp.png 

This command finally gives you the below visualization.

Profiling visualization for the optimized script (prof_func_v4.py)
Profiling visualization for the initial script

Line Profiler

We have seen how to get a complete profile using cProfile; we will now use line-profiler to evaluate the Python code line by line. Its implementation differs only slightly from cProfile, and it relies on 2 main utilities:

  1. The “@profile” decorator, which marks the functions to be profiled.
  2. kernprof, the script responsible for executing the code and recording the evaluation metrics and statistics.

Kernprof help

Now, let’s evaluate the same script using the line profiler.

Note: We will not focus on optimization this time. The rest of the blog will focus more on the way of implementation and visualization.

First, as mentioned, use the “@profile” decorator on each function to be profiled. You can name the main script l_profiler.py and use it as below.

# l_constants.py
import random

class initialise:

    def __init__(self, num1, num2):
        """
        This function initialises the parameters required
        """
        self.num1 = num1
        self.num2 = num2

    @profile  # profile decorator (injected by kernprof at runtime)
    def generate(self):
        """
        This function generates a random integer between
        the given 2 numbers.

        Input: Integer
        Output: Integer
        """
        a = random.randint(self.num1, self.num2)
        return a

# l_profiler.py
# let's first import some libraries
import numpy as np
from l_constants import initialise

# initialise the numbers
numbers = initialise(10000000, 10000000)
numbers1 = initialise(1, 10)

# call the generate function of the initialise class
# to get the random integer between the above 2 numbers
num = numbers.generate()

print("The random number generated is {}".format(num))

# now let's create a function to do some time-consuming calculation
@profile  # profile decorator
def calculate(num):
    """
    This function does a calculation on a range of numbers and
    appends the results to a list
    """
    cal_list = []
    for i in range(1, num):
        result = (i * (i ** 1))
        cal_list.append(result)
    return cal_list

# create a list of the same length to multiply with the actual list
@profile  # profile decorator
def multiply_list():
    """
    This function creates a list to multiply with another list by
    generating random numbers
    """
    # use the function above to create the list
    cal_list1 = calculate(num=num)

    # empty list
    mul_list = []
    for _ in range(0, len(cal_list1)):
        mul_list.append(numbers1.generate())

    # convert into arrays and multiply
    arr1 = np.array(cal_list1)
    arr2 = np.array(mul_list)

    final_list = arr1 * arr2

if __name__ == "__main__":
    multiply_list()

As you can see, I have added the “@profile” decorator to the functions. Once that is done, run the below command in the terminal to get the results. The output of one of the functions is also shown below.

kernprof -l -v l_profiler.py
Line-by-line profiling of the multiply_list function

Here is the description of each output parameter.

  1. Hits: How many times that line was executed inside the function.
  2. Time: The total time spent on that line across all hits.
  3. Per Hit: The average time taken per hit of that line.
  4. % Time: The percentage of the total function time spent on that line.
  5. Line Contents: The source code of the line.
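
As a side note, if you would rather not edit your source with decorators, line-profiler also exposes a programmatic LineProfiler API. A minimal sketch, where calculate is a simplified stand-in for our earlier function:

from line_profiler import LineProfiler

def calculate(num):
    cal_list = []
    for i in range(1, num):
        cal_list.append(i * (i ** 1))
    return cal_list

lp = LineProfiler()
wrapped = lp(calculate)   # wrap the function instead of decorating it
wrapped(1_000_000)
lp.print_stats()          # same line-by-line table as kernprof -v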

But, this output looks difficult to analyze, right?

Do you remember what I asked you to remember at the beginning of this blog? Yes, Python has got you covered for literally everything.

Python has a package called lineprofilergui. It gives you a GUI version of your profiling and makes the output easy to analyze. Run the below command in the terminal.

lineprofilergui l_profiler.py

This gives you a cool visualization that shows all the profiled functions in one place, with color coding based on the time taken by each line to execute. Have a look!

Line profiler gui

Memory-Profiler

Memory-profiler works much like cProfile and line-profiler. The only difference is that rather than profiling the time taken for the code to execute, it profiles memory utilization.

Like line-profiler, it uses 2 main utilities:

  1. The “@profile” decorator, which indicates the functions to be profiled.
  2. “mprof”, which is responsible for executing the code and recording the evaluation metrics and statistics.

First, let’s understand the parameters displayed by memory-profiler.

  1. Line #: The number of the line of code in the corresponding script.
  2. Memory usage: The memory used by the process after executing that line of code.
  3. Increment: The change in memory usage relative to the previous line, i.e. the memory usage after the current line minus the memory usage after the previous one.
  4. Occurrences: The number of times that particular line was executed.
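
To make these columns concrete, here is a minimal, self-contained sketch; build_list is a hypothetical function, and the explicit import makes the script runnable on its own:

from memory_profiler import profile

@profile
def build_list():
    # allocate a large list so the line-by-line memory growth is visible
    data = [i * i for i in range(1_000_000)]
    # free roughly half of it to show a negative increment
    del data[:500_000]
    return data

if __name__ == "__main__":
    build_list()

Running python -m memory_profiler on this file prints the table described above.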

Now, without wasting much time, let’s see how we can profile the memory utilization of the code and visualize it to better understand the output.

You can rename the file to mprof_func.py, reuse the same code used for the line-profiler, and run the below command to get the output displayed in the terminal itself.

python -m memory_profiler mprof_func.py

To create a visualization of the profiling, run the below command in the terminal.

mprof run mprof_func.py

This command creates a .dat file in the directory, which stores the memory utilization sampled at regular intervals while the script runs. Now, witness a cool visualization as we run the below command in the terminal.

mprof plot -t 'Memory profiler in python'

This command uses the latest .dat file in the directory to create a plot of memory utilization against time (in seconds). The plot looks like this.

Memory profiler visualization

As you can see in the above image, the legend of the plot shows the 2 functions that were profiled for memory utilization and the time taken by each function to execute. The colored brackets in the image show the execution interval of each function.

The plot looks even more appealing when you add the --flame flag to the above command.

mprof plot -t 'Memory profiler in python' --flame
Memory profiler in python

pheewwwwww!!!!

This brings us to the end of this blog, which was indeed a long but explorative journey. One thing is now clear: Python provides us with plenty of capability when it comes to profiling Python scripts.

Let me know your valuable suggestions in the comment sections!!

Till then, Happy learning ;)
