Part 2 / Python: Fundamentals and Optimisation

Python is one of the most popular programming languages. It has a gentle learning curve and comes with a great community that has built many open-source libraries over the years.

Specifically, data scientists are increasingly adopting Python over R thanks to state-of-the-art libraries like SciPy, statsmodels, sklearn, etc. As such, Python has become a cornerstone of modern machine learning applications. Many believe that Python is one of the main factors behind the rapid research and recent development of artificial intelligence, thanks to libraries like TensorFlow or PyTorch.

Python is great for many things, but unfortunately it has quite a slow execution speed. As an interpreted language, it does not benefit from the optimisations a compiler can bring.

Thankfully, Python can also run compiled code from other programming languages. Most machine learning libraries therefore implement their algorithms in C or C++ and expose Python bindings, getting the best of both worlds: the efficiency and performance of a compiled language with an easy-to-use Python API.

In this article, we will explore good practices for writing Python code with a focus on performance, going from the fundamentals to more advanced optimisation topics.

Code Quality

As explained in the previous article, writing clear code is fundamental when it comes to maintaining software and collaborating smoothly. Python syntax forces the code to be indented, making it easy to read out of the box. This is a great feature, but it is not enough to make sure code is easy to read and understand.

Fortunately, a lot of tools exist to help improve Python code quality. As described previously, it is important to unify a codebase with a common coding style. The Black library has been created for this very purpose. It is an uncompromising Python code formatter which can automatically format your Python code, ensuring that a codebase follows a specific and unified format.

# Example of code before Black
# -------------------------------------------------------------------------

def my_func(
    arg_1: int, arg_2 : int) -> int:
    result = arg_1 + arg_2
    return result
# Example of code after Black
# -------------------------------------------------------------------------

def my_func(arg_1: int, arg_2: int) -> int:
    result = arg_1 + arg_2
    return result

To go further, a more complete and faster tool has been on the rise recently: Ruff. It provides formatting features, just like Black, but it also includes linting features, improving code quality further. If you are not already using a formatting tool, Ruff is highly recommended.
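For illustration, here is a minimal Ruff configuration that could live in a project's pyproject.toml; the rule sets selected below are only a suggestion, not a prescription:

```toml
[tool.ruff]
# Match Black's default line length
line-length = 88

[tool.ruff.lint]
# Enable pycodestyle ("E"), Pyflakes ("F"), and import-sorting ("I") rules
select = ["E", "F", "I"]
```

With this in place, `ruff format` formats the code and `ruff check` runs the linter.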

Finally, an often neglected aspect of Python code quality is type hinting. Python does not enforce static type checks; this makes code simple to write, but it also increases the risk of bugs and unwanted behaviours. To address that, the mypy library can be used. It is a third-party static type checker for Python and, configured correctly, it can enforce typing across an entire codebase. When writing production-ready code, using mypy (or any other static type checker) with a strict configuration is imperative.

# Example of typed Python class
# -------------------------------------------------------------------------

class MyClass:
    def __init__(self, arg_1: int, arg_2: str, arg_3: list[int]) -> None:
        self._arg_1 = arg_1
        self._arg_2 = arg_2
        self._arg_3 = arg_3

    def get_arg_3(self) -> list[int]:
        return self._arg_3

Automated Testing with Pytest

Writing automated tests in Python is fairly easy. Using the Pytest library, developers can write unit tests with features like mocking, parametrized tests, etc. As advised before, writing tests is important to make sure code is bug-free and production-ready.

When writing unit tests with Pytest, it is advised to keep following the previously described good practices like modularity and reusability. You can use fixtures to make your test code reusable, and with the pytest-cov library you can measure the code coverage of the written tests.
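As a minimal sketch of both features (the add function, the sample_numbers fixture, and the test names below are hypothetical examples, not part of any real codebase):

```python
import pytest


# Hypothetical function under test
def add(a: int, b: int) -> int:
    return a + b


# A fixture makes shared test data reusable across tests
@pytest.fixture
def sample_numbers() -> tuple[int, int]:
    return 2, 3


def test_add_with_fixture(sample_numbers: tuple[int, int]) -> None:
    a, b = sample_numbers
    assert add(a, b) == 5


# Parametrization runs the same test against several inputs
@pytest.mark.parametrize(
    ("a", "b", "expected"),
    [(1, 1, 2), (0, 5, 5), (-2, 2, 0)],
)
def test_add_parametrized(a: int, b: int, expected: int) -> None:
    assert add(a, b) == expected
```

Running `pytest` discovers and executes both tests; `pytest --cov` (with pytest-cov installed) reports coverage on top.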

Properly testing deep learning models can sometimes be tricky, especially when functions have random components. One thing that is helpful in that case is to set a fixed random seed in your tests, ensuring that the outputs will always be the same. You can also initialise your model weights with fixed values when testing deep learning layers, making the outputs more predictable.
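One way to apply the fixed-seed idea with the standard library's random module (add_noise is a hypothetical stand-in for a function with a random component):

```python
import random


# Toy stand-in for a function with a random component
def add_noise(value: float, rng: random.Random) -> float:
    return value + rng.gauss(0.0, 1.0)


def test_add_noise_is_reproducible() -> None:
    # Fixing the seed makes the "random" output deterministic,
    # so the test can compare runs against each other
    out_1 = add_noise(1.0, random.Random(42))
    out_2 = add_noise(1.0, random.Random(42))
    assert out_1 == out_2
```

The same pattern applies to deep learning frameworks, which expose their own seeding functions.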

Code Structure

Perhaps the most important part of making a codebase clear and maintainable is its organisation and structure. There are many ways to structure a Python codebase, but the following practices should be used to make sure it is well organised.

To begin with, separate your functions and classes into different files. The idea is to group functions and classes together when it makes sense to, but also to avoid having too much code per file. This is important to make sure the codebase is easy to navigate.

On bigger projects, it is advised to group files into different modules; modules can also have submodules when it makes sense. The point is to improve modularity, reusability, and clarity. Modules can import each other, but note that circular dependencies should be avoided. When confronted with such a problem (a circular dependency), it is often a sign that the code is not well organised and that some refactoring is needed.

Finally, when it comes to the tests, two main ways are used to structure them: some codebases use a tests folder at the root of the project, and others create a tests folder inside each module of the project. It is recommended to create a single tests folder at the root. Indeed, when packaging your code you will most likely want to exclude the tests, which is harder to do when test folders are scattered all over your project.
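As a sketch, with a setuptools-based build the root-level tests folder can be left out of the package in pyproject.toml (assuming the package is named project_name, as in the layout shown in this article):

```toml
[tool.setuptools.packages.find]
# Ship the package itself, but leave the root-level tests folder out
include = ["project_name*"]
exclude = ["tests*"]
```

With a single root-level tests folder, this one exclude pattern is enough.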

Here is an example showing how you can structure your project, including tests:

project_folder
├── tests
│   ├── module_1
│   │   └── test_something.py
│   └── module_2
│       ├── submodule_1
│       │   └── test_something.py
│       └── submodule_2
│           └── test_something.py
└── project_name
    ├── __init__.py
    ├── file.py
    ├── module_1
    │   ├── __init__.py
    │   └── file.py
    └── module_2
        ├── __init__.py
        ├── file.py
        ├── submodule_1
        │   ├── __init__.py
        │   └── file.py
        └── submodule_2
            ├── __init__.py
            └── file.py

Memory Management

Python largely abstracts memory management away from the programmer. There is no manual allocation or deallocation to write: everything is automatic and simple to use. This greatly reduces the complexity of writing Python code, and it also eliminates a lot of potential bugs related to memory usage. However, it remains important to understand how it works to make sure your code is properly optimised.

First of all, since Python is an interpreted language, there is no distinction between static and dynamic memory allocation: everything is allocated dynamically at runtime. Every value in Python is an object, and variables hold references to objects; passing an argument to a function passes a reference, never a copy. The practical difference lies in mutability: a mutable object (such as a list or a class instance) can be modified in place through any reference, whereas immutable types (such as integers, floats, and strings) cannot be changed, so operations on them create new objects and rebinding a parameter inside a function does not affect the caller.

# Example showing how an object behaves
# -------------------------------------------------------------------------

# Define class
class TestClass:
    def __init__(self, number: int) -> None:
        self.number = number

# Create one instance of the class
test_object = TestClass(number=1)
print(test_object.number)

# Output: 1
# -------------------------------------------------------------------------

# Function using the instance's reference
def function_1(test: TestClass) -> None:
    test.number += 1

function_1(test=test_object)
print(test_object.number)

# Output: 2

# We can see here that the instance is modified in-place
# -------------------------------------------------------------------------

# Function overriding the instance reference
def function_2(test: TestClass) -> None:
    test = TestClass(number=3)

function_2(test=test_object)
print(test_object.number)

# Output: 2

# Here, overriding the reference does not modify the original instance
# -------------------------------------------------------------------------
# Example showing how a simple value behaves
# -------------------------------------------------------------------------

# Define value
test_value = 1
print(test_value)

# Output: 1
# -------------------------------------------------------------------------

# Function rebinding its immutable integer argument
def function(value: int) -> None:
    value += 1  # Creates a new int and rebinds the local name only

function(value=test_value)
print(test_value)

# Output: 1
# -------------------------------------------------------------------------

In traditional programming languages where memory is managed by the programmer, everything with a dynamic size must be allocated and deallocated manually. For instance, when creating an array whose size is unknown at compile time, the program has to dynamically allocate enough memory to hold it. Then, when the array is no longer used, the program has to deallocate that memory to free space and avoid memory leaks.

In Python, allocating and freeing memory is done automatically. Automatically allocating memory is easy, but freeing memory is a harder task. To do so Python implements what is called a garbage collector.

The garbage collector keeps a reference count for each allocated object. When this count drops to zero, the memory can be freed. However, objects can reference each other (or themselves), forming reference cycles whose counts never reach zero. To handle these, Python complements reference counting with a generational garbage collector that periodically detects and frees unreachable cycles.
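Both mechanisms can be observed with the gc module from the standard library (the Node class below is a toy example):

```python
import gc


class Node:
    def __init__(self) -> None:
        # Optional reference to another Node
        self.other: "Node | None" = None


# Build a reference cycle: each node references the other
node_a = Node()
node_b = Node()
node_a.other = node_b
node_b.other = node_a

# Drop our references: the reference counts never reach zero
# because each node is still referenced by the other
del node_a, node_b

# The generational collector detects and frees the unreachable cycle;
# collect() returns the number of unreachable objects it found
collected = gc.collect()
```

Reference counting alone would leak these two objects; the cyclic collector is what reclaims them.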

Knowing this, you can make sure your Python code is optimised. Mainly, always make sure that a variable is not referenced when you do not need it anymore. Good practices can help in that matter: if you make sure to break down your code into small functions it will force you to carefully think about the inputs and outputs, avoiding keeping references to unused variables.

# Example showing how to free memory in Python
# -------------------------------------------------------------------------

big_array = [0] * 10000

# On a 64-bit build, the list stores 10000 references of 8 bytes each
# (roughly 80 KB), all pointing at the same integer object
# To free this memory, drop the reference to the array, for example:
big_array = None  # `del big_array` works as well

A Note on Performance

Python is slow; this is no secret and is actually one of its main flaws. For writing very efficient and fast code, Python is probably not the best choice. However, when it comes to machine learning, it is difficult and time-consuming to re-implement all the useful libraries that exist in the Python ecosystem. Fortunately, these libraries are highly optimised and leverage faster programming languages (the subject of the next chapter), working around Python's performance issues.

Being aware of what is slow in Python is essential to use these optimised libraries efficiently. When writing Python code you should avoid the following:

  • For-loops: Python for-loops are very inefficient; avoid them whenever possible. In particular, when building a list, prefer a list comprehension over appending inside a for-loop.
  • Unnecessary copies: avoid copying your data structures and objects when it is not absolutely necessary.
  • Global variables: besides being bad practice in any programming language, global variables are also slower to access than local variables in Python and, as such, should not be used.
  • Concatenating strings with +: using the + operator on strings is inefficient; prefer the join() method instead. The + operator creates a new string for each concatenation, while join() builds the result in a single pass.

# Comparison of time taken to create a list
# -------------------------------------------------------------------------

# List creation using `append` in a loop
%%timeit -n 100

big_array = []
for array_index in range(100000):
big_array.append(array_index)

# Output: 2.96 ms ± 73.2 µs per loop
# (mean ± std. dev. of 7 runs, 100 loops each)
# -------------------------------------------------------------------------

# List creation using list comprehension
%%timeit -n 100

big_array = [array_index for array_index in range(100000)]

# Output: 1.51 ms ± 153 µs per loop
# (mean ± std. dev. of 7 runs, 100 loops each)

# List comprehension is much faster than the for-loop
# -------------------------------------------------------------------------
# Comparison of time taken to create a string
# -------------------------------------------------------------------------

# String creation using the `+` operator
%%timeit

elements_to_add = ["H", "e", "l", "l", "o", " ", "W", "o", "r", "l", "d"]

my_string = ""
for element in elements_to_add:
    my_string += element

# Output: 317 ns ± 4.08 ns per loop
# (mean ± std. dev. of 7 runs, 1,000,000 loops each)
# -------------------------------------------------------------------------

# String creation using the `join` function
%%timeit

elements_to_add = ["H", "e", "l", "l", "o", " ", "W", "o", "r", "l", "d"]

my_string = "".join(elements_to_add)

# Output: 107 ns ± 3.76 ns per loop
# (mean ± std. dev. of 7 runs, 10,000,000 loops each)

# Using the `join` function is 3 times faster than using the `+` operator
# -------------------------------------------------------------------------

Leverage Faster Programming Languages

As mentioned previously a lot of very powerful open source libraries are available for any Python programmer. Most of these optimised libraries use compiled C or C++ code while providing a Python API to interact with it. It is thus a good practice to use these libraries whenever possible.

For instance, when working on machine learning projects it is very common to write data processing code, whether translating from one format to another, cleaning, or simply inspecting data. These operations should be performed with optimised libraries instead of vanilla Python for-loops (or any slow Python code). For instance, numpy is a widely used and optimised library that can speed up many operations.

# Comparison of time taken to operate on an array
# -------------------------------------------------------------------------

import numpy as np

# -------------------------------------------------------------------------

# Multiply by two each element of an array using a for loop
%%timeit -n 100

big_array = np.arange(10000)

for array_index in range(len(big_array)):
    big_array[array_index] *= 2

# Output: 985 µs ± 138 µs per loop
# (mean ± std. dev. of 7 runs, 100 loops each)
# -------------------------------------------------------------------------

# Multiply by two each element of an array using numpy
%%timeit -n 100

big_array = np.arange(10000)
big_array *= 2

# Output: 7.9 µs ± 1.54 µs per loop
# (mean ± std. dev. of 7 runs, 100 loops each)

# numpy is more than 100 times faster than a Python for-loop
# -------------------------------------------------------------------------

Data Structures in Python

When exploring the Software Engineering essential concepts we mentioned that knowing the data structures is fundamental. Luckily, Python also implements the most common data structures directly out of the box. Here is a list of the most useful ones:

  • Lists: implement contiguous variable-length arrays, making the indexing cost independent of the size.
  • Deques: implement doubly-linked lists, allowing constant-time appending and popping at both the head and the tail. Can be used as a stack or a queue.
  • Sets: implement resizable hash sets, storing unique values with constant-time lookup on average.
  • Dictionaries: like sets, use resizable hash maps, storing key/value pairs with constant-time lookup on average.
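For instance, a deque can serve as both a queue and a stack with constant-time operations at either end (the variable names below are illustrative):

```python
from collections import deque

# Deque as a FIFO queue: append at the tail, pop at the head, both O(1)
queue: deque[str] = deque()
queue.append("task_1")
queue.append("task_2")
first_task = queue.popleft()  # "task_1"

# Deque as a LIFO stack: push and pop at the same end
stack: deque[str] = deque()
stack.append("a")
stack.append("b")
top = stack.pop()  # "b"
```

A plain list can also act as a stack, but popping from its front is a linear-time operation, which is exactly where a deque shines.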

Always carefully pick the data structure best suited to the problem you are solving: it can have a big impact on both speed and memory usage. Python provides many abstractions to operate on these data structures easily, but you should not use them blindly, as some operations can be very slow.

For instance, the in keyword can be used to search in a list or a set. While looking up an element in a set takes constant time on average, using this keyword on a list scans it linearly and gets slower as the list grows.

# Comparison of time taken to search in a list and in a set
# -------------------------------------------------------------------------

# Create a list and a set

character_list = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
character_set = {"a", "b", "c", "d", "e", "f", "g", "h", "i"}

# -------------------------------------------------------------------------

# Search in the list
%%timeit

is_letter_in_list = "e" in character_list

# Output: 44.7 ns ± 0.266 ns per loop
# (mean ± std. dev. of 7 runs, 10,000,000 loops each)
# -------------------------------------------------------------------------

# Search in the set
%%timeit

is_letter_in_set = "e" in character_set

# Output: 15.6 ns ± 0.103 ns per loop
# (mean ± std. dev. of 7 runs, 100,000,000 loops each)

# Searching in a set is significantly faster than searching in a list
# -------------------------------------------------------------------------
