Python Typing: Enhancing Code Clarity and Reliability

Published in Data And Beyond · 8 min read · Dec 23, 2023

Python’s dynamism has long been both its strength and a potential pitfall, especially in larger codebases. With the introduction of type hints and the typing module in Python 3.5 (PEP 484), developers gained a powerful tool to enhance code clarity and reliability without sacrificing Python's flexibility.

What are Type Annotations?

Type annotations in Python allow developers to hint at the expected types of variables, function arguments, and return values within their code. This practice enhances code readability and makes it easier to understand the intended use of different components.

This is especially useful when you break your code into packages or modules with chains of function calls and many arguments and variables.

# Annotation to a variable with a value.
length: int = 4

# Annotation to a variable without assigning a value is also acceptable.
length1: float

# Function without Type Annotations
def add_numbers(a, b):
    return a + b

result = add_numbers(5, 10)
print(result)

# Function with Type Annotations
def add_numbers(a: int, b: int) -> int:
    return a + b

result = add_numbers(5, 10)
print(result)

This seems simple, even unnecessary, when the code is simple. As the code gets more complex, however, annotations make it much easier to understand. The snippet below is meant only to illustrate the types and the complexity, not real processing logic.

from typing import Tuple

def process_data(spark_df: 'pyspark.sql.DataFrame',
                 pandas_df: 'pd.DataFrame',
                 option: dict,  # Required arguments must come before those with defaults
                 col_name: str = 'generic',
                 threshold: float = 1.0,
                 num_iterations: int = 2,
                 verbose: bool = True) -> Tuple[str, bool, 'pyspark.sql.DataFrame']:
    """
    Process data using PySpark DataFrame and Pandas DataFrame.
    Args:
    - spark_df: A PySpark DataFrame.
    - pandas_df: A Pandas DataFrame.
    - col_name: Name of the column to perform operations on.
    - threshold: Threshold value for data processing.
    - option: Additional options for data processing.
    - num_iterations: Number of iterations for the process.
    - verbose: Verbosity flag for logging.
    Returns:
    - A tuple containing:
        - A string message indicating processing completion.
        - A boolean flag indicating success.
        - Resultant PySpark DataFrame.
    """
    # Perform operations on the PySpark DataFrame
    processed_spark_df = spark_df.filter(spark_df[col_name] > threshold)
    
    # Perform operations on the Pandas DataFrame
    processed_pandas_df = pandas_df[pandas_df[col_name] > threshold]
    
    # Additional operations based on the options provided
    if 'scaling' in option and option['scaling']:
        # Perform scaling operations
        pass  # Placeholder for scaling operations
    
    # Additional iterations as per num_iterations
    for _ in range(num_iterations):
        # Perform iterative operations
        pass  # Placeholder for iterative operations
    
    # Define the message up front so it exists even when verbose is False
    verbose_str = "Processing complete."
    if verbose:
        # Print verbose information
        print(verbose_str)
    
    return (verbose_str, True, processed_spark_df)

Python itself does not enforce type annotations. The interpreter won’t raise errors or stop execution if the values passed don’t match the specified type hints. However, you can use external tools and libraries to perform static type checking or to enforce annotations. Below is an example of Python not enforcing the types.

# Function with WRONG Type Annotations will still work.
def add_numbers3(a: str, b: str) -> float:
    return a + b

result = add_numbers3(5, 10)
print(result)
print(add_numbers3.__annotations__)
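
To make the idea of enforcing annotations with extra tooling more concrete, here is a minimal sketch of a runtime check built on the standard library. The enforce_types decorator and the add_strings function are hypothetical names made up for this illustration, and the check only handles plain classes such as int or str, so treat it as a toy rather than a replacement for a real type checker.

```python
import functools
import inspect
from typing import get_type_hints


def enforce_types(func):
    """Hypothetical decorator: check plain-class annotations at call time."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only simple classes (int, str, ...) are checked in this sketch.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} should be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_types
def add_strings(a: str, b: str) -> str:
    return a + b


print(add_strings("5", "10"))  # prints 510

try:
    add_strings(5, 10)  # ints passed where str is annotated
except TypeError as exc:
    print("Rejected:", exc)
```

Unlike add_numbers3 above, the decorated function refuses the integer arguments instead of silently returning 15.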

Different Examples of Type Annotations

Before going into the different type annotations, let's look at the typing module, which is documented in the Python standard library reference. The classes it provides are listed there.

You may also refer to the documentation for mypy, a static type checker for Python. Mypy uses the type hints to check whether you are using your variables and functions correctly, and it finds bugs without running the code.
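
As a rough illustration of the kind of mistake a static checker is meant to catch, consider the sketch below. The average function is a made-up example. A checker such as mypy is expected to reject the second call before the program runs, whereas plain Python only fails once that line is executed.

```python
from typing import List


def average(values: List[float]) -> float:
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


print(average([1.0, 2.0, 3.0]))  # prints 2.0

# A static type checker is expected to flag the call below, because a str
# is not a List[float]. Without one, the mistake only surfaces at runtime.
try:
    average("123")  # deliberately wrong argument type
except TypeError as exc:
    print("Runtime failure:", exc)
```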

Standard Built-In Types

Below are some basic type hints.

a: str = 'some string'
a: int = 1
a: float = 2.0
a: bool = False

For Python 3.9 and above, the built-in collection types can be used directly in annotations, so the name of the collection type is not capitalized and no import is needed.

a: list[str] = ['a']
a: list[str | int] = ['a', 'b', 3, 4]  # <-- Python 3.10 and above.
a: set[float] = {1.0, 2.0, 3.0}
a: dict[str, str] = {'test': 'ripalo'}
a: tuple[int, str, bool] = (3, 'test', False)
a: tuple[str, ...] = ('a', 'b', 'c', 'd')

For Python 3.8 and earlier, the name of the collection type is capitalized, and you will need to import it from the typing module. Personally, I import the typing module out of habit no matter which version of Python I am using.

from typing import List, Set, Dict, Tuple, Union
a: List[str] = ['a']
a: List[Union[str, int]] = ['a', 'b', 3, 4]  # <-- Python 3.9 and earlier
a: Set[float] = {1.0, 2.0, 3.0}
a: Dict[str, str] = {'test': 'ripalo'}
a: Tuple[int, str, bool] = (3, 'test', False)
a: Tuple[str, ...] = ('a', 'b', 'c', 'd')
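
The same collection hints work in function signatures. Here is a small sketch using the typing spellings so that it runs on older versions too; count_words is a hypothetical helper invented for this example.

```python
from typing import Dict, List


def count_words(lines: List[str]) -> Dict[str, int]:
    """Count how often each whitespace-separated word appears."""
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


print(count_words(["a b a", "b c"]))  # prints {'a': 2, 'b': 2, 'c': 1}
```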

The Optional type is used when a value can be either None or the type you define, for example when a function returns a result or None. Optional[X] is equivalent to Union[X, None].

from typing import Optional
a: Optional[int] = 3  # It could also be None
a: Optional[str] = 'test' # It could also be None
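
Optional is most useful on return values, where it reminds the caller to handle the None case. A minimal sketch, with find_price and describe_price as hypothetical functions:

```python
from typing import Dict, Optional


def find_price(prices: Dict[str, float], item: str) -> Optional[float]:
    """Return the price of item, or None when it is not listed."""
    return prices.get(item)


def describe_price(prices: Dict[str, float], item: str) -> str:
    price = find_price(prices, item)
    if price is None:  # the Optional return type makes this branch explicit
        return f"{item}: not available"
    return f"{item}: {price:.2f}"


catalogue = {"apple": 0.5}
print(describe_price(catalogue, "apple"))  # prints apple: 0.50
print(describe_price(catalogue, "pear"))   # prints pear: not available
```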

Type Aliases

We can define our own aliases for types, which makes long or nested annotations easier to read.

from typing import List

Vector = List[float]

def scale(scalar: float, vector: Vector) -> Vector:
    return [scalar * num for num in vector]
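
Aliases pay off most with nested types. The sketch below builds one alias on top of another. Point, Route and route_length are names assumed for illustration only.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Route = List[Point]  # an alias built from another alias


def route_length(route: Route) -> float:
    """Sum the straight-line distances between consecutive points."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(route, route[1:]):
        total += math.hypot(x2 - x1, y2 - y1)
    return total


print(route_length([(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]))  # prints 9.0

# An alias is just another name for the same type, not a new type.
print(Route == List[Tuple[float, float]])  # prints True
```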

Callable Type Hint

The Callable type hint is used to annotate objects that can be called, such as functions or methods. It is written as Callable[[argument types], return type]. Below is an example of how you can use this type hint.

from typing import Callable

def add(a: int, b: int) -> int:
    return a + b

def multiply(x: int, y: int) -> int:
    return x * y

# A function that takes two integers and a Callable as arguments
def perform_operation(x: int, y: int, operation: Callable[[int, int], int]) -> int:
    return operation(x, y)

print(add.__annotations__)
print(multiply.__annotations__)
print(perform_operation.__annotations__)

result1 = perform_operation(5, 3, add)  # Calling perform_operation with 'add'
print("Result of addition:", result1)

result2 = perform_operation(4, 2, multiply)  # Calling perform_operation with 'multiply'
print("Result of multiplication:", result2)
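
Callable can also describe a return value, which is handy for functions that build other functions. A short sketch, where make_multiplier is a hypothetical factory:

```python
from typing import Callable


def make_multiplier(factor: int) -> Callable[[int], int]:
    """Return a function that multiplies its argument by factor."""
    def multiply_by(value: int) -> int:
        return value * factor

    return multiply_by


double = make_multiplier(2)
print(double(21))  # prints 42
```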

Classes

So far we have covered a number of examples of type hints in functions. We will now cover the basics of type hints in classes, including the ClassVar type hint.

from typing import ClassVar

class Alibaba:
    # Class level variable to be defined using ClassVar.
    count: ClassVar[int] = 0
    
    # __init__ returns nothing so we define it as None
    def __init__(self, name: str, apples: int = 0) -> None:
        self.name = name
        self.apples = apples
        Alibaba.count += 1

    # For instance methods, omit type for "self"
    def give_away(self, apples: int) -> None:
        self.apples -= apples
        
    def receive(self, apples: int) -> None:
        self.apples += apples

class Ripalo:
    pass

# Instantiate the class
a: Alibaba = Alibaba('Lim SZ', 70)
b: Alibaba = Alibaba('Jen', 20)

def transfer(giver: Alibaba, receiver: Alibaba, apples: int) -> None:
    giver.give_away(apples)
    receiver.receive(apples)

print(transfer.__annotations__)

User-defined classes are valid types in annotations. What does this mean? See the examples below.

# At runtime the first two will pass, but the last one raises a NameError
# (on Python versions that evaluate variable annotations eagerly).

# 1. Class exists. Correct type hint.
a: Alibaba = Alibaba('Lim SZ', 70)

# 2. Class exists. Incorrect type hint. Python does not complain,
#    but a static type checker such as mypy would flag the mismatch.
b: Ripalo = Alibaba('Jen', 20)

# 3. Class does not exist. The name used in the annotation must be defined.
c: AliRipalo = Alibaba('Fail')
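
A related case is a class that needs to refer to itself in its own method signatures. Inside the class body the class name is not defined yet, so one common approach is to quote the name, as was done with 'pyspark.sql.DataFrame' earlier. A minimal sketch with a hypothetical Basket class:

```python
class Basket:
    def __init__(self, apples: int = 0) -> None:
        self.apples = apples

    # 'Basket' is quoted because the class is still being defined here.
    def merge(self, other: 'Basket') -> 'Basket':
        return Basket(self.apples + other.apples)


combined = Basket(3).merge(Basket(4))
print(combined.apples)  # prints 7

# The quoted annotations are stored as plain strings.
print(Basket.merge.__annotations__)
```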

TypeVar Type Hint

TypeVar allows you to create a placeholder for a type to be specified later. It’s particularly useful when you want to define a function or a class that can work with different types without explicitly specifying them.

from typing import TypeVar, List

# Define a TypeVar
T = TypeVar('T')  # Creating a placeholder type

# A function that returns the first item from a list of any type
def first_item(items: List[T]) -> T:
    return items[0]

# Using the function with different types
int_list = [1, 2, 3, 4, 5]
str_list = ['apple', 'banana', 'cherry']

result1 = first_item(int_list)  # Returns an integer
result2 = first_item(str_list)  # Returns a string

print("First item in int_list:", result1)  # Output: First item in int_list: 1
print("First item in str_list:", result2)  # Output: First item in str_list: apple

print(first_item.__annotations__)
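
The same placeholder can be used for a whole class by inheriting from Generic. The Stack class below is a hypothetical example, sketched only to show the pattern.

```python
from typing import Generic, List, TypeVar

T = TypeVar('T')


class Stack(Generic[T]):
    """A last-in, first-out container whose item type is a placeholder."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()


# The annotation tells a type checker that this stack holds integers.
numbers: Stack[int] = Stack()
numbers.push(1)
numbers.push(2)
print(numbers.pop())  # prints 2
```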

Any Type Hint (For Complicated Scenarios)

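For complicated scenarios (or if we are lazy), we can just use the Any type hint. A value annotated with Any is compatible with every type, so static type checkers effectively stop checking it.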

from typing import Any

def add_numbers4(a: str, b: str) -> Any:
    return a + b

result = add_numbers4(5, 10)
print(result)
print(add_numbers4.__annotations__)
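
A more defensible use of Any is data whose shape is only known at runtime, such as parsed JSON. A small sketch, where load_settings is a hypothetical helper and the JSON content is made up:

```python
import json
from typing import Any, Dict


def load_settings(raw: str) -> Dict[str, Any]:
    """Parse a JSON object whose values can be of any type."""
    return json.loads(raw)


settings = load_settings('{"debug": true, "retries": 3}')

# The values are typed as Any, so the checker accepts this assignment;
# it is up to us to make sure the value really is an int.
retries: int = settings["retries"]
print(retries + 1)  # prints 4
```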

NewType Type Hint (A Wrapper)

Using the NewType type hint, we can create distinct named types based on an existing type, which makes the intent of a value easier to understand.

NewType and type aliases look similar but behave differently. To illustrate, I will provide a comparison based on the Type Alias example.

# Previous code from Type Alias
from typing import List, NewType

Vector = List[float]

def scale(scalar: float, vector: Vector) -> Vector:
    return [scalar * num for num in vector]

print("For Type Alias")
print(scale.__annotations__)


# Code for NewType
from typing import List, NewType

a = NewType('b', List[float])
print(type(a))
print(a)
print(a.__name__)

def scale(scalar: float, vector: a) -> a:
    return [scalar * num for num in vector]

print("For NewType")
print(scale.__annotations__)

For the type alias, scale.__annotations__ shows the underlying type, typing.List[float], for both vector and the return value.

For NewType, scale.__annotations__ shows the NewType object itself rather than List[float]. Its exact printed form depends on the Python version, and the name passed to NewType ('b' here) is available through its __name__ attribute.

In short, NewType creates a distinct type that is based on an existing type. In the code below, a static type checker treats UserId and ProductId as different types even though they share the same underlying type (int).

# Code for NewType
from typing import NewType

# Creating distinct types for user ID and product ID
UserId = NewType('UserId', int)
ProductId = NewType('ProductId', int)

def process_user(user_id: UserId):
    # Process user using UserId
    print(f"Processing User ID: {user_id}")

def process_product(product_id: ProductId):
    # Process product using ProductId
    print(f"Processing Product ID: {product_id}")

# Usage of the distinct types
user_id = UserId(101)
product_id = ProductId(5001)

process_user(user_id)
process_product(product_id)

print(process_user.__annotations__)
print(process_product.__annotations__)
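
It is worth stressing that the distinction exists only for type checkers. At runtime, calling a NewType simply returns the value it was given. The sketch below repeats the definitions so it runs on its own; here process_user returns the message instead of printing it, purely to keep the example short.

```python
from typing import NewType

UserId = NewType('UserId', int)
ProductId = NewType('ProductId', int)


def process_user(user_id: UserId) -> str:
    return f"Processing User ID: {user_id}"


user_id = UserId(101)
product_id = ProductId(5001)

print(type(user_id))  # prints <class 'int'>
print(user_id + 1)    # prints 102

# A static type checker is expected to flag this call, because a ProductId
# is not a UserId, but Python itself runs it without complaint.
print(process_user(product_id))  # prints Processing User ID: 5001
```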

The full code is available in the Jupyter Notebook below.

Closing Thoughts

Type annotations in Python are a powerful tool for fortifying code quality and reducing bugs during development. While they enhance code clarity and error-catching abilities, integrating them demands additional time and effort. For smaller projects characterized by simplicity and easy comprehension, they might not seem essential. However, in larger endeavors, the advantages become unmistakably apparent.

Personally, I’ve witnessed the tangible benefits of type hints in larger projects, where they substantially bolstered maintainability and readability, leading to more robust and reliable codebases. Balancing the upfront investment against the long-term advantages is key, especially as projects scale and complexity grows. Ultimately, while not a silver bullet, type annotations undeniably serve as a valuable ally in crafting resilient and maintainable Python projects.