Python Typing: Enhancing Code Clarity and Reliability
Python’s dynamism has long been both its strength and a potential pitfall, especially in larger codebases. With the introduction of type hints and the typing module in Python 3.5, developers gained a powerful tool to enhance code clarity and reliability without sacrificing Python's flexibility.
What are Type Annotations?
Type annotations in Python allow developers to hint at the expected types of variables, function arguments, and return values within their code. This practice enhances code readability and makes it easier to understand the intended use of different components.
This is especially useful when you split your code into packages or modules with chains of function calls and many arguments or variables.
# Annotation to a variable with a value.
length: int = 4
# Annotation to a variable without assigning a value is also acceptable.
length1: float
# Function without Type Annotations
def add_numbers(a, b):
    return a + b
result = add_numbers(5, 10)
print(result)
# Function with Type Annotations
def add_numbers(a: int, b: int) -> int:
    return a + b
result = add_numbers(5, 10)
print(result)
This seems simple and unnecessary when the code is simple. However, as the code gets more complex, type hints make it much easier to understand. The snippet below is only meant to illustrate the types and the complexity.
from typing import Tuple
# 'option' has no default value, so it must come before the parameters that have one.
def process_data(spark_df: 'pyspark.sql.DataFrame',
                 pandas_df: 'pd.DataFrame',
                 option: dict,
                 col_name: str = 'generic',
                 threshold: float = 1.0,
                 num_iterations: int = 2,
                 verbose: bool = True) -> Tuple[str, bool, 'pyspark.sql.DataFrame']:
    """
    Process data using PySpark DataFrame and Pandas DataFrame.
    Args:
    - spark_df: A PySpark DataFrame.
    - pandas_df: A Pandas DataFrame.
    - option: Additional options for data processing.
    - col_name: Name of the column to perform operations on.
    - threshold: Threshold value for data processing.
    - num_iterations: Number of iterations for the process.
    - verbose: Verbosity flag for logging.
    Returns:
    - A tuple containing:
        - A string message indicating processing completion.
        - A boolean flag indicating success.
        - Resultant PySpark DataFrame.
    """
    # Perform operations on the PySpark DataFrame
    processed_spark_df = spark_df.filter(spark_df[col_name] > threshold)
    # Perform operations on the Pandas DataFrame
    processed_pandas_df = pandas_df[pandas_df[col_name] > threshold]
    # Additional operations based on the options provided
    if 'scaling' in option and option['scaling']:
        # Perform scaling operations
        pass  # Placeholder for scaling operations
    # Additional iterations as per num_iterations
    for _ in range(num_iterations):
        # Perform iterative operations
        pass  # Placeholder for iterative operations
    # Define the message outside the 'if' so it exists even when verbose is False
    verbose_str = "Processing complete."
    if verbose:
        # Print verbose information
        print(verbose_str)
    return (verbose_str, True, processed_spark_df)
Python itself does not enforce type annotations by default. The interpreter won’t raise errors or stop execution if the code doesn’t follow the specified type hints. However, you can use external tools and libraries to enforce type annotations and perform static type checking. Below is an example of Python not enforcing the type hints.
# Function with WRONG Type Annotations will still work.
def add_numbers3(a: str, b: str) -> float:
    return a + b
result = add_numbers3(5, 10)
print(result)
print(add_numbers3.__annotations__)
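To make the idea of an external check more concrete, here is a minimal sketch of a decorator that compares arguments against their annotations at call time. The names enforce_types and add_strings are hypothetical, and the sketch only handles plain classes such as int or str. It is not how mypy works, since mypy checks the code statically without running it.

```python
import inspect
from functools import wraps
from typing import get_type_hints


def enforce_types(func):
    """Hypothetical decorator: check plain-class annotations at call time."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes (int, str, ...) are checked in this sketch.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} should be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_types
def add_strings(a: str, b: str) -> str:
    return a + b


print(add_strings("5", "10"))  # 510

try:
    add_strings(5, 10)
except TypeError as error:
    print("Rejected:", error)  # Rejected: a should be str, got int
```

Unlike add_numbers3 above, the decorated function refuses the integer arguments instead of silently adding them.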
Different Examples of Type Annotations
Before going into the different type annotations, let's look at the typing module; its documentation lists the available classes.
You may also refer to the documentation for mypy, a static type checker for Python. Mypy uses the type hints to check whether you are using your variables and functions correctly, so it can find bugs without running the code.
Standard Built-In Types
Below are some basic type hints.
a: str = 'some string'
a: int = 1
a: float = 2.0
a: bool = False
From Python 3.9 onwards, the built-in collection types can be subscripted directly, so the name of the collection type is not capitalized.
a: list[str] = ['a']
a: list[str | int] = ['a', 'b', 3, 4] # <-- Python 3.10 and above.
a: set[float] = {1.0, 2.0, 3.0}
a: dict[str, str] = {'test': 'ripalo'}
a: tuple[int, str, bool] = (3, 'test', False)
a: tuple[str, ...] = ('a', 'b', 'c', 'd')
For Python 3.8 and earlier, the name of the collection type is capitalized, and you need to import it from the typing module. Personally, I import the typing module as a habit no matter which version of Python I am using.
from typing import List, Set, Dict, Tuple, Union
a: List[str] = ['a']
a: List[Union[str, int]] = ['a', 'b', 3, 4] # <-- Python 3.9 and earlier
a: Set[float] = {1.0, 2.0, 3.0}
a: Dict[str, str] = {'test': 'ripalo'}
a: Tuple[int, str, bool] = (3, 'test', False)
a: Tuple[str, ...] = ('a', 'b', 'c', 'd')
The Optional type is used when a value can be either None or the type you define, for example a function that may return None. Optional[int] is equivalent to Union[int, None].
from typing import Optional, Union
a: Optional[int] = 3 # It could also be None
a: Optional[str] = 'test' # It could also be None
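As a small illustration of Optional on a return value, the sketch below uses a hypothetical helper, find_index, that returns None when nothing is found. The function name and the sample data are made up for this example.

```python
from typing import List, Optional


def find_index(items: List[str], target: str) -> Optional[int]:
    """Return the position of target in items, or None if it is absent."""
    for position, item in enumerate(items):
        if item == target:
            return position
    return None


fruits = ['apple', 'banana', 'cherry']

index = find_index(fruits, 'banana')
if index is not None:
    # Inside this branch a type checker treats index as int, not Optional[int].
    print("Found at position", index + 1)  # Found at position 2

missing = find_index(fruits, 'durian')
print(missing is None)  # True
```

Checking for None before using the value is what a static type checker expects whenever a hint is Optional.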
Type Aliases
We can define our own aliases for different types to make it easier for our own understanding.
from typing import List
Vector = List[float]
def scale(scalar: float, vector: Vector) -> Vector:
    return [scalar * num for num in vector]
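Aliases become more useful as the types get nested. The sketch below builds aliases on top of each other; Coordinate, Route, RoutesByName and both functions are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Tuple

# Each alias is just another name for the type on the right-hand side.
Coordinate = Tuple[float, float]
Route = List[Coordinate]
RoutesByName = Dict[str, Route]


def route_length(route: Route) -> float:
    """Sum the straight-line distances between consecutive points."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(route, route[1:]):
        total += ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    return total


def longest_route(routes: RoutesByName) -> str:
    """Return the name of the route with the greatest length."""
    return max(routes, key=lambda name: route_length(routes[name]))


routes: RoutesByName = {
    'short': [(0.0, 0.0), (3.0, 4.0)],
    'long': [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)],
}
print(route_length(routes['short']))  # 5.0
print(longest_route(routes))          # long
```

Without the aliases, the last signature would read Dict[str, List[Tuple[float, float]]], which is harder to scan.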
Callable Type Hint
The Callable type hint is used to annotate objects that are callable, such as functions or methods. For example, Callable[[int, int], int] describes a callable that takes two int arguments and returns an int. Below is an example of how you can use this type hint.
from typing import Callable
def add(a: int, b: int) -> int:
    return a + b
def multiply(x: int, y: int) -> int:
    return x * y
# A function that takes two integers and a Callable as arguments
def perform_operation(x: int, y: int, operation: Callable[[int, int], int]) -> int:
    return operation(x, y)
print(add.__annotations__)
print(multiply.__annotations__)
print(perform_operation.__annotations__)
result1 = perform_operation(5, 3, add) # Calling perform_operation with 'add'
print("Result of addition:", result1)
result2 = perform_operation(4, 2, multiply) # Calling perform_operation with 'multiply'
print("Result of multiplication:", result2)
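Callable can also describe values stored in a collection or returned from a function. The following sketch, with hypothetical names BinaryOp, OPERATIONS, make_scaler and calculate, shows both uses.

```python
from typing import Callable, Dict

# Alias for "a callable taking two ints and returning an int".
BinaryOp = Callable[[int, int], int]

# A lookup table whose values are callables of that shape.
OPERATIONS: Dict[str, BinaryOp] = {
    'add': lambda a, b: a + b,
    'multiply': lambda a, b: a * b,
}


def make_scaler(factor: int) -> Callable[[int], int]:
    """Return a new function that multiplies its argument by factor."""
    def scale(value: int) -> int:
        return value * factor
    return scale


def calculate(name: str, x: int, y: int) -> int:
    """Look up an operation by name and apply it."""
    return OPERATIONS[name](x, y)


double = make_scaler(2)
print(calculate('add', 5, 3))       # 8
print(calculate('multiply', 4, 2))  # 8
print(double(21))                   # 42
```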
Classes
We have covered a number of examples of type hints in functions. We will now cover the basics of type hints in classes, including the ClassVar type hint.
from typing import ClassVar
class Alibaba:
    # Class level variable to be defined using ClassVar.
    count: ClassVar[int] = 0
    # __init__ returns nothing so we define it as None
    def __init__(self, name: str, apples: int = 0) -> None:
        self.name = name
        self.apples = apples
        Alibaba.count += 1
    # For instance methods, omit type for "self"
    def give_away(self, apples: int) -> None:
        self.apples -= apples
    def receive(self, apples: int) -> None:
        self.apples += apples
class Ripalo:
    pass
# Instantiate the class
a: Alibaba = Alibaba('Lim SZ', 70)
b: Alibaba = Alibaba('Jen', 20)
def transfer(giver: Alibaba, receiver: Alibaba, apples: int) -> None:
    giver.give_away(apples)
    receiver.receive(apples)
print(transfer.__annotations__)
User-defined classes are considered valid types in annotations. What does this mean? See the examples below.
# The first two run without error, but the third refers to a name that does not exist.
# 1. Class defined. Correct type hint.
a: Alibaba = Alibaba('Lim SZ', 70)
# 2. Class defined. Incorrect type hint. Python still runs it; a static type checker such as mypy would flag it.
b: Ripalo = Alibaba('Jen', 20)
# 3. Class not defined. Evaluating the annotation raises a NameError because AliRipalo does not exist.
c: AliRipalo = Alibaba('Fail')
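A related case is a class that needs to refer to itself in its own annotations, before its name is bound. One common approach is to quote the name. The sketch below assumes a hypothetical Node class and length function to show this.

```python
from typing import Optional, get_type_hints


class Node:
    # 'Node' is quoted because the class name is not bound yet while
    # the class body is still being executed.
    def __init__(self, value: int, next_node: Optional['Node'] = None) -> None:
        self.value = value
        self.next_node = next_node


def length(head: Optional[Node]) -> int:
    """Count the nodes in a linked chain."""
    count = 0
    while head is not None:
        count += 1
        head = head.next_node
    return count


chain = Node(1, Node(2, Node(3)))
print(length(chain))  # 3

# get_type_hints resolves the quoted name to the real class.
hints = get_type_hints(Node.__init__)
print(hints['next_node'] == Optional[Node])  # True
```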
TypeVar Type Hint
TypeVar allows you to create a placeholder for a type to be specified later. It’s particularly useful when you want to define a function or a class that can work with different types without explicitly specifying them.
from typing import TypeVar, List
# Define a TypeVar
T = TypeVar('T') # Creating a placeholder type
# A function that returns the first item from a list of any type
def first_item(items: List[T]) -> T:
    return items[0]
# Using the function with different types
int_list = [1, 2, 3, 4, 5]
str_list = ['apple', 'banana', 'cherry']
result1 = first_item(int_list) # Returns an integer
result2 = first_item(str_list) # Returns a string
print("First item in int_list:", result1) # Output: First item in int_list: 1
print("First item in str_list:", result2) # Output: First item in str_list: apple
print(first_item.__annotations__)
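The same placeholder can be used for a class by inheriting from Generic. The Stack class below is a hypothetical example, a minimal sketch rather than a complete container.

```python
from typing import Generic, List, TypeVar

T = TypeVar('T')


class Stack(Generic[T]):
    """A last-in, first-out container whose item type is the placeholder T."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

    def is_empty(self) -> bool:
        return not self._items


# The annotation tells a type checker which type T stands for.
numbers: Stack[int] = Stack()
numbers.push(1)
numbers.push(2)
print(numbers.pop())  # 2

words: Stack[str] = Stack()
words.push('apple')
print(words.pop())       # apple
print(words.is_empty())  # True
```

With these annotations, a static type checker is expected to flag a call such as numbers.push('three'), although Python itself would still run it.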
Any Type Hint (For Complicated Scenarios)
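For complicated scenarios (or if we are lazy), just use the Any type hint. A type checker accepts any value for something annotated as Any, so it effectively switches checking off for that value.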
from typing import Any
def add_numbers4(a: str, b: str) -> Any:
    return a + b
result = add_numbers4(5, 10)
print(result)
print(add_numbers4.__annotations__)
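A scenario where Any is hard to avoid is data whose shape is only known at runtime, such as parsed JSON. The sketch below assumes a hypothetical load_settings helper and a made-up settings string.

```python
import json
from typing import Any, Dict


def load_settings(raw: str) -> Dict[str, Any]:
    """Parse a JSON object; the value types are not known in advance."""
    return json.loads(raw)


settings = load_settings('{"retries": 3, "verbose": true, "tags": ["a", "b"]}')

# The values are typed as Any, so a type checker accepts this assignment
# without being able to confirm that the value really is an int.
retries: int = settings['retries']
print(retries + 1)          # 4
print(settings['verbose'])  # True
print(settings['tags'])     # ['a', 'b']
```

Here the keys are still checked as str, and only the values fall back to Any, which keeps the unchecked part as small as possible.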
NewType Type Hint (A Wrapper)
Using the NewType type hint, we can create a distinct type from an existing one, so that a type checker can tell apart values that share the same underlying type.
NewType and Type Aliases look similar but behave differently. To illustrate, here is a comparison based on the earlier Type Alias example.
# Previous code from Type Alias
from typing import List, NewType
Vector = List[float]
def scale(scalar: float, vector: Vector) -> Vector:
    return [scalar * num for num in vector]
print("For Type Alias")
print(scale.__annotations__)
# Code for NewType
from typing import List, NewType
a = NewType('b', List[float])
print(type(a))
print(a)
print(a.__name__)
def scale(scalar: float, vector: a) -> a:
    return [scalar * num for num in vector]
print("For NewType")
print(scale.__annotations__)
Output for Type Alias is as below:
Output for NewType is as below:
It can be seen that function.__annotations__ shows the underlying type for a Type Alias, whereas for NewType it shows the NewType itself. To get the name of a NewType, read its __name__ attribute.
In short, NewType creates a distinct type based on an existing type. In the code below, a static type checker treats UserId and ProductId as different types even though they share the same underlying type (int); at runtime the values are still plain int objects.
# Code for NewType
from typing import NewType
# Creating distinct types for user ID and product ID
UserId = NewType('UserId', int)
ProductId = NewType('ProductId', int)
def process_user(user_id: UserId):
    # Process user using UserId
    print(f"Processing User ID: {user_id}")
def process_product(product_id: ProductId):
    # Process product using ProductId
    print(f"Processing Product ID: {product_id}")
# Usage of the distinct types
user_id = UserId(101)
product_id = ProductId(5001)
process_user(user_id)
process_product(product_id)
print(process_user.__annotations__)
print(process_product.__annotations__)
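To see where the distinction between UserId and ProductId actually applies, the sketch below repeats the two NewType definitions and adds a hypothetical describe_user function. The point is that the separation exists for the type checker only; Python itself does not enforce it.

```python
from typing import NewType

UserId = NewType('UserId', int)
ProductId = NewType('ProductId', int)


def describe_user(user_id: UserId) -> str:
    return f"user-{user_id}"


user_id = UserId(101)
product_id = ProductId(5001)

# At runtime NewType returns its argument unchanged, so both are plain ints.
print(type(user_id) is int)  # True
print(user_id + 1)           # 102

# This runs, but a static checker such as mypy is expected to reject the
# call, because a ProductId is not a UserId.
print(describe_user(product_id))  # user-5001

# A bare int also runs, and is likewise expected to be rejected by the checker.
print(describe_user(42))  # user-42
```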
The full code is available in the Jupyter Notebook below.
Closing Thoughts
Type annotations in Python are a powerful tool for fortifying code quality and reducing bugs during development. While they enhance code clarity and error-catching abilities, integrating them demands additional time and effort. For smaller projects characterized by simplicity and easy comprehension, they might not seem essential. However, in larger endeavors, the advantages become unmistakably apparent.
Personally, I’ve witnessed the tangible benefits of type hints in larger projects, where they substantially bolstered maintainability and readability, leading to more robust and reliable codebases. Balancing the upfront investment against the long-term advantages is key, especially as projects scale and complexity grows. Ultimately, while not a silver bullet, type annotations undeniably serve as a valuable ally in crafting resilient and maintainable Python projects.