Python minis 1: When to use Python data classes?
I am introducing the Python minis series, which briefly discusses Python internals. The series covers different aspects of Python, such as collections and useful built-in tools.
You should stop writing regular classes for obvious data-holding use cases and start leveraging Python `dataclasses`. They make your life much simpler. In this article, we explore questions like:
- Why do Python data classes exist in the first place?
- How can we refactor existing code into `dataclasses`?
All the examples in this article use Python ≥ 3.9.
Introduction
Python `dataclasses` is a module that provides a `dataclass` decorator, which transforms a regular class into a richer class by generating special methods such as `__init__()`, `__repr__()` and `__eq__()` for you. We generally define a class with an explicit constructor. Let us say we create a class called `Point` that holds coordinates like below:
```python
# main.py
class Point(object):
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y
```
Use case 1: Print object details
Suppose we want to print an object of this class. Let us instantiate an object from the class `Point`:
```python
# main.py
p = Point(0, 10)
# prints something like <__main__.Point object at 0x100656110>
# (the memory address differs between runs)
print(p)
```
The information printed to the console is not very useful. Let us define a `__str__()` method on the class so that we get descriptive text when printing objects.
```python
# main.py
class Point(object):
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    def __str__(self):
        return f"{self.__class__}(x={self.x}, y={self.y})"
```
Now if we create an object and print it to the console, we see:
```python
# main.py
p = Point(0, 10)
# prints <class '__main__.Point'>(x=0, y=10)
print(p)
```
Use case 2: Add default values for constructor arguments
Suppose every point should start at location (0, 0) unless told otherwise. We can add default values for `x` and `y` to the constructor arguments.
```python
# main.py
class Point(object):
    def __init__(self, x: int = 0, y: int = 0):
        self.x = x
        self.y = y

    def __str__(self):
        return f"{self.__class__}(x={self.x}, y={self.y})"
```
Now one can create an object without arguments, and Python sets the default values x = 0 and y = 0.
```python
# main.py
p = Point()
# prints <class '__main__.Point'>(x=0, y=0)
print(p)
```
Use case 3: Get class properties as a dictionary
We can get all attributes of an object as a dictionary through its `__dict__` attribute. Let us add a new method called `asdict()` that returns the object's attributes as a Python dict:
```python
# main.py
class Point(object):
    def __init__(self, x: int = 0, y: int = 0):
        self.x = x
        self.y = y

    def __str__(self):
        return f"{self.__class__}(x={self.x}, y={self.y})"

    def asdict(self):
        # Return a copy so callers cannot modify the object through the dict
        return dict(self.__dict__)
```
Now let us test the object:
```python
# main.py
p = Point()
# prints {'x': 0, 'y': 0}
print(p.asdict())
```
As you can see, the initially simple class is slowly getting bloated with trivial methods. Python dataclasses solve this problem with a single decorator on the class.
Python dataclass to reduce boilerplate
We can rewrite the `Point` class by importing a decorator function called `dataclass` from the `dataclasses` module.
```python
# main.py
from dataclasses import dataclass


@dataclass
class Point:
    x: int = 0
    y: int = 0
```
That’s it. We now have a fully functional class with an automatically generated constructor and a default string representation. Let us test it by creating an object.
```python
# main.py
p = Point(0, 10)
# prints Point(x=0, y=10)
print(p)
```
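As you can see, printing a dataclass object gives a clean representation of the form `ClassName(attr1=val1, attr2=val2, ...)`. Use cases 1 and 2 are solved just by applying the `dataclass` decorator: we did not define `__init__()` or `__str__()` ourselves. The decorator generates `__init__()` and `__repr__()`, and `print()` falls back to `__repr__()` when no `__str__()` is defined.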
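The decorator generates more than a constructor and a representation. By default it also adds `__eq__()`, which compares objects field by field. The sketch below contrasts the two approaches; the names `RegularPoint` and `DataPoint` are hypothetical, chosen only to keep both versions side by side in one file.

```python
from dataclasses import dataclass


class RegularPoint:
    # Hand-written constructor, as in the use cases above
    def __init__(self, x: int = 0, y: int = 0):
        self.x = x
        self.y = y


@dataclass
class DataPoint:
    # @dataclass generates __init__, __repr__ and __eq__ from these fields
    x: int = 0
    y: int = 0


# Without __eq__, a regular class compares by identity, so this is False
print(RegularPoint(0, 10) == RegularPoint(0, 10))  # False

# The generated __eq__ compares the field values, so this is True
print(DataPoint(0, 10) == DataPoint(0, 10))  # True

# print() uses the generated __repr__ because no __str__ is defined
print(DataPoint(0, 10))  # DataPoint(x=0, y=10)
```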
Does `dataclass` also support getting a dictionary view of the fields? The short answer is yes. The `dataclasses` module provides a function called `asdict()` that converts a dataclass instance into a dict.
```python
# main.py
from dataclasses import asdict

p = Point()
# prints {'x': 0, 'y': 0}
print(asdict(p))
```
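`asdict()` also works recursively: when a field holds another dataclass, that field is converted to a dict as well. A minimal sketch, assuming a hypothetical `Line` dataclass built from two `Point` objects; `Point` is redefined here so the block runs on its own.

```python
from dataclasses import dataclass, asdict


@dataclass
class Point:
    x: int = 0
    y: int = 0


@dataclass
class Line:
    # Hypothetical dataclass whose fields are themselves dataclasses
    start: Point
    end: Point


line = Line(Point(0, 0), Point(10, 20))

# Nested dataclasses become nested dicts
print(asdict(line))
# {'start': {'x': 0, 'y': 0}, 'end': {'x': 10, 'y': 20}}
```

Because `asdict()` builds new dicts, modifying the result does not change the original object, unlike returning `self.__dict__` directly.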
Similarly, one can get the field values as a tuple using the `astuple()` function from the `dataclasses` module.
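A short sketch of `astuple()` on the `Point` dataclass (redefined here so the snippet is self-contained). The tuple follows the order in which the fields are declared, which makes it handy for unpacking.

```python
from dataclasses import dataclass, astuple


@dataclass
class Point:
    x: int = 0
    y: int = 0


p = Point(0, 10)

# Field values in declaration order
print(astuple(p))  # (0, 10)

# Unpack the coordinates directly
x, y = astuple(p)
print(x, y)  # 0 10
```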
Defining methods on a dataclass
Defining methods on a `dataclass` works the same way as on a regular class. Define a function that takes `self` as its first argument, and you can access the fields through `self`. Let us modify the previous `Point` dataclass to add a new method called `distance_to()` that computes the distance from this point to another point.
```python
# main.py
from dataclasses import dataclass
from math import sqrt, pow


@dataclass
class Point:
    x: int = 0
    y: int = 0

    def distance_to(self, p: "Point") -> float:
        # Euclidean distance between this point and p
        return sqrt(pow(p.x - self.x, 2) + pow(p.y - self.y, 2))
```
Now let us test the method:
```python
# main.py
p1 = Point()
p2 = Point(10, 20)
# prints 22.360679774997898
print(p1.distance_to(p2))
```
When not to use dataclasses?
Dataclasses are very handy when there is no complex logic in the class constructor. Sometimes, though, we need to perform complex operations such as sanity checks or database connection initialization in a Python class constructor. In those cases, it is advisable to use a regular class.
But if you need a class solely for managing data, use data classes over regular ones. They will save you tons of boilerplate.
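For lightweight sanity checks, a dataclass can still work: if the class defines a `__post_init__()` method, the generated `__init__()` calls it right after assigning the fields. A minimal sketch, assuming a hypothetical rule that coordinates must be non-negative; heavier setup such as opening database connections still fits better in a regular class.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int = 0
    y: int = 0

    def __post_init__(self):
        # Called by the generated __init__ after x and y are set.
        # Hypothetical rule for illustration: no negative coordinates.
        if self.x < 0 or self.y < 0:
            raise ValueError(
                f"coordinates must be non-negative, got ({self.x}, {self.y})"
            )


print(Point(3, 4))  # Point(x=3, y=4)

try:
    Point(-1, 4)
except ValueError as exc:
    print(exc)  # coordinates must be non-negative, got (-1, 4)
```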
Regular class:
Pros:
- Can handle complex logic in constructors
Cons:
- Boilerplate code
- Encourages developers to hand-write special methods and attributes like `__str__()` and `__dict__`
Data class:
Pros:
- Clean and less boilerplate
- Handy helpers for string representation and dict/tuple conversion
Cons:
- Less suited to complex constructor initialization (custom logic has to go into a `__post_init__()` hook that runs after the generated `__init__()`)
I hope this mini-article cleared up the concepts of Python data classes. See you next time with another interesting topic. Thank you for reading!