Python minis 1: When to use Python data classes?

Published in Dev bits · 4 min read · Feb 11, 2023

I am introducing the Python minis series, which talks briefly about the internals of Python. The series presents different aspects of Python, such as collections and useful built-in tools.

For classes that mainly hold data, consider replacing regular classes with Python `dataclasses`; they remove a lot of hand-written code. In this article we explore questions like:

Why do Python data classes exist in the first place?
How can we refactor existing classes into `dataclasses`?

All the examples in this article use a Python interpreter ≥ 3.9.

Introduction

`dataclasses` is a standard-library module that provides a `dataclass` decorator, which adds generated methods such as a constructor and a readable representation to a regular class. Normally we define a class by writing its constructor by hand. Let’s say we create a class called `Point` that holds coordinates, like below:

# main.py
class Point(object):
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

Use case 1: Print object details

Let us say we want to print an object of this class. Let us instantiate an object from the class `Point`:

# main.py
p = Point(0, 10)

# prints <__main__.Point object at 0x100656110>
print(p)

The information printed to the console is not very useful. Let us define a __str__() method on the class so that we get descriptive text when printing objects.

# main.py
class Point(object):
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y
    
    def __str__(self):
        return f"{self.__class__}(x={self.x}, y={self.y})"

Now if we create an object and print it to the console, we see:

# main.py
p = Point(0, 10)

# prints <class '__main__.Point'>(x=0, y=10)
print(p)

Use case 2: Add default values for constructor arguments

Let us say every point should start at location (0, 0) unless told otherwise. We can modify the constructor arguments to give x and y default values.

# main.py
class Point(object):
    def __init__(self, x: int = 0, y: int = 0):
        self.x = x
        self.y = y
    
    def __str__(self):
        return f"{self.__class__}(x={self.x}, y={self.y})"

Now, one can initialize an object without arguments, and Python will set default values as x = 0, y = 0.

# main.py
p = Point()

# prints <class '__main__.Point'>(x=0, y=0)
print(p)

Use case 3: Get class properties as a dictionary

We can get all attributes of an object as a dictionary through its __dict__ attribute. Let us add a new method called asdict() which returns the object’s attributes as a Python dict:

# main.py
class Point(object):
    def __init__(self, x: int = 0, y: int = 0):
        self.x = x
        self.y = y
    
    def __str__(self):
        return f"{self.__class__}(x={self.x}, y={self.y})"

    def asdict(self):
        return self.__dict__

Now testing the object:

# main.py
p = Point()

# prints {'x': 0, 'y': 0}
print(p.asdict())
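One detail worth knowing about this hand-written asdict(): `__dict__` is the object’s live attribute dictionary, so whoever receives it can change the object through it. The sketch below illustrates this; `asdict_copy()` is a hypothetical helper added only for comparison and is not part of the original class.

```python
class Point:
    def __init__(self, x: int = 0, y: int = 0):
        self.x = x
        self.y = y

    def asdict(self):
        # Returns the live attribute dictionary, not a copy.
        return self.__dict__

    def asdict_copy(self):
        # Hypothetical variant: a shallow copy that is safe to modify.
        return dict(self.__dict__)


p = Point()

safe = p.asdict_copy()
safe["x"] = 5
print(p.x)  # 0, the copy is independent of the object

live = p.asdict()
live["x"] = 5
print(p.x)  # 5, the object itself was changed
```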

As you can see, the class we started with is slowly getting bloated by the addition of trivial functions. Python dataclasses solve this problem by adding a decorator to a class.

Python dataclass to reduce boilerplate

We can rewrite the Point class by importing a decorator function called dataclass from the dataclasses module.

# main.py
from dataclasses import dataclass

@dataclass
class Point:
    x: int = 0
    y: int = 0

That’s it. We have a fully functional class with an automatic constructor, and a default string representation. Now let us test by creating an object.

# main.py
p = Point(0, 10)

# prints Point(x=0, y=10)
print(p)

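As you can see, printing a dataclass object gives a clean representation of the form Class(attr1=val1, attr2=val2, …). Use case 1 and use case 2 are both solved by applying a single decorator called dataclass. We did not write __init__() or __str__() ourselves: the decorator generates __init__() and __repr__(), and print() falls back to __repr__() when no __str__() is defined.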
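To make the generated methods visible, here is a small sketch that exercises them on the same Point dataclass. Besides the constructor and the representation, the decorator also generates __eq__(), which compares instances field by field.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int = 0
    y: int = 0


p = Point(0, 10)

# __repr__ is generated; print() and str() fall back to it.
print(p)
same_text = str(p) == repr(p)

# __init__ is generated with the declared defaults.
origin = Point()

# __eq__ compares field values, not object identity.
equal_by_value = Point(0, 10) == p
same_object = Point(0, 10) is p

print(same_text, origin, equal_by_value, same_object)
```

The last line reports that str() and repr() agree, that the default point sits at the origin, and that two separately created points with the same coordinates are equal without being the same object.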

Does dataclass also support getting a dictionary view of an object’s attributes? The simple answer is `Yes`. The dataclasses module provides a function called asdict that returns the fields of a dataclass instance as a dict.

# main.py
from dataclasses import asdict

p = Point()

# prints {'x': 0, 'y': 0}
print(asdict(p))

Similarly, one can get the fields as a tuple using the astuple function from the `dataclasses` module.
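A short sketch of astuple, using the same Point dataclass. It also shows that asdict builds a new dictionary, so modifying the result does not touch the object.

```python
from dataclasses import asdict, astuple, dataclass


@dataclass
class Point:
    x: int = 0
    y: int = 0


p = Point(3, 4)

print(astuple(p))  # (3, 4)

# The tuple follows field declaration order, so it unpacks naturally.
x, y = astuple(p)

# asdict() returns a new dict; changing it leaves the object untouched.
d = asdict(p)
d["x"] = 99
print(p.x)  # 3
```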

Defining methods on a dataclass

Defining methods on a `dataclass` is the same as defining methods on a regular class. Define a function with self as its first argument, and you can access the object’s attributes inside it. Let us modify the previous Point dataclass to add a new method called distance_to() that computes the distance from the current point to another point.

# main.py
from dataclasses import dataclass
from math import sqrt, pow

@dataclass
class Point:
    x: int = 0
    y: int = 0

    def distance_to(self, p) -> float:
        return sqrt(pow(p.x - self.x, 2) + pow(p.y - self.y, 2))

Now testing the method:

# main.py
p1 = Point()
p2 = Point(10, 20)

# prints 22.360679774997898
print(p1.distance_to(p2))
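As an optional variation (not the version used above), the same method can be sketched with math.hypot, which computes the Euclidean length directly, and with a type hint on the argument. The string annotation "Point" is needed because the class name is not yet bound while the class body is being evaluated.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Point:
    x: int = 0
    y: int = 0

    def distance_to(self, other: "Point") -> float:
        # hypot(dx, dy) is the Euclidean length of the vector (dx, dy).
        return hypot(other.x - self.x, other.y - self.y)


p1 = Point()
p2 = Point(3, 4)

print(p1.distance_to(p2))  # 5.0
```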

When not to use dataclasses?

Dataclasses are very handy when there is no complex logic in the class constructor. Sometimes, however, a constructor performs heavier operations such as sanity checks or database connection initialization. In those cases, a regular class is advised.
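As an illustration of such a constructor, here is a minimal sketch. The class `PointStore`, its table name, and its methods are all hypothetical and chosen only for this example; it uses an in-memory SQLite database from the standard library so that it runs anywhere.

```python
import sqlite3


class PointStore:
    """Hypothetical example of a class whose constructor does real work."""

    def __init__(self, db_path: str = ":memory:"):
        # Sanity check before touching any resources.
        if not db_path:
            raise ValueError("db_path must not be empty")
        # Opening a connection and creating a table are side effects,
        # not plain attribute assignments.
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS points (x INTEGER, y INTEGER)"
        )

    def add(self, x: int, y: int) -> None:
        self._conn.execute("INSERT INTO points VALUES (?, ?)", (x, y))

    def count(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM points").fetchone()[0]

    def close(self) -> None:
        self._conn.close()


store = PointStore()
store.add(0, 10)
print(store.count())  # 1
store.close()
```

Here the constructor’s job is to acquire and prepare a resource rather than to store a few values, so an explicit __init__() keeps that logic easy to see.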

But if you need a class solely for managing its data, prefer a data class over a regular one. It will save you a lot of boilerplate.

Regular class:

Pros:

Can handle complex logic in constructors

Cons:

Boilerplate code
Encourages developers to hand-write special methods like __str__() and to reach into internals like __dict__

Data class: