Mastering Python Dataclasses: Tips and Tricks

surya singh · Published in Python’s Gurus · 4 min read · May 29, 2024

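Dataclasses are a relatively recent addition to Python. They were introduced in PEP 557 and are available in Python 3.7 and later. A data class is an ordinary class whose main job is to hold data values; the `@dataclass` decorator writes the repetitive methods for it. To see the difference, start with a regular class: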

class Person:
    def __init__(self, name, age, city):
        self.name = name
        self.age = age
        self.city = city

In this example, we’ve defined a simple `Person` class with attributes for name, age, and city. This class requires an `__init__` method to initialise its attributes. Now, let’s reimplement the same class using a data class:

from dataclasses import dataclass

@dataclass
class PersonDataclass:
    name: str
    age: int
    city: str

With dataclasses, the process becomes simpler. You no longer need to write an `__init__` method or assign each attribute by hand. The `@dataclass` decorator generates these methods from the class annotations, making your code more concise and easier to understand.
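To see what the decorator actually wrote, we can inspect the generated `__init__` with the standard `inspect` module and list the declared fields with `dataclasses.fields`. A quick sketch, reusing the `PersonDataclass` definition from above:

```python
import inspect
from dataclasses import dataclass, fields


@dataclass
class PersonDataclass:
    name: str
    age: int
    city: str


# The decorator built __init__ from the annotated class attributes,
# keeping the order in which they were declared.
print(inspect.signature(PersonDataclass))
print([f.name for f in fields(PersonDataclass)])  # ['name', 'age', 'city']

# Arguments can be passed positionally or by keyword, like any __init__.
person = PersonDataclass(name='Jack', age=32, city='Seattle')
print(person.city)  # Seattle
```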

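But it is not limited to `__init__`. By default, dataclasses also generate `__repr__` and `__eq__`, saving you the time and effort of writing these methods yourself. Other methods are opt-in: `order=True` adds the ordering comparisons, and `frozen=True` (or `unsafe_hash=True`) adds a `__hash__`. With the default settings, `__hash__` is set to `None`, so instances are unhashable.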
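The hashing behaviour is easy to get wrong, so here is a small sketch of it. The `Point` and `FrozenPoint` classes are hypothetical, chosen only to keep the example short: the first uses the defaults, the second passes `frozen=True`.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class FrozenPoint:
    x: float
    y: float


# With the default eq=True and frozen=False, __hash__ is set to None,
# so instances cannot be used as set members or dict keys.
print(Point.__hash__)  # None
try:
    {Point(1.0, 2.0)}
except TypeError as exc:
    print('TypeError:', exc)

# frozen=True makes instances immutable and generates a field-based __hash__,
# so equal points hash equally and collapse into one set entry.
points = {FrozenPoint(1.0, 2.0), FrozenPoint(1.0, 2.0)}
print(len(points))  # 1
```

Freezing is usually the safer choice; `unsafe_hash=True` forces a `__hash__` on a mutable class, which breaks if a field changes while the object sits in a set or dict.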

Let’s create two objects of each class with identical values and compare them:

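person1 = Person('John Doe', 30, 'New York')
person2 = Person('John Doe', 30, 'New York')
persondc1 = PersonDataclass('Jack', 32, 'Seattle')
persondc2 = PersonDataclass('Jack', 32, 'Seattle')

print(person1 == person2)      # False: a regular class compares object identity
print(persondc1 == persondc2)  # True: the generated __eq__ compares field values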

To get the same readable `print` output from a regular class, we have to write `__repr__` ourselves:

class Person:
    def __init__(self, name, age, height, email):
        self.name = name
        self.age = age
        self.height = height
        self.email = email

    def __repr__(self):
        return (f'{self.__class__.__name__}(name={self.name}, age={self.age}, '
                f'height={self.height}, email={self.email})')

person = Person('Joe', 25, 1.85, '')
print(person)

A data class generates this `__repr__` for us, but we can still override it if we want to customise the representation of our class:

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    height: float
    email: str

    def __repr__(self):
        return f'This is a {self.__class__.__name__} called {self.name}.'

person = Person('Joe', 25, 1.85, '')
print(person)  # This is a Person called Joe.
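If the goal is only to hide one attribute rather than rewrite the whole representation, the `field` function (covered in more detail later) can exclude it from the generated `__repr__`. A minimal sketch, assuming we want to keep `email` out of the printed output:

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    age: int
    height: float
    # repr=False keeps this field out of the generated __repr__,
    # while it is still set by __init__ and used by __eq__.
    email: str = field(repr=False)


person = Person('Joe', 25, 1.85, '')
print(person)  # Person(name='Joe', age=25, height=1.85)
```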

We can also combine `dataclass` with the `typing` module to annotate attributes with any type, including containers. For instance, let’s add a `house_coordinates` attribute to `PersonDataclass`:

from dataclasses import dataclass
from typing import Tuple

@dataclass
class PersonDataclass:
    name: str
    age: int
    city: str
    house_coordinates: Tuple[float, float]

print(PersonDataclass('Jack', 32, 'Seattle', (40.748441, -73.985664)))
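One thing worth knowing is that these annotations describe the intended types but are not checked when an object is created. The sketch below passes a string for `age` on purpose; the data class accepts it without complaint, and only a static type checker or your own validation code would catch it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PersonDataclass:
    name: str
    age: int
    city: str
    house_coordinates: Tuple[float, float]


# Annotations are not enforced at runtime: this "wrong" age is stored as-is.
odd = PersonDataclass('Jack', 'thirty-two', 'Seattle', (40.748441, -73.985664))
print(type(odd.age).__name__)  # str

# The declared types are still available for tools and introspection.
print(PersonDataclass.__annotations__['house_coordinates'])
```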

Following the same logic, we can create a data class to hold multiple instances of the `Person` class:

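from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Person:
    name: str
    age: int
    height: float
    email: str
    house_coordinates: Tuple[float, float]

@dataclass
class People:
    people: List[Person]

joe = Person('Joe', 25, 1.85, '', (40.748441, -73.985664))
mary = Person('Mary', 43, 1.67, '', (-73.985664, 40.748441))

print(People([joe, mary]))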
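A container class like `People` often needs to start out empty. A plain `[]` default is rejected because every instance would share the same list; `field(default_factory=list)` creates a fresh list per instance. A sketch, using plain strings instead of `Person` objects to keep it short, with `BadPeople` as a hypothetical name for the broken variant:

```python
from dataclasses import dataclass, field
from typing import List

try:
    @dataclass
    class BadPeople:
        people: List[str] = []  # a shared mutable default is rejected
except ValueError as exc:
    print('ValueError:', exc)


@dataclass
class People:
    # default_factory is called once per instance, so each gets its own list.
    people: List[str] = field(default_factory=list)


a, b = People(), People()
a.people.append('Joe')
print(a.people, b.people)  # ['Joe'] []
```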

As we saw above, when using the `dataclass` decorator, the `__init__`, `__repr__`, and `__eq__` methods are implemented for us.

But what about other things we may want from data classes, like hashing, sorting and comparison? For example, let’s compare two people with `>`:

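from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    height: float
    email: str

joe = Person('Joe', 25, 1.85, '')
mary = Person('Mary', 43, 1.67, '')

print(joe > mary)  # TypeError: '>' not supported between instances of 'Person' and 'Person'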
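The decorator's `order=True` argument makes this comparison legal. On its own, it generates `__lt__`, `__le__`, `__gt__` and `__ge__` that compare instances as tuples of their fields in declaration order, so the sketch below ends up ordering people by name, since `name` is declared first:

```python
from dataclasses import dataclass


@dataclass(order=True)
class Person:
    name: str
    age: int
    height: float
    email: str


joe = Person('Joe', 25, 1.85, '')
mary = Person('Mary', 43, 1.67, '')

# order=True compares (name, age, height, email) tuples, so the name decides here.
print(joe > mary)  # False, because 'Joe' < 'Mary'
print(sorted([mary, joe])[0].name)  # Joe
```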

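Passing `order=True` to the decorator generates the ordering methods, but to control which attribute they look at, we need two more tools. The first is the `field` function, which customises a single attribute of a data class. Among other things, it lets us define attributes that are left out of `__init__` and filled in after the object is instantiated, for example from other attributes.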

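In our sorting problem, we’ll use `field` to create a `sort_index` attribute in our class. It is set only after the object is instantiated, and because the comparison methods generated by `order=True` compare fields in the order they are declared, putting `sort_index` first makes it decide the ordering: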

from dataclasses import dataclass, field

@dataclass(order=True)
class Person:
    sort_index: int = field(init=False, repr=False)
    name: str
    age: int
    height: float
    email: str

The two arguments we passed as `False` state that this attribute isn’t a parameter of `__init__` and shouldn’t be displayed by `__repr__`. The `field` documentation describes its other parameters, such as `default`, `default_factory` and `compare`.
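As one example of those other parameters, `compare=False` leaves a field out of the generated `__eq__` (and out of the ordering methods when `order=True`). The sketch below assumes, for illustration only, that two records differing only in `email` should count as equal; the `'a'` and `'b'` values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    age: int
    # compare=False excludes email from __eq__; default='' makes it optional.
    email: str = field(default='', compare=False)


print(Person('Joe', 25, 'a') == Person('Joe', 25, 'b'))  # True
print(Person('Joe', 25))  # Person(name='Joe', age=25, email='')
```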

After declaring this new attribute, we’ll use the second tool: the `__post_init__` method. As the name suggests, this method runs right after initialisation; the generated `__init__` calls it as its last step. We’ll use `__post_init__` to set `sort_index` as soon as the object is created. For example, let’s say we want to compare people based on age. Here’s how:

@dataclass(order=True)
class Person:
    sort_index: int = field(init=False, repr=False)
    name: str
    age: int
    height: float
    email: str

    def __post_init__(self):
        # Runs at the end of the generated __init__, once age has been set.
        self.sort_index = self.age
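Because `__post_init__` sees every field right after construction, it is also a natural place for validation. The non-negative age rule below is purely an illustrative assumption, not something the dataclasses module does for you:

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Person:
    sort_index: int = field(init=False, repr=False)
    name: str
    age: int
    height: float
    email: str

    def __post_init__(self):
        # Hypothetical rule: reject negative ages before deriving sort_index.
        if self.age < 0:
            raise ValueError(f'age must be non-negative, got {self.age}')
        self.sort_index = self.age


try:
    Person('Ghost', -1, 1.70, '')
except ValueError as exc:
    print(exc)  # age must be non-negative, got -1
```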

If we make the same comparison again, it now works and compares by age. Since Joe is younger than Mary, `joe > mary` is `False`:

joe = Person('Joe', 25, 1.85, '')
mary = Person('Mary', 43, 1.67, '')

print(joe > mary)  # False: 25 > 43 is False
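The same generated methods power the built-in `sorted`, `min` and `max`, so a list of people can now be ordered by age without a `key` function. A short sketch using the class above:

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Person:
    sort_index: int = field(init=False, repr=False)
    name: str
    age: int
    height: float
    email: str

    def __post_init__(self):
        self.sort_index = self.age


people = [
    Person('Mary', 43, 1.67, ''),
    Person('Joe', 25, 1.85, ''),
]

# sorted() and max() use the generated comparisons, which check sort_index first.
print([p.name for p in sorted(people)])  # ['Joe', 'Mary']
print(max(people).name)  # Mary
```

If two people share an age, the comparison falls through to the next field, `name`, so ties are broken alphabetically.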

Inheritance with `dataclasses`

The `dataclasses` module also supports inheritance, which means we can create a data class that uses the attributes of another data class. Still using our `Person` class, we’ll create a new `Employee` class that inherits all the attributes from `Person`. So we have `Person`:

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    height: float
    email: str

And the new `Employee` class:

@dataclass
class Employee(Person):
    salary: int
    department: str

Now, we can create an object of the `Employee` class using all the attributes of the `Person` class:

print(Employee('Joe', 25, 1.85, '', 100000, 'Marketing'))

More details about dataclasses are available here


In conclusion, dataclasses in Python offer a powerful and concise way to define classes focused on data storage. By automating method generation, reducing boilerplate code, and providing flexibility, dataclasses enhance code readability and streamline development. While they might not be a one-size-fits-all solution, incorporating dataclasses into your Python toolkit can lead to more maintainable, expressive, and efficient code, especially in scenarios where simplicity and data-centric design are vital considerations.


