Simplifying Data Structures: An Introduction to Python’s Dataclasses

Ryk Kiel
3 min read · Jan 12, 2023



Dataclasses in Python

Python 3.7 introduced a new module called dataclasses that makes it easier to create simple classes whose main job is to hold data. You write a regular class with type-annotated fields and apply the @dataclass decorator; the optional field() function lets you customize individual fields. Dataclasses are mutable by default and can be made immutable with frozen=True. They are particularly useful when working with data that should be treated as a single unit, such as a point in a 2D space or an RGB color.

Creating a Dataclass

To create a dataclass, define a class, annotate its fields with types, and decorate it with the @dataclass decorator. A plain default value is enough for most fields; the field() function is only needed when you want to customize a field, for example to attach metadata. Here's a simple Point class:

from dataclasses import dataclass, field

@dataclass
class Point:
    x: float = field(default=0.0, metadata={'unit': 'meter'})
    y: float = field(default=0.0, metadata={'unit': 'meter'})

In the example above, we defined a Point class with two fields: x and y. These fields are annotated as float and have default values of 0.0. We also added some metadata to the fields using the metadata argument of the field() function. The dataclasses module does not use this metadata itself; it is there for your own code or third-party libraries to store additional information about the fields, such as the unit of measurement.
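As a minimal sketch of how that metadata can be read back, the standard fields() helper returns one Field object per field, each with a read-only metadata mapping. The units variable is just an illustrative name:

```python
from dataclasses import dataclass, field, fields

@dataclass
class Point:
    x: float = field(default=0.0, metadata={'unit': 'meter'})
    y: float = field(default=0.0, metadata={'unit': 'meter'})

# fields() returns a tuple of Field objects describing the dataclass.
# Each Field exposes the metadata passed to field() as a read-only mapping.
units = {f.name: f.metadata['unit'] for f in fields(Point)}
print(units)  # {'x': 'meter', 'y': 'meter'}
```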

The @dataclass decorator automatically generates a number of useful methods for the class, such as __init__, __repr__, and __eq__. These methods provide a convenient way to initialize, display, and compare instances of the class.
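To make the generated methods more concrete, here is a rough sketch of a hand-written class that behaves similarly. ManualPoint is a hypothetical name used only for comparison, and the real generated code differs in its details:

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: float = 0.0
    y: float = 0.0

class ManualPoint:
    """Hypothetical hand-written counterpart, for comparison only."""

    def __init__(self, x: float = 0.0, y: float = 0.0):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"ManualPoint(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other):
        # Only instances of the same class are compared field by field.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

print(Point(1.5, 2.5))        # Point(x=1.5, y=2.5)
print(ManualPoint(1.5, 2.5))  # ManualPoint(x=1.5, y=2.5)
print(ManualPoint(1.5, 2.5) == ManualPoint(1.5, 2.5))  # True
```

The decorator saves you from writing and maintaining this boilerplate every time a field is added or renamed.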

Creating an Instance

You can create an instance of a dataclass using the standard syntax for creating an instance of a class. For example, to create a point at the origin:

origin = Point()
print(origin)

This prints Point(x=0.0, y=0.0)

You can also create an instance of a dataclass by providing values for the fields during initialization. For example, to create a point at (1, 2):

p = Point(1, 2)
print(p)

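This prints Point(x=1, y=2). The values stay integers because dataclasses do not validate or convert values against the type annotations.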
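The generated __init__ also accepts keyword arguments, and any field left out falls back to its default. A short sketch, using a Point with plain default values:

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: float = 0.0
    y: float = 0.0

# Fields can be passed by name; x is omitted and keeps its default.
p = Point(y=2.5)
print(p)  # Point(x=0.0, y=2.5)

# Annotations are not enforced at runtime: an int stays an int.
q = Point(1, 2)
print(type(q.x).__name__)  # int
```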

Comparing Dataclasses

Dataclasses have a default __eq__ method implemented, which compares all the fields of the class for equality. For example, comparing two points:

p1 = Point(1, 2)
p2 = Point(1, 2)
print(p1 == p2) # True

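If you want to compare dataclasses based on a subset of their fields, pass compare=False to field() for each field that should be ignored. In the example below, x is excluded, so two points with different x values but the same y compare equal.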

@dataclass
class Point:
    x: float = field(default=0.0, compare=False)  # ignored by __eq__
    y: float = field(default=0.0)

p1 = Point(1, 2)
p2 = Point(2, 2)
print(p1 == p2) # True, because only y is compared

Modifying Dataclasses

By default, dataclasses are mutable, meaning their fields can be reassigned after an instance is created. This corresponds to frozen=False, which is the default value of the decorator's frozen argument; it is spelled out below only to make the behavior explicit. For example:

@dataclass(frozen=False)
class Point:
    x: float = field(default=0.0)
    y: float = field(default=0.0)

p = Point(1, 2)
p.x = 3
print(p)

This prints Point(x=3, y=2)

If you want to make a dataclass immutable, set frozen=True in the decorator. Assigning to a field then raises dataclasses.FrozenInstanceError. Default values still work in frozen dataclasses; this example simply leaves them out.

@dataclass(frozen=True)
class Point:
    x: float
    y: float

p = Point(1, 2)
p.x = 3 # raises FrozenInstanceError
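Since a frozen instance cannot be changed in place, a common pattern is to build a modified copy instead. The sketch below uses dataclasses.replace() for that, and also shows that a frozen dataclass can have default values; the variable name moved is only illustrative:

```python
from dataclasses import FrozenInstanceError, dataclass, replace

@dataclass(frozen=True)
class Point:
    x: float = 0.0
    y: float = 0.0

p = Point(1, 2)

try:
    p.x = 3
except FrozenInstanceError:
    print("assignment blocked")

# replace() creates a new instance with the given fields changed.
moved = replace(p, x=3)
print(p)      # Point(x=1, y=2)
print(moved)  # Point(x=3, y=2)

# With frozen=True and the default eq=True, instances are also hashable.
print(hash(p) == hash(Point(1, 2)))  # True
```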

Working with Inheritance

Dataclasses support inheritance just like regular classes. A subclass inherits the fields of its dataclass parent, but it must be decorated with @dataclass itself for any new annotated fields to be picked up by the generated methods. For example, if you have a Point class and you want to create a ColoredPoint class that inherits from it, you can do so like this:

@dataclass
class Point:
    x: float
    y: float

@dataclass
class ColoredPoint(Point):
    color: str

cp = ColoredPoint(1, 2, 'red')
print(cp)

This prints ColoredPoint(x=1, y=2, color='red')
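The decorator on the subclass matters. As a small sketch, the hypothetical PlainChild below skips it, so its color annotation is never turned into a dataclass field, while the decorated ColoredPoint gets all three:

```python
from dataclasses import dataclass, fields

@dataclass
class Point:
    x: float
    y: float

class PlainChild(Point):  # hypothetical subclass without @dataclass
    color: str = 'red'

@dataclass
class ColoredPoint(Point):
    color: str = 'red'

# PlainChild only sees the fields it inherited from Point.
print([f.name for f in fields(PlainChild)])    # ['x', 'y']
print([f.name for f in fields(ColoredPoint)])  # ['x', 'y', 'color']

print(PlainChild(1, 2))    # PlainChild(x=1, y=2)
print(ColoredPoint(1, 2))  # ColoredPoint(x=1, y=2, color='red')
```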

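You can also define your own __init__ method in the subclass if you need to do some additional processing when creating an instance. The @dataclass decorator does not replace an __init__ that the class defines explicitly, so the hand-written version below is the one that runs.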

@dataclass
class Point:
    x: float
    y: float

@dataclass
class ColoredPoint(Point):
    color: str

    def __init__(self, x, y, color):
        super().__init__(x, y)
        self.color = color.upper()

cp = ColoredPoint(1, 2, 'red')
print(cp)

This prints ColoredPoint(x=1, y=2, color='RED')
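An alternative worth knowing is the __post_init__ hook, which the generated __init__ calls after assigning all fields. This sketch gets the same result without writing __init__ by hand, so the argument list stays in sync with the fields automatically:

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

@dataclass
class ColoredPoint(Point):
    color: str

    def __post_init__(self):
        # Called by the generated __init__ once x, y and color are set.
        self.color = self.color.upper()

cp = ColoredPoint(1, 2, 'red')
print(cp)  # ColoredPoint(x=1, y=2, color='RED')
```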

Conclusion

Dataclasses provide a convenient way to create simple data classes in Python. They are mutable by default and can be made immutable with frozen=True. They can be used to define classes that represent data that should be treated as a single unit, such as points in a 2D space or RGB colors. Dataclasses automatically generate a number of useful methods, such as __init__, __repr__, and __eq__, which makes them easy to use. They also support inheritance, making it easy to create classes that are based on existing dataclasses.

Thanks for reading and happy coding!

