Understanding Python Dataclasses — Part 2

Shikhar Chauhan
Published in MindOrks
4 min read · Jul 6, 2018

This is the second part of the two-part series on Python’s new dataclasses. In the first part I discussed the general usage of dataclasses. This post deals with another feature: dataclasses.field.

We saw that dataclasses generate their own __init__ method, which assigns each declared field the value passed during initialization. For each field, we specified only two things in the last post:

  • variable name
  • data type

This leaves us quite limited in how we can use dataclass fields. Let’s discuss some of the limitations, and how they can be solved with dataclasses.field.

Complex Initialization

Consider a scenario where you want an attribute to be initialized as a list. How do we make that happen? One simple way would be to do it in the __post_init__ method.
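A minimal sketch of that approach. The body of get_random_marks (five random integers between 1 and 10) and the None default on marks are assumptions made for illustration:

```python
import random
from dataclasses import dataclass
from typing import List, Optional


def get_random_marks() -> List[int]:
    # Hypothetical helper: five random marks between 1 and 10.
    return [random.randint(1, 10) for _ in range(5)]


@dataclass
class Student:
    # None is a placeholder default so that marks can be omitted (an assumption).
    marks: Optional[List[int]] = None

    def __post_init__(self):
        # Fill in the marks ourselves when the caller did not pass any.
        if self.marks is None:
            self.marks = get_random_marks()


student = Student()
print(student)
```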

The dataclass Student expects a list of marks. We chose not to pass the value of marks, but rather initialized it using the __post_init__ method. That is a lot of ceremony for a single attribute: we had to write __post_init__ and call get_random_marks in it ourselves. That’s just extra work.

Fortunately, the Python core developers have a solution for us. The behaviour of dataclass fields, and their effect on the dataclass, can be customized using dataclasses.field.

Continuing with the above use case, let us eliminate the need to call get_random_marks from __post_init__. Here’s how we’d do it with dataclasses.field:
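A minimal sketch, again assuming a hypothetical get_random_marks that returns five random integers:

```python
import random
from dataclasses import dataclass, field
from typing import List


def get_random_marks() -> List[int]:
    # Hypothetical helper: five random marks between 1 and 10.
    return [random.randint(1, 10) for _ in range(5)]


@dataclass
class Student:
    # The factory is called once per instance, and only when marks is not passed.
    marks: List[int] = field(default_factory=get_random_marks)


student = Student()                      # marks generated by the factory
explicit = Student(marks=[7, 8, 9])      # the factory is skipped
print(student, explicit)
```

Because the factory runs separately for every object, two students never share the same list, which is what would happen with a plain mutable default on an ordinary class.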

dataclasses.field accepts a default_factory argument, which is used to initialize the field if no value is passed when the object is created.

default_factory should be a callable (generally a function) that accepts no arguments.

This way we can initialize the fields in a more complex form. Now, moving to another use case.

All fields for data comparison

From the last post, we know that a dataclass can auto-generate the comparison methods ==, <, <=, > and >= (the ordering methods are generated when order=True is passed to the decorator). One quirk is that they use all of the class’s fields for comparison, which is not always useful. Often, it gets in the way.

Consider a use case where you have a dataclass for holding information about the users of your service. It may involve fields like:

  • Name
  • Age
  • Height
  • Weight

And you want the user objects only to be compared using age, height, and weight. You don’t want name to be used for comparison. This is a very common use case for backend developers.

The auto-generated comparison methods would compare the following tuple:

(self.name, self.age, self.height, self.weight)

This would defeat our purpose. We do not want name to be used for comparison. So, how do we do it with dataclasses.field?

Here’s how:
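A minimal sketch of such a User dataclass. The field types and the sample values are assumptions:

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class User:
    name: str = field(compare=False)  # excluded from ==, <, <=, >, >=
    age: int
    height: float
    weight: float


user_1 = User("John Doe", 23, 70, 72.3)
user_2 = User("Adam", 24, 65, 60)

# Compares (23, 70, 72.3) with (24, 65, 60); the names are ignored.
print(user_1 < user_2)  # True
```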

By default, all fields are used for comparison, so we only need to pick the fields we do not want compared and declare them explicitly with field(compare=False).

Here is a simpler use case. Let us define a dataclass that holds a number and its string representation, where comparison should use only the value of the number and not its string representation.
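A minimal sketch, with the class name Number and the field names value and verbose chosen for illustration:

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Number:
    value: int
    verbose: str = field(compare=False)  # ignored by all comparisons


one = Number(1, "one")
uno = Number(1, "uno")
two = Number(2, "a")

print(one == uno)  # True: only value is compared
print(one < two)   # True: 1 < 2, the strings play no part
```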

Now we have the power to control the behavior of dataclasses at a finer granularity. Feels quite amazing!

All fields used for Representation

The auto-generated __repr__ method uses all fields for representation. This is not ideal in a lot of cases, especially when your dataclass has a lot of fields. The representation of a single object would get pretty huge, and become a debugging nightmare.

Imagine seeing that representation in your logs and writing a regex to search for it. Gruesome, right?

Well, we can customize this behavior as well. For a use case like this, probably the only useful attribute for representation is name. So, let’s just use that for __repr__:
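A minimal sketch that reuses the User fields from the comparison example. The field types and sample values are assumptions:

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class User:
    name: str = field(compare=False)      # shown in repr, not compared
    age: int = field(repr=False)          # compared, hidden from repr
    height: float = field(repr=False)
    weight: float = field(repr=False)


user = User("John Doe", 23, 70.0, 72.3)
print(user)  # User(name='John Doe')
```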

This looks great. Simple debugging and meaningful comparison!

Omitting fields from initialization

All the examples so far had one thing in common: we passed a value for every declared field, except for fields with default values, where passing a value is optional.

But there’s another case: we might not want a field’s value to be set through initialization at all. This is a common use case. Maybe you are tracking the state of an object and always want it set to False at initialization, and the value is never passed in by the caller.

So, how do we achieve this? Here’s how:
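A minimal sketch using field(init=False). The class and its email and verified fields are hypothetical, chosen only to illustrate a state flag:

```python
from dataclasses import dataclass, field


@dataclass
class User:
    email: str
    # Not a parameter of __init__; every new object starts out unverified.
    verified: bool = field(init=False, default=False)


user = User("john@example.com")
print(user.verified)  # False

# The flag can still be changed after the object is created.
user.verified = True
```

Passing the flag to the constructor, as in User("john@example.com", True), raises a TypeError because verified is no longer part of the generated __init__ signature.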

And voila! We have a lot more flexibility with dataclasses now.

Conclusion

Hopefully the two posts helped you understand dataclasses, and you look forward to using them in your projects soon!

Thank you for reading. So long, and thanks for all the fish.

Follow me on Github, Twitter, LinkedIn.
