Python dataclass inheritance, finally!

Anis Campos
Oct 25, 2021

For all of you who have struggled with inheritance in dataclasses, take comfort in the new kw_only feature, available since Python 3.10 (released on October 4th, 2021), which should make inheritance a lot less cumbersome.

Python 3.10 logo, from https://www.python.org/downloads/release/python-3100/

So what is kw_only?

This feature changes everything, yet it is easily overlooked, as this release ships far more “impactful” novelties such as pattern matching and new typing features. So I hope to make you see why I like it so much.

This is what the documentation says about this new feature:

kw_only: If true (the default value is False), then all fields will be marked as keyword-only. If a field is marked as keyword-only, then the only effect is that the __init__() parameter generated from a keyword-only field must be specified with a keyword when __init__() is called. There is no effect on any other aspect of dataclasses. See the parameter glossary entry for details. Also see the KW_ONLY section.

New in version 3.10.

If, against all odds, you still don’t see how this mouthful of a definition can make our lives better, let me show it to you from another angle.

Before version 3.10, inherit at your own risk

Let’s start by exposing a very annoying flaw in the design of Python dataclasses.

I suppose you are very familiar with dataclasses, but just in case, can you tell exactly why you can never define a field without a default value after a field that has one? To make that clearer, I’m talking about a class in which a field without a default is declared after a field that has one.
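A minimal sketch of the problem, using a hypothetical Point class (the class and field names are assumptions for illustration, and the try/except wrapper is only there to capture the error raised at class creation):

```python
from dataclasses import dataclass

error_message = None
try:
    @dataclass
    class Point:
        x: int = 0  # field with a default value
        y: int      # field without one, declared afterwards
except TypeError as error:
    # The dataclass decorator refuses to generate the constructor.
    error_message = str(error)

print(error_message)
```

The error is raised when the class is defined, not when it is instantiated.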

The reason is simple: the constructor, the __init__ method, that the dataclass conveniently generates for us is very straightforward. It simply takes the fields in order of declaration and adds them one after the other to the method’s signature.

Because in Python (initially, more about that later) parameters with a default value must always come after those without one, the dataclass field declarations must follow the same logic: fields without a default value always have to be defined before any field with one.

Let’s get this straight: this is a very naive approach, but it carries one questionable yet widely used property. When instantiating a dataclass, the order of the arguments is clear: it is always the order in which the fields are declared, from top to bottom.
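To make that ordering visible, here is a small sketch with a hypothetical Point class (the class and its fields are assumptions for illustration). It uses inspect.signature from the standard library to peek at the generated constructor:

```python
import inspect
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int
    label: str = "origin"  # fields with a default come last


# Parameter names of the generated __init__, in declaration order.
parameter_names = list(inspect.signature(Point.__init__).parameters)
print(parameter_names)

# Positional arguments are matched to the fields from top to bottom.
point = Point(1, 2, "corner")
print(point)
```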

Because we don’t see this generated constructor, it goes without saying that an unambiguous order was a must-have. That being said, it still has some huge pain in the a** consequences.

Some of you might be thinking “I don’t see the issue here, you just have to reorder your fields and problem solved!”. Well, smarty-pants, you are right, but also, you are very wrong.

Indeed, inside one class, you can always reorder the fields to solve this issue. It might be ugly, and closely related fields might end up separated, but it isn’t a deal-breaker.

But what if I want to use inheritance? What happens if a Base class declares type: str and counter: int = 0, and a child class Foo adds id: int?
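A sketch of that situation, assuming Base declares type and counter (with a default) and Foo adds id, which is what the generated signature discussed in this section suggests. The try/except wrapper is only there to capture the error:

```python
from dataclasses import dataclass


@dataclass
class Base:
    type: str
    counter: int = 0  # a default value in the parent class


error_message = None
try:
    @dataclass
    class Foo(Base):
        id: int  # no default value in the child class
except TypeError as error:
    error_message = str(error)

print(error_message)
```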

The Base class is fine, the fields are defined in order, no issue there.

For Foo, on the other hand, it might not be obvious, but the order is invalid. If you try to execute the code, you get: TypeError: non-default argument 'id' follows default argument.

This is due to how the __init__ method is generated when there are base classes to handle. Yep, you guessed it: the arguments are ordered with all the fields of the parent class(es) first, then those of the child class.

In other words, for the previous example we’d get:

def __init__(self, type: str, counter: int = 0, id: int)

Which is an invalid definition, since it breaks the rule that arguments with default values must come last.

Again, it was mandatory to have a predictable order for this invisible method, so here we are, forever stuck with this dilemma: if you define a base class with default values, all the fields in the child classes must have a default value as well, even if that removes any notion of “mandatory” fields.

Now with 3.10, inherit like you mean it

Sorry, I lied, we are not stuck forever. More than three long years after the release of Python 3.7, which introduced dataclasses, we finally have a solution that completely solves this dilemma.

What could they possibly have done to fix the core principle of argument ordering in Python? Well, nothing: they left that principle alone and simply changed how the constructor is generated, so that it can use keyword-only arguments (a.k.a. kw_only).

To fully understand this, let’s go back to what I said about function signatures in Python.

If you’ve ever seen a function defined as def func(*args, **kwargs), then you know that args is a tuple of positional arguments, whereas kwargs, the keyword arguments, is a dictionary in which each argument is identified by its name instead of its position.

But there is also the possibility of using keyword-only arguments, a lesser-known novelty of Python 3. So a full Python function signature looks something like this:

def func(a,b,c=1, *args, d, e=None, f, **kwargs)

Shocking, right? I recommend you read PEP 3102 for more details. But as you can see, after *args, arguments can freely have default values, in any order. d, e and f are keyword-only arguments: they can only be passed by name. They’ll also never be bound to extra positional arguments, since those are swallowed by *args.
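As a quick sketch of how each argument ends up bound with such a signature (the function body and the values passed are assumptions chosen for illustration):

```python
def func(a, b, c=1, *args, d, e=None, f, **kwargs):
    # Return every binding so we can inspect where each value landed.
    return {"a": a, "b": b, "c": c, "args": args,
            "d": d, "e": e, "f": f, "kwargs": kwargs}


# 1, 2, 3 fill a, b, c; the extra positionals 4 and 5 go to *args;
# d and f must be named; the unknown keyword lands in **kwargs.
result = func(1, 2, 3, 4, 5, d="d", f="f", extra=True)
print(result)
```

Note that e keeps its default even though the required f is declared after it.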

But this feature also allows us to completely disable positional arguments. You just have to define a function like this:

def kw_only_method(*,a=None,b=None)

Unlike *args, the bare * collects nothing: it only marks where positional parameters end, so Python raises as soon as an extra positional argument is passed.

Indeed, if you try to pass positional arguments anyway, this will be the result:

>>> kw_only_method(1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: kw_only_method() takes 0 positional arguments but 2 were given

Now we understand what is behind kw_only: it simply applies PEP 3102 to the constructor and removes the possibility of using positional arguments. In other words, it becomes a keyword-only constructor.

Let’s use it in our previous example, passing kw_only=True to the dataclass decorator.

Now there is no more TypeError when executing this code. We can define default values in the parent class without any restrictions over the child, all thanks to kw_only.

It’s not just inheritance

Inheritance is only one aspect of the keyword-only option. The more general consequence is that the presence of a default value no longer imposes any order on the fields.

PS: It might be shocking at first for your teammates and your first PR might be bumpy, but stay chill, be nice and explain all of this to them.

Are required fields still required?

One concern you might have is whether required fields are still required once they become keyword arguments.

Let’s define a keyword only method with an argument without default value and see how it works:

def kw_only_method(*, a=None, b, c=None)

Here b is a required keyword-only argument, and every call to kw_only_method has to provide it. Don’t just believe me, try it. You should see this:

kw_only_method()
TypeError: kw_only_method() missing 1 required keyword-only argument: 'b'

For dataclasses, it means that in kw_only mode, a field with no default value is still required. Indeed, try to instantiate Foo without passing id or type and you’ll see the same error.

Conclusion

Among all the known limitations of dataclasses, the inheritance drawbacks were the most annoying. It was so bad that our team considered whether it would be better to simply discard dataclasses and use regular classes. Truth be told, losing the required aspect of fields was infuriating, but still acceptable considering how much work the dataclass does for us, i.e. all the boilerplate it removes (immutability, constructor, comparison, hash, string representation, etc.).

Now, not only can we inherit freely without ever adding default values to an untold number of fields (when you have several layers of inheritance, try adding a default value in the base class and see for yourself…), but we can also forget about ordering the fields in a class depending on whether they are optional or required.

kw_only is a dream come true, so to speak. Maybe I’m a tiny bit overexcited about this relatively small addition, especially compared to the pattern matching powerhouse of a feature that the 3.10 release brings, but for me, this is a game changer.

A little disclaimer: as of today, October 2021, it might be a little too early to use version 3.10 in production. There are several articles out there explaining why you should wait before adopting it. This shouldn’t prevent you from preparing yourself and your team: this kind of change needs communication to be understood and used.

Anis Campos

Computer Scientist, in love with programming. Formerly at @Sanofi and @Vinci, I’m currently working at @Lumapps as a Python Backend developer.