Python dataclasses

Efficient typing and dynamic checks

CFM Tech
CFM Insights
5 min read · Apr 6, 2022

Typing data in python

Python has built a rich history by being a duck-typed language: if it quacks like a duck, treat it as such. But as codebases grow, people rediscover the benefits of explicit typing.

Types help your IDE know which fields to autocomplete, let you check dynamically that the preconditions of your algorithm are met, and make it possible to validate input data automatically.

Various solutions that do part of this job have existed for some time, such as Schematics and Marshmallow. But recently, we’ve reached a sweet spot with the convergence of multiple streams:

  • the arrival of dataclasses in Python 3.7 (backported to Python 3.6 through the dataclasses package)
  • the typing module in Python 3.5 (with newer features backported through typing_extensions)
  • Pydantic as a unifying framework, built with these new idioms, for these new idioms
  • various forays into static typing by Python players (pyre, mypy, etc.)

In the next few posts, we’ll look at how these packages help us modernise a trusty old synchronous flask-marshmallow-json stack into an asynchronous fastapi stack with proper typing and validation, websockets and GraphQL.

DataClasses

First, let’s briefly recap what dataclasses are.

They work out of the box in Python 3.7 and above. On Python 3.6, you first need to install the dataclasses backport from PyPI, for example with pip install dataclasses.

The package is smart enough to install only a transparent layer if you already have dataclasses, as is the case with Python 3.7+.

Let’s start with a simple example:
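A minimal sketch of such a class, in the spirit of the standard-library documentation. The class name Item, its fields and the total_cost method are assumptions chosen for illustration:

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical inventory item, used for illustration only."""

    name: str
    unit_price: float
    quantity: int = 0  # a field with a default value

    def total_cost(self) -> float:
        return self.unit_price * self.quantity


item = Item("widget", 2.5, quantity=4)
print(item)               # uses the generated __repr__
print(item.total_cost())
```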

The @dataclass line is a decorator. It automatically adds the following method to our class:
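For a hypothetical Item class with the fields name, unit_price and quantity (names assumed for illustration), the generated method would be roughly equivalent to this hand-written version:

```python
class Item:
    """Hand-written counterpart of a hypothetical Item dataclass."""

    def __init__(self, name: str, unit_price: float, quantity: int = 0):
        # One assignment per declared field, in declaration order.
        self.name = name
        self.unit_price = unit_price
        self.quantity = quantity


item = Item("widget", 2.5)
print(item.name, item.unit_price, item.quantity)
```

By default the decorator also generates __repr__() and __eq__(), which this sketch leaves out.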

Because the method name is surrounded by double underscores, it’s called a dunder method. And for simplicity’s sake, we’ll call it a constructor, just like in other languages.

As we can see in the sample, fields can be specified with a default value. Just like in constructors and functions, it’s not possible to have a field without a default value following one with a default value.
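The ordering rule can be checked with a small sketch. The class name Broken is hypothetical; the decorator raises a TypeError when the class is defined, not when it is instantiated:

```python
from dataclasses import dataclass

error = None
try:

    @dataclass
    class Broken:
        quantity: int = 0
        name: str  # no default, but follows a field that has one

except TypeError as exc:
    error = exc
    print(f"Rejected at class definition time: {exc}")
```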

You can pass parameters to the dataclass decorator:
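As a sketch, here is the decorator with the parameters available since Python 3.7 spelled out with their default values, applied to a hypothetical Item class. Written this way it behaves like a bare @dataclass:

```python
from dataclasses import dataclass


# All parameters below are set to their documented default values.
@dataclass(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)
class Item:
    name: str
    unit_price: float
    quantity: int = 0


# eq=True generates __eq__, which compares instances field by field.
print(Item("widget", 2.5) == Item("widget", 2.5, 0))
```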

The parameters mean:

  • init and repr: by default, automatically generate the __init__() and __repr__() methods.
  • order: by default, do not generate the comparison methods __lt__(), __le__(), __gt__(), and __ge__().
  • frozen: by default, the class is not frozen (i.e. immutable), so you can still edit the fields of a dataclass object after its construction.
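Switching the last two parameters on changes the behaviour as follows. This is a small sketch using a hypothetical Version class:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(order=True, frozen=True)
class Version:
    """Hypothetical example class."""

    major: int
    minor: int = 0


older, newer = Version(1, 2), Version(1, 10)
# order=True compares instances as if they were tuples of their fields.
print(older < newer)

try:
    newer.minor = 11  # frozen=True forbids assignment after construction
except FrozenInstanceError:
    print("Version instances are immutable")
```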

Composition of dataclasses

You can compose dataclasses, i.e. a dataclass can have fields that are themselves dataclasses.
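A minimal sketch of composition, assuming a hypothetical Point class used as a field of a Rectangle:

```python
from dataclasses import asdict, dataclass


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Rectangle:
    origin: Point  # a field that is itself a dataclass
    height: float
    width: float


rect = Rectangle(Point(0.0, 0.0), height=2.0, width=3.0)
print(rect)
print(asdict(rect))  # nested dataclasses are converted recursively
```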

You can also use inheritance: a dataclass can enrich another.
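As a sketch, a Rectangle and a Square that inherits from it could be declared as follows. The field names height, width and side are assumptions based on the surrounding text:

```python
from dataclasses import dataclass, fields


@dataclass
class Rectangle:
    height: float
    width: float


@dataclass
class Square(Rectangle):
    side: float


# Inherited fields come first, followed by the fields of the derived class.
print([f.name for f in fields(Square)])
```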

Compared to what you’re probably used to in other languages, this creates a constructor that takes all fields, including inherited ones!

The way to instantiate such a dataclass would be through:

Square(1,2,3)

This is not what we want in classical object-oriented programming. We’d rather consider that:

  • a Square IS A special kind of Rectangle
  • but setting the square side should set the height and width values accordingly.

So, the takeaway point here is:

Do not use inheritance if you’re planning to hide base dataclass fields.

Moreover, we saw earlier that fields in a dataclass can have default values, in which case all subsequent fields must also have default values. This rule is ALSO true for inherited classes. In other words, if a dataclass has fields with default values, every field added by its derived classes must have a default value too.
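The inheritance version of the rule can be sketched like this. The class names Base, Derived and DerivedOk are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Base:
    name: str = "unnamed"  # a field with a default in the base class


error = None
try:

    @dataclass
    class Derived(Base):
        size: int  # no default: it would follow `name` in __init__

except TypeError as exc:
    error = exc
    print(f"Rejected: {exc}")


@dataclass
class DerivedOk(Base):
    size: int = 0  # accepted, because it has a default value


print(DerivedOk())
```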

We can escape some of these rules though, with special features of dataclasses.

For example, we can create abstract dataclasses by asking Python NOT to create an __init__() parameter for a field. This is done by using the field specifier:
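A minimal sketch of this technique, assuming a Rectangle whose two fields are both excluded from the constructor:

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:
    # init=False: the field is declared, but is not a parameter of __init__
    height: float = field(init=False)
    width: float = field(init=False)


rect = Rectangle()  # the generated constructor takes no arguments
# The fields have no default either, so they are not set yet.
print(hasattr(rect, "height"))
```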

Our Rectangle now looks pretty much like an Abstract Base Class (though not in the modern Python meaning, i.e. PEP 3119).

As such, we can create a Square dataclass that only exposes one field. It’s still possible to create a Rectangle though, and set its fields after construction:
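One way to sketch this, assuming the Square fills in the inherited fields in a __post_init__() method (the exact approach of the original listing is an assumption):

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:
    height: float = field(init=False)
    width: float = field(init=False)


@dataclass
class Square(Rectangle):
    side: float

    def __post_init__(self):
        # Called by the generated __init__ once `side` has been assigned.
        self.height = self.side
        self.width = self.side


square = Square(3.0)  # only `side` is a constructor parameter
print(square)

rect = Rectangle()
rect.height, rect.width = 2.0, 5.0  # set the fields after construction
print(rect)
```

In this sketch, height and width are only computed at construction time: changing side later does not update them.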

We can also hide variables, or use a mix of post-init and init-only variables. This is useful for variables that require computation:
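A sketch combining both ideas, using a hypothetical ScaledRectangle class: scale is an init-only variable (InitVar), and area is computed in __post_init__() rather than passed to the constructor:

```python
from dataclasses import InitVar, dataclass, field, fields


@dataclass
class ScaledRectangle:
    height: float
    width: float
    scale: InitVar[float]  # init-only: passed to __init__, not stored
    area: float = field(init=False)  # computed in __post_init__

    def __post_init__(self, scale: float):
        # Init-only variables are handed over to __post_init__.
        self.height *= scale
        self.width *= scale
        self.area = self.height * self.width


rect = ScaledRectangle(2.0, 3.0, scale=2.0)
print(rect)
print([f.name for f in fields(rect)])  # `scale` is not listed as a field
```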

Pydantic dataclasses for dynamic verification

As we’ve seen, Python dataclasses fill a niche: classes that mostly hold data. Although they are regular classes, it’s highly recommended to keep them as vessels for clean, typed data, and not to add too much code to them. The issue, though, is that apart from the dunder method generation and cleaner-looking code, they don’t seem to provide much. In particular, type hints are just that: hints. Nothing is checked at runtime, so you can still store a value of any type in any field.
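The lack of runtime checking is easy to see with a short sketch, again using a hypothetical Item class:

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    unit_price: float
    quantity: int = 0


# None of these values match the annotations, yet no error is raised.
item = Item(name=42, unit_price="free", quantity=None)
print(item)
```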

There is a powerful and elegant solution to this, though: checking the types dynamically with pydantic. Although pydantic comes with its own powerful typing system and schema validation, it can also plug straight into Python’s dataclasses.

Let’s enrich our examples a bit:
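A sketch of the conversion, assuming pydantic is installed and using a hypothetical Item class. The standard dataclass is passed to pydantic.dataclasses.dataclass, and the result is bound to the name PItem:

```python
from dataclasses import dataclass

import pydantic
import pydantic.dataclasses


@dataclass
class Item:
    name: str
    unit_price: float
    quantity: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity


# Wrap the existing standard dataclass into a validating one.
PItem = pydantic.dataclasses.dataclass(Item)

print(PItem(name="widget", unit_price=2.5, quantity=4).total_cost())

try:
    PItem(name="widget", unit_price="not a number")
except pydantic.ValidationError as exc:
    print(exc)  # the error message names the offending field
```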

We’ve been able to grab a python dataclass and convert it to a Pydantic class called PItem. This one will now perform all the checks we’d expect from our typing system.

A few things to notice:

  • None is its own special type. If you want an optional value for a field, its type needs to be declared as Optional.
  • An empty string is a proper value for a non-optional string.
  • Type checking only applies to the input parameters. Pydantic is unable to check that you respect the type annotations when assigning the result of total_cost.
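The first two points can be sketched as follows, assuming pydantic is installed. The Item class and its comment field are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

import pydantic
import pydantic.dataclasses


@dataclass
class Item:
    name: str
    comment: Optional[str] = None  # explicitly optional


PItem = pydantic.dataclasses.dataclass(Item)

print(PItem(name=""))  # an empty string is a valid str
print(PItem(name="widget", comment=None))  # None is fine for Optional[str]

try:
    PItem(name=None)  # None is not a valid str
except pydantic.ValidationError as exc:
    print(exc)
```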

Actually, the pydantic.dataclasses module not only provides the conversion function: its dataclass decorator is also a drop-in replacement for the one in the standard dataclasses module. As such, you can even replace standard dataclasses with pydantic dataclasses by changing one line of code:
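A sketch of that one-line change, assuming pydantic is installed. Only the import differs from a standard dataclass; the Item class is hypothetical:

```python
from pydantic import ValidationError
from pydantic.dataclasses import dataclass  # instead of: from dataclasses import dataclass


@dataclass
class Item:
    name: str
    unit_price: float
    quantity: int = 0


item = Item(name="widget", unit_price=2.5)
print(item.quantity)  # the default value is used

try:
    Item(name="widget", unit_price=2.5, quantity="many")
except ValidationError as exc:
    print(exc)
```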

As we can see, default values are handled perfectly.

Pydantic performs implicit type coercion where it seems reasonable: numbers can be converted to strings, and strings representing numbers can be converted to numbers. But a list of numbers cannot be converted to a string. Note that this describes pydantic 1.x: pydantic 2 no longer converts numbers to strings by default.
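A sketch of the coercion rules, assuming pydantic is installed and using a hypothetical Item class. It only exercises the string-to-number and list-to-string cases:

```python
from pydantic import ValidationError
from pydantic.dataclasses import dataclass


@dataclass
class Item:
    name: str
    quantity: int


item = Item(name="widget", quantity="3")  # a numeric string is coerced to int
print(type(item.quantity), item.quantity)

try:
    Item(name=[1, 2, 3], quantity=3)  # a list of numbers is not a string
except ValidationError as exc:
    print(exc)
```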

This concludes this first post about data types with dynamic checks. Combining dataclasses and pydantic gives code that’s much more elegant than using marshmallow or schematics, even when using attr.ib or other extensions. It also avoids having too many ties to a third party: at any point you can switch back to standard dataclasses, without having to convert tons of code.

The latest technology insights and thinking from CFM. See our latest articles at CFM Insights: medium.com/capital-fund-management