Data Engineer Things

Things learned in our data engineering journey and ideas on data and engineering.

Pydantic for Experts: Discriminated Unions in Pydantic V2

4 min read · Nov 3, 2023

Congratulations 🎉

If you’re reading this, you probably want to improve your Python skills and learn some advanced Pydantic functionality.

⚠️ Disclaimer: I’m a contributor to Pydantic.

[Image: a designer’s hand pointing at a colorful laminate sheet on a material swatch wall display in a material library. Photo by Summer Paradive, used with license.]

Introduction

Pydantic is the go-to data validation library for Python. With about 20 million downloads per week, it is among the top 100 Python libraries.

Pydantic V2 offers discriminated unions, an advanced type annotation (in other languages, a data structure) that picks which member of a union to validate against based on the value of a shared field.

💡 Performance: The logic for discriminated unions in Pydantic V2 is implemented in Rust, which makes them very fast.

💡 Coming soon in Pydantic V2.5: Discriminated unions are about to get even more powerful, with callable (function-based) discriminators being introduced in Pydantic 2.5.

Problem Statement

Let’s use the AWS AppFlow TriggerConfig as an example. The model has the following constraints:

  • TriggerType: required
  • TriggerProperties: required only if TriggerType="Scheduled"

How should we validate these properties?

Solution 1: Use a single pydantic model with a field validator

Perform assertions with a field_validator:

from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationInfo, field_validator


class TriggerConfig(BaseModel):
    """
    Represents Appflow TriggerConfig object

    Documentation:
    1. https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-appflow-flow-triggerconfig.html
    """
    TriggerType: Literal["OnDemand", "Event", "Scheduled"]
    # validate_default=True runs the validator even when the field is omitted
    # (this replaces the V1-only `always=True` validator argument)
    TriggerProperties: Optional[dict] = Field(default=None, validate_default=True)

    @field_validator('TriggerProperties')
    @classmethod
    def validate_trigger_properties(cls, v: Optional[dict], info: ValidationInfo) -> Optional[dict]:
        """
        If trigger type is Scheduled, then this should not be empty.
        Otherwise, the value should be empty. (Empty dicts are converted to `None`)
        """

        # Convert empty dict to None
        if not v:
            v = None

        # info.data holds the fields validated so far (TriggerType is declared first).
        # It is absent from info.data if TriggerType itself failed validation.
        if info.data.get("TriggerType") == "Scheduled":
            if v is None:
                raise ValueError("TriggerProperties must not be empty for a scheduled flow")
        else:
            if v is not None:
                raise ValueError("TriggerProperties must be empty for a non-scheduled flow")

        return v

Pros:

  • Field validators are straightforward
  • Consistent with V1

Cons:

  • Field validators add extra logic inside the model, which makes it harder to test
  • Field validators add runtime overhead
  • The solution isn’t very extensible

How a single model with a field validator works

There is a single model type for every trigger type (Scheduled, OnDemand, and Event). When the model is instantiated, the field_validator checks that TriggerProperties is present or absent as required.

Since Pydantic V2 runs its core validation in Rust, enforcing this rule in a Python function (via a validator) adds overhead: each validation has to call back into Python.
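To make the behavior concrete, here is a compact, self-contained sketch of the single-model approach written against the Pydantic V2 validator API. The `check_properties` and `is_valid` names are illustrative helpers, not part of Pydantic or of the original example.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError, ValidationInfo, field_validator


class TriggerConfig(BaseModel):
    TriggerType: Literal["OnDemand", "Event", "Scheduled"]
    # validate_default=True makes the validator run even when the field is omitted
    TriggerProperties: Optional[dict] = Field(default=None, validate_default=True)

    @field_validator("TriggerProperties")
    @classmethod
    def check_properties(cls, v: Optional[dict], info: ValidationInfo) -> Optional[dict]:
        # info.data only contains fields declared (and validated) before this one
        is_scheduled = info.data.get("TriggerType") == "Scheduled"
        if is_scheduled and not v:
            raise ValueError("TriggerProperties must not be empty for a scheduled flow")
        if not is_scheduled and v:
            raise ValueError("TriggerProperties must be empty for a non-scheduled flow")
        return v or None


def is_valid(data: dict) -> bool:
    """Illustrative helper: True if `data` passes validation."""
    try:
        TriggerConfig(**data)
    except ValidationError:
        return False
    return True


print(is_valid({"TriggerType": "OnDemand"}))
print(is_valid({"TriggerType": "Scheduled"}))
```

Note that the check depends on field order: the validator can only see `TriggerType` because that field is declared before `TriggerProperties`.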

Solution 2: Use Multiple Pydantic Models with a Discriminated Union

Using a discriminated union is a lot neater.

from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class TriggerConfig_1(BaseModel, extra="forbid"):
    """Trigger types that take no properties"""
    TriggerType: Literal["OnDemand", "Event"]


class TriggerConfig_2(BaseModel, extra="forbid"):
    """Scheduled trigger, which requires properties"""
    TriggerType: Literal["Scheduled"]
    TriggerProperties: dict


# Create a custom type (and validate using a type adapter)
TriggerConfig = Annotated[
    Union[TriggerConfig_1, TriggerConfig_2],
    Field(discriminator='TriggerType')
]
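Besides a type adapter, an annotated union like this can also be used as a field type on a parent model. The sketch below assumes a hypothetical `Flow` model; the `Flow` class and its `Name` and `Trigger` fields are made up for illustration and do not come from the AWS schema.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class TriggerConfig_1(BaseModel, extra="forbid"):
    TriggerType: Literal["OnDemand", "Event"]


class TriggerConfig_2(BaseModel, extra="forbid"):
    TriggerType: Literal["Scheduled"]
    TriggerProperties: dict


TriggerConfig = Annotated[
    Union[TriggerConfig_1, TriggerConfig_2],
    Field(discriminator="TriggerType"),
]


class Flow(BaseModel):
    """Hypothetical parent model, used only for illustration."""
    Name: str
    Trigger: TriggerConfig


flow = Flow.model_validate(
    {
        "Name": "nightly-sync",
        "Trigger": {"TriggerType": "Scheduled", "TriggerProperties": {"foo": "bar"}},
    }
)
print(type(flow.Trigger).__name__)  # TriggerConfig_2
```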

Pros:

  • Neater code
  • Extendable
  • Native pydantic support — higher performance
  • Is declarative

Cons:

  • Not compatible with Pydantic V1

How Discriminated Union works:

There are two model types: one for Scheduled and one for OnDemand|Event. They are “stitched together” via an annotated type, which performs the discriminated union.

When data is validated against TriggerConfig (for example, through a TypeAdapter), the discriminated union reads the value of TriggerType and validates the data against the matching model only.
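The routing described above can be sketched as follows. The models are repeated so the snippet runs on its own; the `"Hourly"` value is a made-up, unsupported trigger type used to show what happens when no model matches.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class TriggerConfig_1(BaseModel, extra="forbid"):
    TriggerType: Literal["OnDemand", "Event"]


class TriggerConfig_2(BaseModel, extra="forbid"):
    TriggerType: Literal["Scheduled"]
    TriggerProperties: dict


TriggerConfig = Annotated[
    Union[TriggerConfig_1, TriggerConfig_2],
    Field(discriminator="TriggerType"),
]

adapter = TypeAdapter(TriggerConfig)

# The value of TriggerType decides which model validates the data
scheduled = adapter.validate_python(
    {"TriggerType": "Scheduled", "TriggerProperties": {"foo": "bar"}}
)
on_demand = adapter.validate_python({"TriggerType": "OnDemand"})
print(type(scheduled).__name__)  # TriggerConfig_2
print(type(on_demand).__name__)  # TriggerConfig_1

# A tag that matches no model is rejected before any model is tried
try:
    adapter.validate_python({"TriggerType": "Hourly"})
    unknown_tag_error = None
except ValidationError as exc:
    unknown_tag_error = exc.errors()[0]["type"]
print(unknown_tag_error)
```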

Validation is faster, since the discriminated union logic runs in Pydantic’s Rust core. Further, the input is validated against a single model, whereas a regular union may try several members before it finds a match.

An added benefit: validation errors are raised only for the selected model, so catching errors and writing tests becomes easier and more focused.
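To see the difference in error reporting, the sketch below validates the same invalid input against a plain union and against the discriminated union. The `PlainUnion`, `Discriminated`, and `collect_errors` names are illustrative and not part of Pydantic.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class TriggerConfig_1(BaseModel, extra="forbid"):
    TriggerType: Literal["OnDemand", "Event"]


class TriggerConfig_2(BaseModel, extra="forbid"):
    TriggerType: Literal["Scheduled"]
    TriggerProperties: dict


PlainUnion = Union[TriggerConfig_1, TriggerConfig_2]
Discriminated = Annotated[PlainUnion, Field(discriminator="TriggerType")]


def collect_errors(adapter: TypeAdapter, data: dict) -> list:
    """Illustrative helper: return the list of validation errors, or [] if valid."""
    try:
        adapter.validate_python(data)
    except ValidationError as exc:
        return exc.errors()
    return []


bad = {"TriggerType": "Scheduled"}  # TriggerProperties is missing

plain_errors = collect_errors(TypeAdapter(PlainUnion), bad)
disc_errors = collect_errors(TypeAdapter(Discriminated), bad)

# The plain union reports a failure for every member it tried;
# the discriminated union reports only the failure of the selected model.
print(len(plain_errors), len(disc_errors))
for err in disc_errors:
    print(err["loc"], err["type"])
```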

Testing our code:

See for yourself how it works. Here are some unit tests that check the expected behavior.

💡 The tests work for both solutions. Give it a try.

import pytest
from pydantic import TypeAdapter, ValidationError

# `TriggerConfig` is the model (Solution 1) or the annotated type (Solution 2) defined above


@pytest.mark.parametrize(
    "data",
    [
        {'TriggerType': 'Scheduled', 'TriggerProperties': {'foo': 'bar'}},
        {'TriggerType': 'OnDemand'},
        {'TriggerType': 'Event'}
    ]
)
def test_trigger_config_valid(data: dict):
    """
    Ensures TriggerConfig can instantiate valid objects
    """

    ta = TypeAdapter(TriggerConfig)
    _ = ta.validate_python(data)


@pytest.mark.parametrize(
    "data",
    [
        {'TriggerType': 'Scheduled'},
        {'TriggerType': 'OnDemand', 'TriggerProperties': {'foo': 'bar'}},
        {'TriggerType': 'Event', 'TriggerProperties': {'foo': 'bar'}}
    ]
)
def test_trigger_config_invalid(data: dict):
    """
    Ensures TriggerConfig raises error when instantiating invalid objects
    """

    ta = TypeAdapter(TriggerConfig)
    with pytest.raises(ValidationError):
        _ = ta.validate_python(data)

Further Reading

  1. Pydantic’s Documentation: https://docs.pydantic.dev/latest/api/standard_library_types/#discriminated-unions-aka-tagged-unions
  2. An article I wrote about V2’s new features: Don’t Write Another Line of Code Until You See These Pydantic V2 Breakthrough Features
  3. This PR (on the pydantic codebase) can give you more info on how things actually work under the hood: https://github.com/pydantic/pydantic/pull/6570
  4. Discriminated Unions in TypeScript: https://www.typescriptlang.org/docs/handbook/unions-and-intersections.html#discriminating-unions

Summary

Discriminated unions are an advanced feature of the Pydantic V2 toolkit. Pydantic’s implementation is good, especially when compared with implementations in other languages such as TypeScript or C++.

Now you’re using advanced Pydantic features. Go you!

Published in Data Engineer Things

Written by Yaakov Bressler

Data Engineer @ Capital One. Editor in Chief @ Data Engineer Things. More about me at www.yaakovbressler.com
