Pydantic for Experts: Multi-Field Validation
Advanced abstraction patterns for encapsulating multi-field validation.
Congratulations 🎉
If you’re reading this, you probably want to improve your Python skills and learn some advanced Pydantic functionality.
⚠️ Disclaimer: I’m a contributor to Pydantic.
Introduction
Pydantic is the go-to data validation library for Python. It enforces schemas defined through type hints with runtime validation. It also supports “schema coercion”: converting input data to fit the expected schema.
🧠 This article assumes familiarity and proficiency with Pydantic.
For more introductory content, see the examples in the official Pydantic documentation.
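For example, here is a minimal sketch of that coercion, assuming Pydantic V2 in its default (lax) mode. The Event model and its fields are hypothetical, used only for illustration:

```python
import datetime

from pydantic import BaseModel


class Event(BaseModel):
    # Hypothetical model, used only to illustrate coercion
    happened_on: datetime.date
    count: int


# Both values arrive as strings and are coerced to the annotated types
event = Event.model_validate({"happened_on": "2024-01-01", "count": "3"})
print(repr(event.happened_on))  # datetime.date(2024, 1, 1)
print(event.count + 1)  # 4
```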
Problem Statement: Multiple Dependent Fields
How do you tie the existence of several fields to each other?
This problem comes in several levels of complexity.
Let’s start with the simplest case.
Simple: 3 dependent fields
Take this SimpleResponseModel as an example:
import datetime
from typing import Optional
from pydantic import BaseModel
class SimpleResponseModel(BaseModel):
    system_id: str
    email: Optional[str] = None
    email_source_date: Optional[datetime.date] = None
    email_source_id: Optional[str] = None
I want to ensure all 3 email fields are present, or none at all.
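To see why this needs extra work, note that the model as written happily accepts a partially populated email group. A quick check, assuming Pydantic V2:

```python
import datetime
from typing import Optional

from pydantic import BaseModel


class SimpleResponseModel(BaseModel):
    system_id: str
    email: Optional[str] = None
    email_source_date: Optional[datetime.date] = None
    email_source_id: Optional[str] = None


# Only one of the three email fields is set, yet validation succeeds
partial = SimpleResponseModel.model_validate(
    {"system_id": "abc-123", "email": "abc@h.com"}
)
print(partial.email_source_date)  # None
```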
Challenging: Groups of dependent fields
A more realistic situation involves multiple groups of dependent fields, where each “key” has a source_date and source_id:
import datetime
from typing import Optional
from pydantic import BaseModel
class ChallengingResponseModel(BaseModel):
    system_id: str
    email: Optional[str] = None
    email_source_date: Optional[datetime.date] = None
    email_source_id: Optional[str] = None
    address: Optional[str] = None
    address_source_date: Optional[datetime.date] = None
    address_source_id: Optional[str] = None
    account_balance: Optional[int] = None
    account_balance_source_date: Optional[datetime.date] = None
    account_balance_source_id: Optional[str] = None
Here, we have email, address, and account_balance, each expected to have all associated fields (source_date and source_id), or none.
Even More Challenging: Multiple models X different groupings
What if we have multiple models, each with different group structures?
import datetime
from typing import Optional
from pydantic import BaseModel
class ChallengingResponseModel(BaseModel):
    system_id: str
    account_type: Optional[str] = None
    account_type_source_date: Optional[datetime.date] = None
    account_type_source_lob: Optional[str] = None  # <-------- new field
    account_type_source_lob_id: Optional[str] = None  # <----- lob_id instead of source_id
The account_type group differs from the previous groups: it has a new field (source_lob) and a variation of the _id field (source_lob_id instead of source_id).
We need a clean, reusable approach to handle this complexity.
Solution Overview
Below are 3 solution designs, presented in increasing order of abstraction.
Solution 1: Model Validator
Let’s start simple. Using a model_validator in mode="after", you can compare all fields to each other once they’ve been set on the model:
import datetime
from typing import Optional
from pydantic import BaseModel, model_validator
class SimpleResponseModel(BaseModel):
    system_id: str
    email: Optional[str] = None
    email_source_date: Optional[datetime.date] = None
    email_source_id: Optional[str] = None

    @model_validator(mode="after")
    def all_or_none_emails(self) -> 'SimpleResponseModel':
        """
        All 3 email fields must be present, or none at all
        """
        email_fields = [
            self.email,
            self.email_source_date,
            self.email_source_id
        ]
        # Check against None rather than truthiness, so falsy values
        # such as "" or 0 still count as present
        present = [field is not None for field in email_fields]
        if any(present) and not all(present):
            raise ValueError(
                "All 3 email fields must be present or none at all."
            )
        return self
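A quick usage check of the validator, assuming Pydantic V2, which wraps a ValueError raised inside a validator in a ValidationError. The class is repeated so the snippet runs on its own:

```python
import datetime
from typing import Optional

from pydantic import BaseModel, ValidationError, model_validator


class SimpleResponseModel(BaseModel):
    system_id: str
    email: Optional[str] = None
    email_source_date: Optional[datetime.date] = None
    email_source_id: Optional[str] = None

    @model_validator(mode="after")
    def all_or_none_emails(self) -> "SimpleResponseModel":
        # Presence means "not None", so falsy values like "" still count
        present = [
            value is not None
            for value in (self.email, self.email_source_date, self.email_source_id)
        ]
        if any(present) and not all(present):
            raise ValueError("All 3 email fields must be present or none at all.")
        return self


# A complete group and a fully absent group both validate
full = SimpleResponseModel.model_validate({
    "system_id": "abc-123",
    "email": "abc@h.com",
    "email_source_date": "2024-01-01",
    "email_source_id": "123",
})
empty = SimpleResponseModel.model_validate({"system_id": "abc-123"})

# A partial group is rejected: Pydantic wraps the ValueError in a ValidationError
rejected, error_type = False, None
try:
    SimpleResponseModel.model_validate({"system_id": "abc-123", "email": "abc@h.com"})
except ValidationError as exc:
    rejected, error_type = True, exc.errors()[0]["type"]
```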
When to use:
This solution works well when you have a narrow set of dependent fields.
Simple is good.
When not to use:
Too many dependent fields will result in duplicated code. Also, hardcoded field names can make this brittle and hard to maintain.
Repeated code is bad.
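One way to reduce the duplication while staying with a model validator is to drive a single check from a table of groups. This is a sketch, assuming Pydantic V2; the dependent_groups class variable and the GroupedResponseModel name are hypothetical:

```python
import datetime
from typing import ClassVar, Optional

from pydantic import BaseModel, ValidationError, model_validator


class GroupedResponseModel(BaseModel):
    # Hypothetical: each key maps to the suffixes of its dependent fields.
    # ClassVar keeps this out of the model's fields.
    dependent_groups: ClassVar[dict[str, tuple[str, ...]]] = {
        "email": ("source_date", "source_id"),
        "address": ("source_date", "source_id"),
    }

    system_id: str
    email: Optional[str] = None
    email_source_date: Optional[datetime.date] = None
    email_source_id: Optional[str] = None
    address: Optional[str] = None
    address_source_date: Optional[datetime.date] = None
    address_source_id: Optional[str] = None

    @model_validator(mode="after")
    def all_or_none_groups(self) -> "GroupedResponseModel":
        # Apply the same all-or-none rule to every configured group
        for key, suffixes in self.dependent_groups.items():
            names = [key, *(f"{key}_{suffix}" for suffix in suffixes)]
            present = [getattr(self, name) is not None for name in names]
            if any(present) and not all(present):
                raise ValueError(f"Fields {names} must all be present or all be absent.")
        return self


ok = GroupedResponseModel.model_validate({
    "system_id": "abc-123",
    "address": "1 Main St",
    "address_source_date": "2024-01-01",
    "address_source_id": "9",
})

email_rejected = False
try:
    GroupedResponseModel.model_validate({"system_id": "abc-123", "email": "abc@h.com"})
except ValidationError:
    email_rejected = True
```

This removes the copy-pasted checks, but field names are still spelled out in two places (the annotations and dependent_groups), so it only partly addresses the brittleness noted above.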
Problem: It’s annoying to test:
Lots of test cases which might never appear in “the real world”:
from contextlib import nullcontext

import pytest
from pydantic import ValidationError

# Assumes SimpleResponseModel from the snippet above is importable here.
# Pydantic wraps the validator's ValueError in a ValidationError.
@pytest.mark.parametrize(
    ("data", "expectation"),
    [
        ({}, pytest.raises(ValidationError)),
        ({"system_id": "abc-123"}, nullcontext()),
        ({"system_id": "abc-123", "email": "abc@h.com"}, pytest.raises(ValidationError)),
        ({"system_id": "abc-123", "email": "abc@h.com", "email_source_date": "2024-01-01"}, pytest.raises(ValidationError)),
        ({"system_id": "abc-123", "email": "abc@h.com", "email_source_date": "2024-01-01", "email_source_id": "123"}, nullcontext())
    ]
)
def test_response_model(data: dict, expectation):
    with expectation:
        _ = SimpleResponseModel.model_validate(data)
⚠️ This method indicates a poor abstraction: the email_-prefixed fields belong together, but the model stores them as independent, flat fields.
Solution 2: Nested Model
Because a key won’t exist without its dependents, it makes sense to create a nested structure:
{
    "system_id": "abc-123",
    "email_stuff": {
        "email": "me@happy.com",
        "email_source_date": "2024-01-01",
        "email_source_id": "123"
    },
    "another_thing": {
        ...
    }
}
This solution assumes you have control over how this data is consumed. Ideally, you should influence that process: don’t commit a sin which requires you to continue to sin…
Python code for this nesting is super simple:
import datetime
from typing import Optional
from pydantic import BaseModel
class EmailStuff(BaseModel):
    email: str
    email_source_date: datetime.date
    email_source_id: str
class SimpleResponseModel2(BaseModel):
    system_id: str
    email_stuff: Optional[EmailStuff] = None
Testing becomes simpler:
You don’t need to simulate partially populated EmailStuff objects (unless you expect them in the “real world”).
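A small sketch of why, assuming Pydantic V2: the required fields on EmailStuff enforce “all or none” without any custom validator, and a partial group fails with one “missing” error per absent nested field.

```python
import datetime
from typing import Optional

from pydantic import BaseModel, ValidationError


class EmailStuff(BaseModel):
    email: str
    email_source_date: datetime.date
    email_source_id: str


class SimpleResponseModel2(BaseModel):
    system_id: str
    email_stuff: Optional[EmailStuff] = None


# The group is either absent (None) ...
absent = SimpleResponseModel2.model_validate({"system_id": "abc-123"})

# ... or complete
complete = SimpleResponseModel2.model_validate({
    "system_id": "abc-123",
    "email_stuff": {
        "email": "me@happy.com",
        "email_source_date": "2024-01-01",
        "email_source_id": "123",
    },
})

# A partial group fails on the missing nested fields
missing = []
try:
    SimpleResponseModel2.model_validate(
        {"system_id": "abc-123", "email_stuff": {"email": "me@happy.com"}}
    )
except ValidationError as exc:
    # Each error's loc ends with the name of the missing nested field
    missing = sorted(err["loc"][-1] for err in exc.errors())
```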
When to use:
- If you can control how data is consumed.
- You have relatively few nested groups. (If you have hundreds of groups, creating the nested models by hand becomes tedious.)
When not to use:
- Sometimes you need to (or want to) keep a flat structure. (Ex: event streaming to a 3rd party.)
Solution 3: Dynamic Creation of Nested Model
We can add a powerful abstraction to our second solution by using Pydantic’s create_model function:
import datetime
from pydantic import BaseModel, create_model
EmailStuff = create_model(
    "EmailStuff",
    # (type, ...) marks each field as required
    email=(str, ...),
    email_source_date=(datetime.date, ...),
    email_source_id=(str, ...),
    __base__=BaseModel
)
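To confirm that the dynamically created class matches the hand-written EmailStuff, you can inspect its fields. A quick check, assuming Pydantic V2 (model_fields and FieldInfo.is_required()):

```python
import datetime

from pydantic import BaseModel, create_model

EmailStuff = create_model(
    "EmailStuff",
    email=(str, ...),
    email_source_date=(datetime.date, ...),
    email_source_id=(str, ...),
    __base__=BaseModel,
)

# Field order follows the keyword arguments passed to create_model
print(list(EmailStuff.model_fields))
# ['email', 'email_source_date', 'email_source_id']

# Every field was declared with ..., so every field is required
print(all(f.is_required() for f in EmailStuff.model_fields.values()))  # True
```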
We can abstract further by creating a helper function that wraps create_model:
import datetime
from typing import Optional
from pydantic import BaseModel, create_model
def create_stuff(name: str, dtype: type) -> type[BaseModel]:
    """dynamically creates nested schema for '_Stuff' objects"""
    fields = {
        name: (dtype, ...),
        f"{name}_source_date": (datetime.date, ...),
        f"{name}_source_id": (str, ...)
    }
    return create_model(
        f"{name.title()}Stuff",
        **fields,
        __base__=BaseModel
    )
class ResponseModel3(BaseModel):
    system_id: str
    # dtype is the type of the key field itself
    email_stuff: Optional[create_stuff("email", str)] = None
    address_stuff: Optional[create_stuff("address", str)] = None
Very neat! 🎉 We’ve now solved both the simple and the challenging case.
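A usage sketch of ResponseModel3, assuming Pydantic V2; the helper is repeated so the snippet runs on its own:

```python
import datetime
from typing import Optional

from pydantic import BaseModel, create_model


def create_stuff(name: str, dtype: type) -> type[BaseModel]:
    # Build a "<Name>Stuff" model whose fields are all required
    fields = {
        name: (dtype, ...),
        f"{name}_source_date": (datetime.date, ...),
        f"{name}_source_id": (str, ...),
    }
    return create_model(f"{name.title()}Stuff", **fields)


class ResponseModel3(BaseModel):
    system_id: str
    email_stuff: Optional[create_stuff("email", str)] = None
    address_stuff: Optional[create_stuff("address", str)] = None


resp = ResponseModel3.model_validate({
    "system_id": "abc-123",
    "address_stuff": {
        "address": "1 Main St",
        "address_source_date": "2024-01-01",
        "address_source_id": "42",
    },
})
print(type(resp.address_stuff).__name__)  # AddressStuff
print(resp.email_stuff)  # None
```

Note that every call to create_stuff builds a brand-new class, so calling it twice with the same name yields two distinct (if structurally identical) models. If a group is referenced in several places, it may be cleaner to create it once and reuse the result.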
Abstracting even further:
What if we want to control which fields get created, without writing a new create_stuff(...) function each time?
import datetime
from typing import Dict, Optional
from functools import partial
from pydantic import BaseModel, create_model
def create_stuff_base(name: str, dtype: type, fields: Dict[str, type]) -> type[BaseModel]:
    model_fields = {
        name: (dtype, ...)
    }
    # Each suffix becomes a required "<name>_<suffix>" field
    for k, v in fields.items():
        model_fields[f"{name}_{k}"] = (v, ...)
    return create_model(
        f"{name.title()}Stuff",
        **model_fields,
        __base__=BaseModel
    )
create_stuff_1 = partial(create_stuff_base, fields={"source_date": datetime.date, "source_id": str})
create_stuff_2 = partial(create_stuff_base, fields={"source_date": datetime.date, "source_lob": str, "source_lob_id": str})
class ChallengingResponseModel(BaseModel):
    system_id: str
    email_stuff: Optional[create_stuff_1("email", str)] = None
    account_stuff: Optional[create_stuff_2("account_type", str)] = None
Partial functions to the rescue!
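To see what each partial produces, you can inspect the generated field names. A sketch, assuming Pydantic V2; create_stuff_base is repeated so the snippet runs on its own:

```python
import datetime
from functools import partial

from pydantic import BaseModel, create_model


def create_stuff_base(name: str, dtype: type, fields: dict[str, type]) -> type[BaseModel]:
    # The key field plus one required "<name>_<suffix>" field per entry
    model_fields = {name: (dtype, ...)}
    for suffix, suffix_type in fields.items():
        model_fields[f"{name}_{suffix}"] = (suffix_type, ...)
    return create_model(f"{name.title()}Stuff", **model_fields)


create_stuff_1 = partial(
    create_stuff_base, fields={"source_date": datetime.date, "source_id": str}
)
create_stuff_2 = partial(
    create_stuff_base,
    fields={"source_date": datetime.date, "source_lob": str, "source_lob_id": str},
)

EmailStuff = create_stuff_1("email", str)
AccountTypeStuff = create_stuff_2("account_type", str)

print(list(EmailStuff.model_fields))
# ['email', 'email_source_date', 'email_source_id']
print(list(AccountTypeStuff.model_fields))
# ['account_type', 'account_type_source_date', 'account_type_source_lob', 'account_type_source_lob_id']
print(AccountTypeStuff.__name__)  # Account_TypeStuff
```

Because str.title() keeps the underscore, the account group ends up named Account_TypeStuff. If you prefer AccountTypeStuff, a hypothetical tweak such as "".join(part.title() for part in name.split("_")) would produce it.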
When to use this:
- You have many nested schemas with varying group structures.
- In larger projects.
When not to use:
- Try the simpler things first.
- A dynamic function should simplify the problem more than it complicates the solution.
Summary
How do you handle multiple dependent fields in Pydantic? It depends…
We discussed several solution designs:
- Model validator: A simple solution when you can’t control the data structure
- Nested models: The simplest solution when you can control the data structure
- Dynamic creation of nested models: A powerful abstraction for creating nested models dynamically
When implementing, start with the simplest solution. If it gets unwieldy, then proceed with more abstractions.
Pydantic for Experts Series:
This article is part of a series on advanced usage of Pydantic.
- Don’t Write Another Line of Code Until You See These Pydantic V2 Breakthrough Features
An overview of several features I’m most excited about, introduced in V2.
- Pydantic for Experts: Discriminated Unions in Pydantic V2
Differentiate model selection.
- Pydantic for Experts: Reusing & Importing Validators
Advanced techniques for reusing and importing validation across Python models.