Leveraging Pydantic as a validation layer.

Moenes Bensoussia
Published in datamindedbe
Feb 6, 2024

Ensuring clean and reliable input is crucial for building robust services. One powerful tool that simplifies this is Pydantic, a data validation and settings management library powered by type hints. We will explore why it is important to have a validation layer that guards a service's core logic, and how Pydantic can be a game-changer when validating interactions with external systems.

Why a validation layer matters

I was working on a service that relies on YAML configuration files provided by end users. These files act as blueprints, defining data sources and transformations within our system. Initially, without a validation layer, we ran into issues caused by missing fields or incorrect data. These problems surfaced as errors at different stages of the service's runtime, leading to unwanted outcomes.

To resolve this, we kept adding conditions and validations throughout the codebase. While this fixed the errors we faced, it also made the codebase more complex, because the validation logic was now mixed with the core business logic. Using a tool like CUE for YAML validation was an option, but we would then have had to maintain an entirely new tool.

Eventually, we separated the validation code from the core code by creating dedicated Pydantic data models that define the structure, types, and constraints of the YAML files. This ensured that only validated, properly formatted data reaches the main codebase, reducing complexity and keeping a clear separation between validation logic and core functionality.
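A minimal sketch of that separation, assuming a hypothetical configuration with a list of named data sources (SourceConfig, PipelineConfig, load_config and run_pipeline are illustrative names, not taken from the original service). The loader is the only place that touches raw input; the core function receives an already validated model and needs no defensive checks.

```python
from pydantic import BaseModel, ValidationError


# Hypothetical data models: the shape is illustrative only.
class SourceConfig(BaseModel):
    name: str
    path: str


class PipelineConfig(BaseModel):
    sources: list[SourceConfig]


def load_config(raw: object) -> PipelineConfig:
    """Validation layer: turn untrusted input into a typed model or fail fast."""
    # model_validate raises ValidationError if the input does not match the models.
    return PipelineConfig.model_validate(raw)


def run_pipeline(config: PipelineConfig) -> list[str]:
    """Core logic: works on validated data only, so no extra checks are needed."""
    return [source.name for source in config.sources]


if __name__ == "__main__":
    raw = {"sources": [{"name": "orders", "path": "/data/orders.csv"}]}
    print(run_pipeline(load_config(raw)))  # ['orders']
    try:
        load_config({"sources": [{"name": "orders"}]})  # "path" is missing
    except ValidationError as error:
        print(error.error_count(), "validation error(s)")
```

With this split, adding a new rule means touching the models only, while run_pipeline keeps working on typed attributes.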

Using Pydantic as a Validation Layer

A Pydantic BaseModel lets you define a type-annotated class that describes the structure and validation requirements of your data objects. Passing data directly to its constructor populates the model and triggers the validation flow.

As an example, let’s consider the following small YAML file:

foo: foo
bar: bar
baz: True

Creating a BaseModel that mirrors the file structure would look something like this:

from pydantic import BaseModel, Field
from yaml import safe_load


class FooBar(BaseModel):
    foo: str = Field(description="This is foo")
    bar: str = Field(description="This is bar")
    baz: bool = Field(description="This is baz")


if __name__ == "__main__":
    with open("config.yml", "r") as tables_file:
        # safe_load returns a dict; ** passes each key as a keyword argument.
        print(FooBar(**safe_load(tables_file)))

The Field object in Pydantic also provides its own validation options, such as length and numeric constraints, and a way to attach metadata, such as a description, to model fields.

The FooBar object is created from the values unpacked from the YAML file.

python blog.py
# foo='foo' bar='bar' baz=True
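Unpacking with ** works as long as safe_load returns a mapping. A sketch of an alternative using Pydantic v2's model_validate classmethod, which validates the loaded object directly: an empty YAML file, which safe_load turns into None, then surfaces as a ValidationError instead of a TypeError from the unpacking. The dictionary below stands in for what safe_load returns for the example file.

```python
from pydantic import BaseModel, Field, ValidationError


class FooBar(BaseModel):
    foo: str = Field(description="This is foo")
    bar: str = Field(description="This is bar")
    baz: bool = Field(description="This is baz")


# Stand-in for safe_load(tables_file) on the example YAML file.
data = {"foo": "foo", "bar": "bar", "baz": True}

# Both calls run the same validation and build equal models.
assert FooBar(**data) == FooBar.model_validate(data)

try:
    # An empty YAML file loads as None; model_validate reports it as a ValidationError.
    FooBar.model_validate(None)
except ValidationError as error:
    print(error)
```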

So far, our input is consistent with the BaseModel's validation rules, but it would be more interesting, and safer, to add more constraints.
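One way to add such constraints without writing custom code is through Field arguments. A sketch with illustrative limits that are assumptions, not requirements from the original service: min_length and max_length bound the string length, and the description metadata ends up in the model's generated JSON schema.

```python
from pydantic import BaseModel, Field, ValidationError


class FooBar(BaseModel):
    # The length limits are illustrative assumptions.
    foo: str = Field(description="This is foo", min_length=3, max_length=50)
    bar: str = Field(description="This is bar")
    baz: bool = Field(description="This is baz")


try:
    FooBar(foo="fo", bar="bar", baz=True)  # too short for min_length=3
except ValidationError as error:
    print(error)

# Field metadata is exposed through the model's JSON schema.
print(FooBar.model_json_schema()["properties"]["foo"]["description"])  # This is foo
```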

Type hinting validation

One of BaseModel's internal validation flows relies on field type hints. In other words, if the value provided for a field cannot be validated against the field's annotation, creating the object fails.

Let's come back to our YAML file. One requirement of our application is that the bar field may only take certain values. We can enforce this by creating an Enum and using it to annotate the bar field. BaseModel's internal validation flow picks up that annotation and performs the necessary checks.
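In practice the check is a bit more nuanced: by default Pydantic v2 validates in "lax" mode, where some values are coerced into the annotated type (for example the string "yes" into True for a bool field), while others, such as an integer for a str field, are rejected. If only exact types should be accepted, ConfigDict(strict=True) turns coercion off. A minimal sketch; LaxFooBar and StrictFooBar are illustrative names.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LaxFooBar(BaseModel):
    foo: str
    baz: bool


class StrictFooBar(BaseModel):
    # strict=True disables type coercion for every field of the model.
    model_config = ConfigDict(strict=True)

    foo: str
    baz: bool


print(LaxFooBar(foo="foo", baz="yes").baz)  # True: "yes" is coerced to a bool

for model, values in [
    (LaxFooBar, {"foo": 1, "baz": True}),  # an int is not coerced to str
    (StrictFooBar, {"foo": "foo", "baz": "yes"}),  # no coercion in strict mode
]:
    try:
        model(**values)
    except ValidationError as error:
        print(model.__name__, "rejected:", error.errors()[0]["msg"])
```

Since YAML already parses True as a Python bool, the example file validates in both modes.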

from enum import Enum

from pydantic import BaseModel, Field
from yaml import safe_load


class BarEnum(Enum):
    my_bar = "my_bar"
    your_bar = "your_bar"


class FooBar(BaseModel):
    foo: str = Field(description="This is foo")
    # Only the values of BarEnum are accepted for bar.
    bar: BarEnum = Field(description="This is bar")
    baz: bool = Field(description="This is baz")


if __name__ == "__main__":
    with open("config.yml", "r") as tables_file:
        print(FooBar(**safe_load(tables_file)))

Let’s check whether our YAML file still passes validation.

python blog.py
# pydantic_core._pydantic_core.ValidationError: 1 validation error for FooBar
# bar
# Input should be 'my_bar' or 'your_bar' [type=enum, input_value='bar', input_type=str]

Indeed, our BaseModel now only accepts my_bar or your_bar for this field. We just need to change the bar value to my_bar and creating the model should work.

foo: foo
bar: my_bar
baz: True

python blog.py
# foo='foo' bar=<BarEnum.my_bar: 'my_bar'> baz=True
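When the YAML comes from end users, a raw traceback is not a great error message. A sketch of how the validation layer could turn a ValidationError into a readable report using the list returned by errors(), where each entry has a loc tuple and a msg string; the format_errors helper is hypothetical.

```python
from enum import Enum

from pydantic import BaseModel, Field, ValidationError


class BarEnum(str, Enum):
    my_bar = "my_bar"
    your_bar = "your_bar"


class FooBar(BaseModel):
    foo: str = Field(description="This is foo")
    bar: BarEnum = Field(description="This is bar")
    baz: bool = Field(description="This is baz")


def format_errors(error: ValidationError) -> list[str]:
    """Hypothetical helper: one 'field.path: message' line per validation error."""
    return [
        ".".join(str(part) for part in detail["loc"]) + ": " + detail["msg"]
        for detail in error.errors()
    ]


try:
    FooBar(foo="foo", bar="bar")  # invalid bar and missing baz
except ValidationError as error:
    for line in format_errors(error):
        print(line)
```

Pydantic collects every failing field in one pass, so users can fix all problems in their file at once instead of one per run.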

Using validators for field/model validation

A BaseModel also provides several ways to implement custom validation rules, both for individual fields and for the entire model.

Field validator

field_validator can be used to apply some custom validations to one or several fields in a model.

Let’s say that the foo field must always start with the prefix my_prefix. We can attach a field validator to the foo field that checks whether the prefix is there. Using the field_validator decorator in this case could look like this:

from enum import Enum

from pydantic import BaseModel, Field, field_validator
from yaml import safe_load


class BarEnum(str, Enum):
    my_bar = "my_bar"
    your_bar = "your_bar"


class FooBar(BaseModel):
    foo: str = Field(description="This is foo")
    bar: BarEnum = Field(description="This is bar")
    baz: bool = Field(description="This is baz")

    # Runs after foo passes its str type check; the returned value is stored on the model.
    @field_validator("foo")
    @classmethod
    def validate_foo_field(cls, foo_field: str) -> str:
        if not foo_field.startswith("my_prefix"):
            raise ValueError("foo field must have my_prefix as prefix")
        return foo_field


if __name__ == "__main__":
    with open("config.yml", "r") as tables_file:
        print(FooBar(**safe_load(tables_file)))

Let’s see if the previous YAML still respects the new validation rules.

python blog.py
# pydantic_core._pydantic_core.ValidationError: 1 validation error for FooBar
# foo
# Value error, foo field must have my_prefix as prefix [type=value_error, input_value='foo', input_type=str]

As expected, Pydantic raises an exception when it tries to create a new FooBar object. We just need to update the foo field to meet the new requirement.

foo: my_prefix_foo
bar: my_bar
baz: True

python blog.py
# foo='my_prefix_foo' bar=<BarEnum.my_bar: 'my_bar'> baz=True
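field_validator also accepts several field names, so one method can enforce a rule shared by those fields. A sketch with a hypothetical second string field, qux, and an illustrative rule (strip surrounding whitespace and reject empty values); info.field_name tells the validator which field it is currently checking.

```python
from pydantic import BaseModel, Field, ValidationError, ValidationInfo, field_validator


class FooQux(BaseModel):
    foo: str = Field(description="This is foo")
    qux: str = Field(description="Hypothetical second string field")

    # One validator registered for two fields.
    @field_validator("foo", "qux")
    @classmethod
    def strip_and_require(cls, value: str, info: ValidationInfo) -> str:
        stripped = value.strip()
        if not stripped:
            raise ValueError(f"{info.field_name} must not be empty")
        return stripped


print(FooQux(foo="  my_prefix_foo ", qux="qux"))  # foo='my_prefix_foo' qux='qux'

try:
    FooQux(foo="my_prefix_foo", qux="   ")
except ValidationError as error:
    print(error)
```

Because the validator returns the stripped value, the model stores the cleaned string rather than the raw input.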

Model validator

model_validator can be used to apply custom validation to the model as a whole. This is useful when some fields of your model depend on each other, or when a rule applies to the whole object.

Another requirement for our YAML file is that if the bar field equals my_bar, the baz field must be True. The model_validator decorator makes validating such logic pretty straightforward.

from enum import Enum

from pydantic import BaseModel, Field, field_validator, model_validator
from yaml import safe_load


class BarEnum(str, Enum):
    my_bar = "my_bar"
    your_bar = "your_bar"


class FooBar(BaseModel):
    foo: str = Field(description="This is foo")
    bar: BarEnum = Field(description="This is bar")
    baz: bool = Field(description="This is baz")

    @field_validator("foo")
    @classmethod
    def validate_foo_field(cls, foo_field: str) -> str:
        if not foo_field.startswith("my_prefix"):
            raise ValueError("foo field must have my_prefix as prefix")
        return foo_field

    # mode="after": runs on the fully validated instance, so fields can be compared via self.
    @model_validator(mode="after")
    def validate_model(self) -> "FooBar":
        if self.bar == BarEnum.my_bar and not self.baz:
            raise ValueError("baz must be True if bar is my_bar")
        return self


if __name__ == "__main__":
    with open("config.yml", "r") as tables_file:
        print(FooBar(**safe_load(tables_file)))

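The mode argument of model_validator controls when the custom validation runs: with mode="before", the validator receives the raw input before Pydantic's internal validation; with mode="after", as here, it receives the fully validated model instance, which is why validate_model works with self.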
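A sketch of both modes on a trimmed-down version of the model (foo is left out for brevity). The mode="before" validator is a hypothetical addition that receives the raw input, for example the dict from safe_load, and lowercases bar so that My_Bar is accepted; the mode="after" validator is the cross-field rule from the article and receives the constructed instance.

```python
from enum import Enum
from typing import Any

from pydantic import BaseModel, ValidationError, model_validator


class BarEnum(str, Enum):
    my_bar = "my_bar"
    your_bar = "your_bar"


class BarBaz(BaseModel):
    bar: BarEnum
    baz: bool

    @model_validator(mode="before")
    @classmethod
    def normalize_bar(cls, data: Any) -> Any:
        # Runs on the raw input, before any field is validated.
        if isinstance(data, dict) and isinstance(data.get("bar"), str):
            data = {**data, "bar": data["bar"].lower()}
        return data

    @model_validator(mode="after")
    def check_bar_baz(self) -> "BarBaz":
        # Runs on the validated instance, so fields already have their final types.
        if self.bar == BarEnum.my_bar and not self.baz:
            raise ValueError("baz must be True if bar is my_bar")
        return self


print(BarBaz(bar="My_Bar", baz=True))  # bar=<BarEnum.my_bar: 'my_bar'> baz=True

try:
    BarBaz(bar="my_bar", baz=False)
except ValidationError as error:
    print(error)
```

A before validator has to cope with any input type, hence the isinstance checks; an after validator can rely on the declared field types.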

More interesting features

In this blog post, we mainly focused on Pydantic's basic validation, but Pydantic offers more advanced features such as Discriminated Unions, Serialization, and Settings management.
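To give a flavor of the first one: a discriminated union lets one field accept one of several models, with a tag field deciding which model validates the input. A minimal sketch, assuming hypothetical source types for a configuration file like the one described at the start (CsvSource, TableSource and the type tag are illustrative):

```python
from typing import Literal, Union

from pydantic import BaseModel, Field, ValidationError


class CsvSource(BaseModel):
    type: Literal["csv"]
    path: str


class TableSource(BaseModel):
    type: Literal["table"]
    table_name: str


class SourceConfig(BaseModel):
    # The "type" value picks the model used to validate the rest of the mapping.
    source: Union[CsvSource, TableSource] = Field(discriminator="type")


config = SourceConfig.model_validate({"source": {"type": "csv", "path": "orders.csv"}})
print(type(config.source).__name__)  # CsvSource

try:
    SourceConfig.model_validate({"source": {"type": "api", "url": "https://example.com"}})
except ValidationError as error:
    print(error)
```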

Conclusion

Implementing a validation layer in front of your core code not only saves time but also acts as a safeguard against future bugs caused by bad input. Using Pydantic as that layer keeps validation rules declarative, gathered in one place, and separate from your business logic, which makes the service more reliable and easier to maintain.
