Leveraging Pydantic as a validation layer.
Ensuring clean and reliable input is crucial for building robust services. One tool that simplifies this is Pydantic, a data validation and settings management library powered by type hints. This post explores why a validation layer, acting as a guard in front of a service's core logic, is important, and how Pydantic can help when validating interactions with external systems.
Why a validation layer matters
I was working on a service that relies on YAML configuration files provided by end users. These files act as blueprints, defining data sources and transformations within our system. Initially, without a validation layer, we encountered issues due to missing fields or incorrect data. These problems caused errors at different stages of the service runtime, resulting in unwanted outcomes.
To resolve this, we kept adding conditions and checks throughout the codebase. While this fixed the errors we faced, it also made the codebase more complex, because validation logic was now mixed with the core business logic. Using a tool like CUE for YAML validation was a possibility, but it would have meant maintaining an entirely new tool.
Eventually, we separated the validation code from the core code by creating Pydantic data models that define the structure, types, and constraints of the YAML files. This ensures that only validated, properly formatted data reaches the main codebase, which reduces complexity and keeps a clear distinction between validation logic and core functionality.
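To make the separation concrete, here is a minimal sketch of a validation layer sitting in front of core logic. The SourceConfig model, its fields, and the load_source and run_core_logic functions are hypothetical names chosen for illustration, not the schema of the service described above, and a plain dict stands in for the parsed YAML.

```python
from pydantic import BaseModel, ValidationError


class SourceConfig(BaseModel):
    # Hypothetical fields, not the real configuration schema.
    name: str
    path: str


def load_source(raw: dict) -> SourceConfig:
    """Validation layer: turn untrusted input into a typed object or fail early."""
    try:
        return SourceConfig(**raw)
    except ValidationError as exc:
        # Surface every problem at once, before any core logic runs.
        raise ValueError(f"Invalid configuration:\n{exc}") from exc


def run_core_logic(config: SourceConfig) -> str:
    # Core code can rely on the fields existing and having the right type.
    return f"reading {config.name} from {config.path}"


result = run_core_logic(load_source({"name": "orders", "path": "/data/orders.csv"}))
print(result)
```

With this shape, the core function never needs its own checks for missing or badly typed fields; all of that lives in the model.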
Using Pydantic as a Validation Layer
A Pydantic BaseModel lets you define a type-checked data class that describes the structure and validation requirements of a data object. A BaseModel can be populated by passing data directly to its constructor, which triggers the validation flow.
As an example, let’s consider the following small YAML file:
foo: foo
bar: bar
baz: True
Creating a BaseModel that mirrors the file structure would look something like this:
from pydantic import BaseModel, Field
from yaml import safe_load


class FooBar(BaseModel):
    foo: str = Field(description="This is foo")
    bar: str = Field(description="This is bar")
    baz: bool = Field(description="This is baz")


if __name__ == "__main__":
    with open("config.yml", "r") as tables_file:
        print(FooBar(**safe_load(tables_file)))
The Field object in Pydantic also provides its own validation options and a way to attach metadata to model fields.
The FooBar object is created from the unpacked values of the YAML file.
python blog.py
# foo='foo' bar='bar' baz=True
So far, our input is consistent with the BaseModel validation rules, but it would be more interesting and safer to add more constraints.
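As a first step toward tighter rules, Field itself accepts a few built-in constraints. The sketch below is illustrative only: the ConstrainedFooBar name and the specific limits (a minimum length for foo, a lowercase pattern for bar, a default for baz) are assumptions and not requirements from the original YAML file.

```python
from pydantic import BaseModel, Field, ValidationError


class ConstrainedFooBar(BaseModel):
    # min_length and pattern are built-in string constraints of Field.
    foo: str = Field(description="This is foo", min_length=3)
    bar: str = Field(description="This is bar", pattern=r"^[a-z_]+$")
    # A default makes the field optional in the input.
    baz: bool = Field(default=False, description="This is baz")


ok = ConstrainedFooBar(foo="foo", bar="bar")

try:
    ConstrainedFooBar(foo="fo", bar="Bar!")
except ValidationError as exc:
    # Collect the names of the fields that failed validation.
    failed_fields = sorted(err["loc"][0] for err in exc.errors())

print(failed_fields)
```

Both violations are reported in a single ValidationError, which is handy when a user has to fix a configuration file in one pass.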
Type hinting validation
One of the BaseModel internal validation flows relies on field type hints. In other words, if the provided value for a field doesn’t match its annotation and cannot be coerced to it, creating the object will fail.
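A small sketch of what that failure looks like in practice. The TypedFooBar model is a trimmed-down, hypothetical variant of the example model, and the input values are made up for illustration.

```python
from pydantic import BaseModel, ValidationError


class TypedFooBar(BaseModel):
    foo: str
    baz: bool


# Strings such as "true" are coerced to bool in Pydantic's default (lax) mode.
coerced = TypedFooBar(foo="foo", baz="true")

try:
    # foo is missing and "definitely" cannot be interpreted as a bool.
    TypedFooBar(baz="definitely")
except ValidationError as exc:
    # errors() returns one dict per failing field.
    problems = {err["loc"][0]: err["type"] for err in exc.errors()}

print(problems)
```

Note that type-hint validation is not a strict isinstance check by default: values that can be safely converted are accepted, and only values that cannot be interpreted as the annotated type are rejected.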
Let’s come back to our YAML file. One requirement of our application is that the bar field can only take certain values. We can enforce this by creating an Enum and using it to annotate the bar field. The BaseModel internal validation flow will pick up that annotation and make the necessary checks.
from enum import Enum

from pydantic import BaseModel, Field
from yaml import safe_load


class BarEnum(Enum):
    my_bar: str = "my_bar"
    your_bar: str = "your_bar"


class FooBar(BaseModel):
    foo: str = Field(description="This is foo")
    bar: BarEnum = Field(description="This is bar")
    baz: bool = Field(description="This is baz")


if __name__ == "__main__":
    with open("config.yml", "r") as tables_file:
        print(FooBar(**safe_load(tables_file)))
Let’s check if our YAML file still stands.
python blog.py
# pydantic_core._pydantic_core.ValidationError: 1 validation error for FooBar
# bar
# Input should be 'my_bar' or 'your_bar' [type=enum, input_value='bar', input_type=str]
Indeed, our BaseModel now only accepts my_bar or your_bar for this field. We just need to change the bar value to my_bar, and creating the model should work.
foo: foo
bar: my_bar
baz: True
python blog.py
# foo='foo' bar=<BarEnum.my_bar: 'my_bar'> baz=True
Using validators for field/model validation
A BaseModel also provides several ways to apply custom validation rules to individual fields and to the entire model.
Field validator
field_validator can be used to apply custom validation to one or several fields in a model.
Let’s say that the foo field must always start with the prefix my_prefix. We can use a field validator on the foo field to check that the prefix is there. Using the field_validator decorator, this can look like this:
from enum import Enum

from pydantic import BaseModel, Field, field_validator
from yaml import safe_load


class BarEnum(str, Enum):
    my_bar: str = "my_bar"
    your_bar: str = "your_bar"


class FooBar(BaseModel):
    foo: str = Field(description="This is foo")
    bar: BarEnum = Field(description="This is bar")
    baz: bool = Field(description="This is baz")

    @field_validator("foo")
    @classmethod
    def validate_foo_field(cls, foo_field: str):
        if not foo_field.startswith("my_prefix"):
            raise ValueError("foo field must have my_prefix as prefix")
        return foo_field


if __name__ == "__main__":
    with open("config.yml", "r") as tables_file:
        print(FooBar(**safe_load(tables_file)))
Let’s see if the previous YAML still respects the new validation rules.
python blog.py
# pydantic_core._pydantic_core.ValidationError: 1 validation error for FooBar
# foo
# Value error, foo field must have my_prefix as prefix [type=value_error, input_value='foo', input_type=str]
As expected, Pydantic raises an exception when it tries to create a new FooBar object. We just need to update the foo field to meet the new requirement.
foo: my_prefix_foo
bar: my_bar
baz: True
python blog.py
# foo='my_prefix_foo' bar=<BarEnum.my_bar: 'my_bar'> baz=True
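Since field_validator accepts several field names, the same rule can be shared across fields. The sketch below assumes a hypothetical second field, qux, which does not exist in the original YAML file, purely to show the decorator applied to more than one field.

```python
from pydantic import BaseModel, ValidationError, field_validator


class Prefixed(BaseModel):
    foo: str
    qux: str  # hypothetical second field, not part of the original YAML

    # One validator registered for both fields; it runs once per field.
    @field_validator("foo", "qux")
    @classmethod
    def require_prefix(cls, value: str) -> str:
        if not value.startswith("my_prefix"):
            raise ValueError("value must start with my_prefix")
        return value


valid = Prefixed(foo="my_prefix_foo", qux="my_prefix_qux")

try:
    Prefixed(foo="my_prefix_foo", qux="qux")
except ValidationError as exc:
    # Only the field that broke the rule is reported.
    bad_fields = [err["loc"][0] for err in exc.errors()]

print(bad_fields)
```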
Model validator
A model validator can be used to apply custom validation to the entire model. This is useful when some fields of the model depend on each other, or when a check applies to the object as a whole.
Another requirement for our YAML file is that if the bar field equals my_bar, then the baz field must always be True. The model_validator decorator makes validating such logic pretty straightforward.
from enum import Enum

from pydantic import BaseModel, Field, field_validator, model_validator
from yaml import safe_load


class BarEnum(str, Enum):
    my_bar: str = "my_bar"
    your_bar: str = "your_bar"


class FooBar(BaseModel):
    foo: str = Field(description="This is foo")
    bar: BarEnum = Field(description="This is bar")
    baz: bool = Field(description="This is baz")

    @field_validator("foo")
    @classmethod
    def validate_foo_field(cls, foo_field: str):
        if not foo_field.startswith("my_prefix"):
            raise ValueError("foo field must have my_prefix as prefix")
        return foo_field

    @model_validator(mode="after")
    def validate_model(self):
        # Cross-field rule: my_bar requires baz to be True.
        if self.bar == BarEnum.my_bar and not self.baz:
            raise ValueError("baz must be True if bar is my_bar")
        return self


if __name__ == "__main__":
    with open("config.yml", "r") as tables_file:
        print(FooBar(**safe_load(tables_file)))
The mode argument in model_validator specifies whether the custom validation runs before or after the BaseModel internal validation.
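To illustrate the difference between the two modes, here is a hedged sketch. The LenientFooBar model and its rules (lowercasing input keys, requiring a non-empty foo when baz is True) are invented for this example and are not requirements of the YAML file above.

```python
from typing import Any

from pydantic import BaseModel, ValidationError, model_validator


class LenientFooBar(BaseModel):
    foo: str
    baz: bool

    @model_validator(mode="before")
    @classmethod
    def lowercase_keys(cls, data: Any) -> Any:
        # Runs on the raw input, so data is not yet a model instance.
        if isinstance(data, dict):
            return {str(key).lower(): value for key, value in data.items()}
        return data

    @model_validator(mode="after")
    def check_foo_when_baz(self) -> "LenientFooBar":
        # Runs after field validation, so attributes are already typed.
        if self.baz and not self.foo:
            raise ValueError("foo must not be empty when baz is True")
        return self


config = LenientFooBar.model_validate({"FOO": "my_prefix_foo", "Baz": True})

try:
    LenientFooBar.model_validate({"foo": "", "baz": True})
    after_rule_failed = False
except ValidationError:
    after_rule_failed = True
```

A "before" validator is a reasonable place to normalize raw input, while an "after" validator is better suited to cross-field rules because it works with already validated, typed attributes.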
More interesting features
In this blog post, we mainly focused on Pydantic's basic validation, but Pydantic offers more advanced features as well, such as discriminated unions, serialization, and settings management.
Conclusion
Implementing a validation layer in front of your core code is an approach that not only saves time but also guards against bugs caused by malformed input. Using Pydantic for that layer keeps validation rules declarative and separate from business logic, which makes a service more reliable and easier to maintain.