Cool Things You Can Do With Pydantic
Pydantic is a useful library for data parsing and validation. It coerces input types to the declared types (using type hints), accumulates all errors in a single `ValidationError`, and it's also well documented, which makes it easy to discover what it can do.
During my time using Pydantic, I picked up a few things that were not immediately obvious to me and also bumped into a few pitfalls. Since it took me a while to discover these, I figured it’s time to share them with the world.
Before we begin a few things to note:
- I’ve added numbered comments in the code (# 1, # 2, etc) which I immediately refer to after the code snippet in order to explain the code.
- Each feature/pitfall has a link in the following section so you can jump directly to the ones that interest you.
So without further ado, here are the things I learned you can do with Pydantic:
- Use field aliases to play nicely with external formats
- Copy & set don’t perform type validation
- Adding constraints to models
- Enforcing model strictness with strict types
- Defining non-key-value models
- Settings management with Literal types
- Using Pydantic to validate function arguments
- Summary
Use field aliases to play nicely with external formats
When data passes through our system boundaries (external APIs, DBs, messaging queues, etc.), we sometimes need to follow others' naming conventions (CamelCase vs. snake_case, and so on).
In some cases, this leads to some weird Python conventions:
- `id` is a built-in function in Python (Jylpah correctly pointed out that `id` is a function and not a reserved keyword, as I initially wrote), but we must use it in order to follow an external naming convention.
- In Python we generally use snake_case as the naming convention, but again, we are forced to follow a different one.
- Since we want a model to represent the external data, which may arrive as JSON or in some other format, we are forced to follow these external conventions.
In other cases, it may yield surprising results:
- In order to avoid using `id` as the field name (as it shadows the built-in), we rename our field with a leading underscore.
- Surprisingly (or at least surprisingly to me), Pydantic hides fields that start with an underscore (regardless of how you try to access them).
Pydantic allows us to overcome these issues with field aliases:
- This is how we declare a field alias in Pydantic. Note how the alias should match the external naming convention.
- When creating models with aliases, we pass inputs that match the aliases.
- We access the field via the field name (and not the field alias).
- Beware of trying to create these models with the actual field names (and not the aliases), as it will not work. Check out the attached GitHub issue to learn more about this.
- When converting our models to external formats, we need to tell Pydantic to use the alias (instead of the internal name) via the `by_alias` argument.
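To make this concrete, here is a minimal sketch (assuming Pydantic v1; the `User` model, its fields, and the sample values are illustrative, not the article's original snippet):

```python
from pydantic import BaseModel, Field

class User(BaseModel):
    # "id" and "CreatedAt" follow the external conventions; the Python-side
    # names stay idiomatic thanks to the aliases.
    user_id: int = Field(alias="id")
    created_at: str = Field(alias="CreatedAt")

# inputs must match the aliases, not the field names
user = User(**{"id": 42, "CreatedAt": "2021-01-01"})
print(user.user_id)              # access via the field name -> 42
print(user.dict(by_alias=True))  # aliases are used for external formats
```

Constructing with `User(user_id=42, ...)` would fail here, which is exactly the pitfall the GitHub issue above discusses.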
Copy & set don’t perform type validation
Besides passing values via the constructor, we can also pass values via copy & update or with setters (Pydantic’s models are mutable by default). These, however, have surprising behavior.
Copy & update won't perform any type validation. We can see that in the following example:
- Create a regular model that coerces input types.
- Copy `Pizza` with an incompatible input value.
- Surprisingly, our model is copied “successfully” without any `ValidationError` being raised.
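A sketch of this behavior (assuming Pydantic v1; the `Pizza` fields and values are illustrative):

```python
from pydantic import BaseModel

class Pizza(BaseModel):
    name: str
    toppings_count: int

pizza = Pizza(name="margherita", toppings_count="2")  # "2" is coerced to 2
bad = pizza.copy(update={"toppings_count": "lots"})   # incompatible value
print(bad.toppings_count)  # 'lots' -- no ValidationError was raised
```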
Setting a value is another example where Pydantic doesn’t perform any validations:
- Once again, create a regular model that coerces input types.
- Set an incompatible input value on `toppings_count`.
- Surprisingly, no `ValidationError` is raised when we set `toppings_count` to a bad value.
Luckily, Pydantic does allow us to fairly easily overcome the aforementioned setter problem:
- The `Config` inner class defines custom configurations on our models.
- This is how we tell Pydantic to make our setters perform validation (& type coercion) on inputs.
- Pydantic now performs type coercion as we would expect (at least as I would expect).
- Incompatible types raise `ValidationError`s.
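A sketch of the fix (Pydantic v1; the model is illustrative):

```python
from pydantic import BaseModel, ValidationError

class Pizza(BaseModel):
    name: str
    toppings_count: int

    class Config:
        validate_assignment = True  # make setters validate & coerce inputs

pizza = Pizza(name="margherita", toppings_count=2)
pizza.toppings_count = "3"  # coerced to the int 3, as on construction
try:
    pizza.toppings_count = "lots"
except ValidationError as error:
    print(error)  # incompatible types now raise ValidationError
```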
I could not find an easy way to do the same for copy & update (aside from rewriting `copy`).
Adding constraints to models
It's easy to start using Pydantic with the well-known type hints like `str`, `int`, `List[str]`, etc. In many cases though, these types are not enough and we may want to further constrain them.
Further constraining our models' types is usually advantageous: it tends to reduce the amount of code (and conditionals), lets us fail fast (usually at our system's boundaries), provides better error handling, and better reflects our domain requirements (there's a very interesting related talk called “Constraints Liberate, Liberties Constrain”). Pydantic provides several options for adding constraints.
Custom Pydantic types
Pydantic ships with a few useful custom types. Some specific types are:
- URLs — input must match a URL schema. Also has a set of functions to extract the different parts of a URL.
- File paths — input must be a valid existing file.
- UUID — input must represent a valid UUID.
- Secret types — hide the values when printed or when displayed as a JSON.
- Payment card numbers — input must match a payment card number schema. Also provides a way to access the relevant parts of the number (brand, bin, etc).
- Be sure to check the documentation as there are more.
Constraint types
It’s possible to define primitive types that have more constraints on their values. These are especially useful to narrow the number of cases our systems need to deal with. Some examples are non-empty strings, non-empty lists, positive ints, a range of numbers, or a string that matches a certain regex.
Consider the following example:
- `constr` is a constrained `str` type — the `str` must have at least 1 character.
- `conlist` is a constrained `List[int]` type — the list must contain at least one score.
- No name validation is required, since `User`s are guaranteed to have at least one character in the name (assuming non-empty strings are valid names, of course).
- We can immediately use `max` on the `User`'s `scores`, as `scores` is guaranteed to be non-empty — I find this to be the most interesting part of using constrained types: we avoid having to deal with many edge cases.
- Pydantic verifies that `name` is at least one character long.
- Pydantic verifies that `scores` is not empty.
- Inputs that don't obey the constraints cause Pydantic to raise a `ValidationError`.
The distinction between custom types & constrained types is that custom types are new types with relevant behavior (e.g. a URL has a `host` attribute), while constrained types are just primitive types that can only accept a subset of their inputs' domain (e.g. `PositiveInt` is just an `int` that can only be instantiated from positive `int`s).
Custom validators
When Pydantic’s custom types & constraint types are not enough and we need to perform more complex validation logic we can resort to Pydantic’s custom validators. These are basically custom validation functions we add to the models.
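A minimal sketch of a custom validator (Pydantic v1 `@validator`; the model and the title-case rule are illustrative):

```python
from pydantic import BaseModel, ValidationError, validator

class User(BaseModel):
    name: str

    @validator("name")
    def name_must_be_title_cased(cls, value: str) -> str:
        # arbitrary validation logic that types alone can't express
        if not value.istitle():
            raise ValueError("name must be title-cased")
        return value

user = User(name="Alice")
try:
    User(name="alice")
except ValidationError as error:
    print(error)  # our custom message shows up in the error report
```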
Enforcing model strictness with strict types
In order to explain Strict Types let’s start with 2 examples:
- We expect our users to provide a `bool` response.
- There may be some ambiguity in the user's response. The user wrote “yes” but we interpreted it as `True` (this may be the expected behavior, but there's room for error here).
- Type coercion causes us to lose information, resulting in 2 different summaries having the same score.
These problems arise from the fact that Pydantic coerces values to the declared types. In most cases the type coercion is convenient, but in some cases we may wish to define stricter types that prevent it.
Here is an example of how this can be done:
- We let Pydantic know that `user_input` is a strict boolean type.
- Only `True` & `False` can be used as inputs for `user_input`.
- Values that would usually be coerced into `bool` are no longer coerced and result in a `ValidationError` being raised.
- `score` can now only receive `int`s and no other types.
- `int`-compatible types are no longer coerced and result in a `ValidationError` being raised.
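A sketch with Pydantic's built-in strict types (v1; the `Summary` model is illustrative):

```python
from pydantic import BaseModel, StrictBool, StrictInt, ValidationError

class Summary(BaseModel):
    user_input: StrictBool  # only True/False, no coercion
    score: StrictInt        # only real ints, "5" is rejected

summary = Summary(user_input=True, score=5)

try:
    Summary(user_input="yes", score="5")  # both would normally be coerced
except ValidationError as error:
    print(error)  # both fields fail instead of being silently coerced
```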
Defining non-key-value models
Most of the models we use with Pydantic (and in the examples thus far) are just a bunch of key-value pairs. However, not all inputs can be represented this way.
Consider the following contrived example:
- `Names` cannot represent a list of `str`, as it must be initialized with the `values` field, but the input data doesn't match this expectation — this results in a `TypeError`.
- Similarly, `Name` cannot be created without using the `value` field name.
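The failure described above can be reproduced with a small sketch (Pydantic v1; the field values are illustrative):

```python
from typing import List
from pydantic import BaseModel

class Names(BaseModel):
    values: List[str]

try:
    Names(["alice", "bob"])  # a raw list doesn't fit a key-value model
except TypeError as error:
    print(error)             # models only accept keyword/field inputs

names = Names(values=["alice", "bob"])  # only the wrapped form works
```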
In some cases, it’s useful to define models that are just specialized representations of primitive types. These specialized types behave just like their primitive counterparts but have a different meaning to our program.
Let’s look at how we can achieve this:
- `__root__` is our way to tell Pydantic that our model doesn't represent a regular key-value model.
- Despite `Age` having a value of 42, it's not equal to a regular primitive `int` 42.
- `parse_obj` is just another convenient method for parsing inputs.
- Pydantic maintains type coercion for custom `__root__` models.
These custom `__root__` models can be useful as inputs, as we can see in the following example:
- `age_in_days` can focus solely on performing the days calculation and doesn't require any extra validation or parsing code.
- `__root__` models perform type coercion just like any other model.
- If we're naughty and try hard enough, we can obviously provide `age_in_days` a non-age value, but with mypy we can at least spot some of the typing issues.
Other than what we've already discussed, `__root__` models have the following interesting consequences:
- `foo` expects 2 arguments: an `Age` representing an age and a regular `int`. When invoking `foo` it's easy to accidentally pass the arguments in the wrong order.
- Luckily, mypy can help spot these errors.
- Even though `Age` is defined as a custom `__root__` model, when we convert `Person` to JSON, `Age` behaves just like a regular `int`.
So far we've discussed the advantages; there are, however, a few things we should consider:
- Although cool, this can easily be overused and become hard or complicated to use. Part of what makes Python so fun is its simplicity — be aware and try to avoid overusing this feature.
- Although premature optimization is the root of all evil, using these models in performance-critical sections may become a bottleneck (as we're adding more objects, validations, etc.). Be aware of this when aiming for performance (this is also true for “regular” Pydantic models and not just for custom `__root__` models).
Defining these custom `__root__` models can be useful when used appropriately.
Settings management with Literal types
There's another useful feature that works with `__root__`, but first, let's discuss how Pydantic helps us read & parse environment variables.
A lot of the code I've seen around reading & parsing application settings suffers from 2 main problems:
1. There's a lot of code around reading, parsing & error handling (as environment variables may be missing, misspelled, or hold an incompatible value) — these usually come in the form of utility code.
2. When there are multiple errors, we usually enter a highly annoying cycle: try to read the configuration, fail on the first error (the program crashes), fix the error, and repeat N times (where N is the number of configuration errors).
These problems are obviously very annoying, but luckily they are easy to solve using Pydantic's `BaseSettings`. This is more or less all we need to do:
1. Define a Pydantic model with all the required fields and their types.
2. Inherit from Pydantic's `BaseSettings` to let it know we expect this model to be read & parsed from the environment (or a .env file, etc.).
3. Create the model without any input values (values are read from the environment).
4. Since there are errors, trying to read `Config` results in a `ValidationError` being raised.
5. Since Pydantic accumulates errors in a `ValidationError`, we can see all the errors at once.
`BaseSettings` in itself is a very useful feature, but often we need to read different models (different fields & types), where the model is determined by the environment we're running on. How can we achieve this using Pydantic?
This can be achieved by combining Literal types, Union types & `__root__` (which we looked at previously). This is the game plan:
- Define different configuration models (prod/staging/local, etc.).
- Each configuration model will also include a field `env` (I tend to call these `env` or `profile`, but you can choose whatever name you like) with a Literal type representing the name of the corresponding environment.
- Define a configuration union type of all possible configuration models.
- Use `parse_obj_as` to make Pydantic read it according to the actual `ENV` value.
Let’s look at some code:
- We have different models for each environment we are running on — note that each model also has a corresponding Literal type.
- `Context` can be either a `LocalContext` or a `ProdContext` — this is how Pydantic knows it can read one or the other and nothing else.
- `parse_obj_as`, followed by the empty dictionary, is our way to tell Pydantic to read `Context` as settings. Note that since `Context` can either be `LocalContext` or `ProdContext`, it must be of type `BaseSettings`. This means we don't need to provide any arguments to `parse_obj_as` when invoking it.
- `local` matches the literal type `local` and therefore `LocalContext` is read.
- `prod` matches the literal type `prod` and therefore `ProdContext` is read.
- If there is a failure reading `Context`, a `ValidationError` is raised.
Edit: I initially posted a slightly more complex version of this code but thanks to Nuno André I was able to simplify it.
Note that we obviously still need to programmatically check the `env` variable to know which context we actually read (as it was determined by an environment variable), but:
- We only need to read & parse the context once from the environment, as opposed to doing it in 2 steps: 1. read only the `env` variable from the environment; 2. read the rest of the environment variables according to the `env` variable.
- We sometimes want the environment name available to us for logging, monitoring, etc., so having `env` may not be redundant and can actually prove useful.
Using Pydantic to validate function arguments
This feature is very new (still in beta as of the time of writing), so make sure you read the docs before using it in production or relying heavily on it.
So far, we leveraged Pydantic's ability to validate & parse arguments when we used Pydantic models. But what happens when a function takes no Pydantic model as its arguments, only regular ones? Can we somehow leverage Pydantic to validate them?
This is where `validate_arguments` comes into play. It's basically a Python decorator we can add to any function with type hints, and Pydantic will validate the function's arguments (it works on methods too).
Let’s look at an example:
- There's no Pydantic model here. Even though `retries` is annotated as `PositiveInt`, we won't get any validation.
- An `AttributeError` is raised since `get_payload` is passed wrong arguments. Also note the 2nd test case, where both arguments are invalid: when this happens, we only get an `AttributeError` for the first argument and not for both.
We can add validation to the function by using `validate_arguments`:
- In order to coerce input types or fail on invalid inputs, we need to add the `validate_arguments` decorator.
- Since `validate_arguments` actually performs Pydantic validation on the inputs, invalid inputs are no longer allowed.
- Pydantic raises a `ValidationError` on bad inputs.
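A sketch of the decorated function (Pydantic v1's `validate_arguments`; the `get_payload` body and values are illustrative):

```python
from pydantic import PositiveInt, ValidationError, validate_arguments

@validate_arguments  # arguments are validated & coerced before the body runs
def get_payload(url: str, retries: PositiveInt) -> dict:
    return {"url": url, "retries": retries}

payload = get_payload("https://example.com", "3")  # "3" is coerced to 3
print(payload)

try:
    get_payload("https://example.com", -1)  # ValidationError, not AttributeError
except ValidationError as error:
    print(error)
```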
Not directly related to `validate_arguments`, but if we're already using Pydantic we can make the `get_payload` function even better by specifying the types we actually need, like this:
- `url` moved from `str` to the more specific `HttpUrl` type.
- Since an `HttpUrl` is already a valid URL, there's no need to perform checks on it inside the function — once again, having more constrained types helped us remove complexity.
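A sketch of the improved version (Pydantic v1; the function body is illustrative):

```python
from pydantic import HttpUrl, PositiveInt, validate_arguments

@validate_arguments
def get_payload(url: HttpUrl, retries: PositiveInt) -> dict:
    # url is guaranteed to be a well-formed HTTP(S) URL, so no checks
    # are needed inside the function; HttpUrl also exposes URL parts
    return {"host": url.host, "retries": retries}

payload = get_payload("https://example.com/api", 3)
print(payload)
```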
Although new, `validate_arguments` seems like a really nice & useful addition to Pydantic.
Summary
Pydantic is very easy to get started with, but it's also easy to overlook some of its more useful features. These features are worth knowing, as they can help us improve our overall code quality & handle errors better — all with relatively little effort.