Pydantic: Usage and its Applications
Pydantic is a Python library for data validation and settings management based on Python type hints. It lets you define the shape of your data in pure, canonical Python and have Pydantic validate it at runtime. Pydantic is fast, extensible, and plays nicely with your linters, IDE, and brain.
In this blog post, we will explore what Pydantic is, how to use it, and some of its applications.
What is Pydantic?
Pydantic is a library that leverages the power of Python type hints to perform data validation and serialization. It can also generate JSON schemas from your models, allowing for easy integration with other tools.
Pydantic has many advantages, such as:
- Type safety: Pydantic enforces type hints at runtime, ensuring that your data conforms to the expected types and formats. It also supports custom data types, such as enums, UUIDs, IP addresses, etc.
- User-friendly errors: Pydantic provides informative and readable error messages when validation fails, including the location, type, and input of the error. It also provides links to the documentation for each error type.
- Performance: Pydantic's core validation logic (as of version 2) is written in Rust, making it one of the fastest data validation libraries for Python.
- Ease of use: Pydantic is simple and intuitive to use, requiring minimal boilerplate code and configuration. It works well with many popular IDEs and static analysis tools, such as PyCharm, VS Code, mypy, etc.
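As a quick illustration of these points, here is a small sketch (assuming Pydantic v2) of the custom data types mentioned above; the `Item` model, `Color` enum, and field names are invented for this example:

```python
from enum import Enum
from uuid import UUID

from pydantic import BaseModel, ValidationError

class Color(str, Enum):
    red = 'red'
    green = 'green'

class Item(BaseModel):
    item_id: UUID
    color: Color

# Strings are coerced to UUID and enum members at validation time.
item = Item(item_id='12345678-1234-5678-1234-567812345678', color='red')

# Invalid input raises a ValidationError with one entry per failing field.
try:
    Item(item_id='not-a-uuid', color='blue')
except ValidationError as e:
    error_types = [err['type'] for err in e.errors()]
```

Note how the raw strings arrive as proper `UUID` and `Color` objects after validation, so the rest of your code never has to re-check them.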
How to use Pydantic?
To use Pydantic, install it with `pip install pydantic` or `conda install pydantic -c conda-forge`.
The main way to use Pydantic is to create custom classes that inherit from `BaseModel`, the base class for all Pydantic models. You can then define the attributes of your model using type annotations, and optionally provide default values or validators.
For example, let’s create a simple model for a user:
```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str = 'John Doe'
    signup_ts: Optional[datetime] = None
    friends: List[int] = []
```
This model defines four attributes: `id`, `name`, `signup_ts`, and `friends`. The `id` attribute is required and must be an integer. The `name` attribute must be a string and has a default value of `'John Doe'`. The `signup_ts` attribute is optional and can be either a `datetime` object or `None`. The `friends` attribute must be a list of integers and defaults to an empty list.
To create an instance of this model, you can pass a dictionary of values to the constructor:
```python
external_data = {
    'id': 123,
    'signup_ts': '2019-06-01 12:22',
    'friends': [1, '2', b'3'],
}
user = User(**external_data)
```
Pydantic will automatically validate and parse the input data according to the type annotations, coercing values to the declared types where appropriate. For example, it will convert the string `'2019-06-01 12:22'` to a `datetime` object, and the bytes `b'3'` to an integer.
You can access the attributes of the model as normal:
```python
print(user.id)        # 123
print(user.name)      # John Doe
print(user.signup_ts) # 2019-06-01 12:22:00
print(user.friends)   # [1, 2, 3]
```
Or serialize the model to a JSON string using the `model_dump_json()` method (called `json()` in Pydantic v1):

```python
print(user.model_dump_json())
# {"id":123,"name":"John Doe","signup_ts":"2019-06-01T12:22:00","friends":[1,2,3]}
```
If the input data is invalid or missing a required value, Pydantic will raise a `ValidationError` with a detailed breakdown of what went wrong. Since `signup_ts` has a default in our model, the example below omits the required `id` and includes a bad `friends` entry:

```python
from pydantic import ValidationError

external_data = {
    'signup_ts': '2019-06-01 12:22',
    'friends': [1, 2, 'not a number'],
}

try:
    user = User(**external_data)
except ValidationError as e:
    print(e.errors())
    # [
    #     {
    #         'type': 'missing',
    #         'loc': ('id',),
    #         'msg': 'Field required',
    #         'input': {'signup_ts': '2019-06-01 12:22', 'friends': [1, 2, 'not a number']},
    #         'url': 'https://errors.pydantic.dev/2/v/missing',
    #     },
    #     {
    #         'type': 'int_parsing',
    #         'loc': ('friends', 2),
    #         'msg': 'Input should be a valid integer, unable to parse string as an integer',
    #         'input': 'not a number',
    #         'url': 'https://errors.pydantic.dev/2/v/int_parsing',
    #     },
    # ]
```
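Beyond the built-in checks, you can attach your own validators to fields. Here is a minimal sketch using Pydantic v2's `field_validator` decorator; the `Account` model and its rule are invented for this example:

```python
from pydantic import BaseModel, ValidationError, field_validator

class Account(BaseModel):
    username: str

    @field_validator('username')
    @classmethod
    def no_spaces(cls, v: str) -> str:
        # Validators can reject values by raising...
        if ' ' in v:
            raise ValueError('username must not contain spaces')
        # ...or normalize them by returning a modified value.
        return v.lower()

ok = Account(username='Alice')       # normalized to 'alice'

try:
    Account(username='bad name')
except ValidationError as e:
    msgs = [err['msg'] for err in e.errors()]
```

A `ValueError` raised inside a validator is folded into the usual `ValidationError` report, so callers handle custom and built-in failures the same way.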
What are some applications of Pydantic?
Pydantic can be used for various applications that involve data validation and serialization, such as:
- Web development: Pydantic can be used to validate and parse requests and responses in web frameworks, such as FastAPI, Django Ninja, and Starlette. It can also generate JSON schemas and OpenAPI specifications from your models, allowing for easy documentation and testing of your APIs.
- Data analysis: Pydantic can be used to validate and clean data from various sources, such as CSV files, databases, or web scraping. Methods and attributes such as `model_dump()`, `model_json_schema()`, and `model_fields` also make it easy to inspect and export your models.
- Machine learning: Pydantic can be used to validate the inputs and outputs of machine learning pipelines, for example the request and response payloads of a model-serving API, and to serialize them as JSON for deployment and inference.
- Configuration management: Pydantic can be used to manage settings and configuration for your applications, such as environment variables, INI files, or YAML files, using the `BaseSettings` class (provided by the separate `pydantic-settings` package in Pydantic v2).
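To illustrate the schema-generation point from the web development bullet, here is a minimal sketch using Pydantic v2's `model_json_schema()`; the `Address` and `Customer` models are invented for this example:

```python
from pydantic import BaseModel

class Address(BaseModel):
    city: str
    zip_code: str

class Customer(BaseModel):
    name: str
    address: Address

# model_json_schema() produces a JSON-schema dict; nested models are
# emitted under '$defs' and referenced via '$ref' in the parent schema.
schema = Customer.model_json_schema()
```

Frameworks like FastAPI build their OpenAPI documents from exactly this kind of per-model schema.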
These are just a few examples of how Pydantic can be used in different domains and scenarios. Pydantic is a versatile and powerful library that can help with many data-related tasks in Python.
Alternatives to Pydantic
Some alternatives to Pydantic are:
- marshmallow: a Python library for data serialization and validation that uses schemas to define how data should be converted and validated. It integrates well with popular web frameworks such as Flask and Django.
- class-validator: a TypeScript library for data validation that uses decorators to define validation rules and constraints on class properties. It pairs with the class-transformer library for data serialization and deserialization.
- Cerberus: a lightweight Python library for data validation that uses schema dictionaries to define the rules and types of the data. It can handle nested structures, custom validators, coercion, normalization, etc.