Pydantic: The package for Data Validation and Modeling like a PRO

Daniel Wu
7 min read · Mar 1, 2023


Introduction

Pydantic is a Python package for data validation and settings management. It builds on Python's type hints: you annotate each field with the type you expect, and Pydantic checks (and, where possible, converts) incoming data against those annotations at runtime. It is lightweight and extensible, which makes it a popular choice for building APIs and microservices. In this article, we'll introduce Pydantic and walk through ten examples of how it can be used. The examples use the Pydantic 1.x API (for instance validator and parse_obj_as).

Example 1: Validating primitive data types

Pydantic can be used to validate primitive data types such as strings, integers, and floats. In the following example, we define a Pydantic model that represents a person's name, age, and height:

from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int
    height: float

We can then create instances of the Person model and validate the data that we pass in:

person = Person(name="John Smith", age=30, height=1.8)
print(person)

# Output:
# name='John Smith' age=30 height=1.8

If we try to pass in data that does not conform to the expected types, Pydantic will raise a validation error:

person = Person(name="John Smith", age="thirty", height=1.8)

# Output:
# pydantic.error_wrappers.ValidationError: 1 validation error for Person
# age
#   value is not a valid integer (type=type_error.integer)
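
In practice the error is usually caught rather than left to crash the program. The sketch below assumes only the Person model from this example. It catches ValidationError and reads the structured details from its errors() method, and it also shows that Pydantic converts compatible input (the string "30") instead of rejecting it. The variable names coerced and failed_fields are illustrative.

```python
from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    age: int
    height: float

# A numeric string is coerced to the annotated type
coerced = Person(name="John Smith", age="30", height=1.8)

# A non-numeric string cannot be coerced, so validation fails
try:
    Person(name="John Smith", age="thirty", height=1.8)
    failed_fields = []
except ValidationError as exc:
    # errors() returns one dict per problem; "loc" is the path to the field
    failed_fields = [error["loc"] for error in exc.errors()]

print(coerced.age)
print(failed_fields)
```

Working with errors() instead of the printed message keeps the handling code independent of how the message text is worded.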

Example 2: Validating nested data structures

Pydantic can also be used to validate more complex data structures, such as nested dictionaries. In the following example, we define a Pydantic model that represents a customer's contact information:

from pydantic import BaseModel

class Contact(BaseModel):
    email: str
    phone: str

class Customer(BaseModel):
    name: str
    contact: Contact

We can then create instances of the Customer model and validate the data that we pass in:

customer_data = {
    "name": "John Smith",
    "contact": {
        "email": "john.smith@example.com",
        "phone": "555-1234",
    },
}

customer = Customer(**customer_data)
print(customer)

# Output:
# name='John Smith' contact=Contact(email='john.smith@example.com', phone='555-1234')

If we try to pass in data that does not conform to the expected structure, Pydantic will raise a validation error:

customer_data = {
    "name": "John Smith",
    "contact": {
        "email": "john.smith@example.com",
        "mobile": "555-1234",  # wrong key: the model expects "phone"
    },
}

customer = Customer(**customer_data)

# Output:
# pydantic.error_wrappers.ValidationError: 1 validation error for Customer
# contact -> phone
#   field required (type=value_error.missing)
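
The "contact -> phone" path in that message is also available programmatically. The following sketch repeats the two models from this example and collects the location of each error. The names bad_data and error_locations are made up for illustration. Note that the unexpected "mobile" key is not itself reported, because extra keys are ignored by default.

```python
from pydantic import BaseModel, ValidationError

class Contact(BaseModel):
    email: str
    phone: str

class Customer(BaseModel):
    name: str
    contact: Contact

bad_data = {
    "name": "John Smith",
    "contact": {"email": "john.smith@example.com", "mobile": "555-1234"},
}

try:
    Customer(**bad_data)
    error_locations = []
except ValidationError as exc:
    # Each location is a tuple walking from the outer model to the bad field
    error_locations = [error["loc"] for error in exc.errors()]

print(error_locations)
```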

Example 3: Validating data from external sources

Pydantic can be used to validate data that comes from external sources such as JSON, YAML, or CSV files, once that data has been loaded into Python objects. In the following example, we define a Pydantic model that represents a product's name, price, and quantity:

from typing import List

from pydantic import BaseModel, parse_obj_as

class Product(BaseModel):
    name: str
    price: float
    quantity: int

We can then load data from a JSON file and validate it using Pydantic:

import json

with open("products.json", "r") as f:
    data = json.load(f)

products = parse_obj_as(List[Product], data)
print(products)

# Output (wrapped for readability):
# [
#     Product(name='Apple', price=0.5, quantity=100),
#     Product(name='Banana', price=0.25, quantity=200),
#     Product(name='Orange', price=0.75, quantity=50)
# ]

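If the data in the file does not conform to the expected structure, Pydantic will raise a validation error. Keep in mind that Pydantic converts compatible values, so numeric strings such as "1.25" or "25" are accepted; the record below fails because its price cannot be converted to a float:

import json

with open("products.json", "r") as f:
    data = json.load(f)

# "1.25" or "25" would be coerced to numbers; "n/a" cannot be
data.append({"name": "Pineapple", "price": "n/a", "quantity": "25"})

products = parse_obj_as(List[Product], data)

# Output (abridged; the header names an internal wrapper model, and the
# index 3 assumes the file holds the three products shown above):
# __root__ -> 3 -> price
#   value is not a valid float (type=type_error.float)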
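
When one bad record should not reject the whole file, the records can be validated one at a time instead. This is a minimal sketch of that alternative, not a replacement for parse_obj_as: the inline JSON string stands in for a file such as products.json, and the names raw, valid_products, and rejected_rows are hypothetical.

```python
import json

from pydantic import BaseModel, ValidationError

class Product(BaseModel):
    name: str
    price: float
    quantity: int

# Inline stand-in for the contents of a file such as products.json
raw = """
[
    {"name": "Apple", "price": 0.5, "quantity": 100},
    {"name": "Pineapple", "price": "n/a", "quantity": 25},
    {"name": "Orange", "price": "0.75", "quantity": 50}
]
"""

valid_products = []
rejected_rows = []
for index, record in enumerate(json.loads(raw)):
    try:
        valid_products.append(Product(**record))
    except ValidationError:
        # Remember which row failed and keep going
        rejected_rows.append(index)

print([p.name for p in valid_products])
print(rejected_rows)
```

The third record is accepted even though its price arrives as a string, because "0.75" can be converted to a float.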

Example 4: Using validators

Pydantic provides validators: custom functions that run on a field after its type has been checked, so you can enforce rules that a type annotation alone cannot express. In the following example, we define a Pydantic model that represents a user's password:

from pydantic import BaseModel, validator

class Password(BaseModel):
    value: str

    @validator("value")
    def validate_password(cls, v):
        if len(v) < 8:
            raise ValueError("Password must be at least 8 characters long")
        return v

We can then create instances of the Password model and validate the data that we pass in:

password = Password(value="password123")
print(password)

# Output:
# value='password123'

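If we try to pass in a password that is too short, Pydantic wraps the ValueError raised by our validator in a validation error:

password = Password(value="pass")

# Output:
# pydantic.error_wrappers.ValidationError: 1 validation error for Password
# value
#   Password must be at least 8 characters long (type=value_error)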

Example 5: Using custom data types

Pydantic allows you to define custom data types that can be used in your models. In the following example, we define a custom data type for a UUID:

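import uuid

class UUID(str):
    @classmethod
    def __get_validators__(cls):
        # Pydantic 1.x calls each validator yielded here, in order
        yield cls.validate

    @classmethod
    def validate(cls, value):
        try:
            # str() guards against non-string input such as integers
            return str(uuid.UUID(str(value)))
        except ValueError as e:
            raise ValueError("Invalid UUID") from e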

We can then use this custom data type in our Pydantic models:

class Product(BaseModel):
    id: UUID
    name: str
    price: float

We can then create instances of the Product model and validate the data that we pass in:

product = Product(id="123e4567-e89b-12d3-a456-426655440000", name="Apple", price=0.5)
print(product)

# Output:
# id='123e4567-e89b-12d3-a456-426655440000' name='Apple' price=0.5

If we try to pass in an invalid UUID, Pydantic will raise a validation error:

product = Product(id="invalid-uuid", name="Apple", price=0.5)

# Output:
# pydantic.error_wrappers.ValidationError: 1 validation error for Product
# id
#   Invalid UUID (type=value_error)
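
The custom class above is useful for seeing how the validation hook works, but for UUIDs specifically Pydantic already understands the standard library's uuid.UUID as a field type. The sketch below shows that shorter route; unlike the custom type, the stored value is a uuid.UUID object rather than a string. The rejected flag is only there for illustration.

```python
import uuid

from pydantic import BaseModel, ValidationError

class Product(BaseModel):
    id: uuid.UUID  # standard-library type, validated by Pydantic directly
    name: str
    price: float

product = Product(id="123e4567-e89b-12d3-a456-426655440000", name="Apple", price=0.5)

try:
    Product(id="invalid-uuid", name="Apple", price=0.5)
    rejected = False
except ValidationError:
    rejected = True

print(type(product.id).__name__, rejected)
```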

Example 6: Using configuration options

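Pydantic allows you to configure various options for your models through a nested Config class, such as how strings are cleaned up and how values are encoded to JSON. In the following example, we define a Pydantic model that represents a person's name and birth date. The datetime annotation is what makes Pydantic parse the birth date (a full ISO 8601 string such as "1990-01-01T00:00:00" is the safest input); the Config class strips whitespace from strings and controls how datetime values are written to JSON:

from datetime import datetime

from pydantic import BaseModel

class Person(BaseModel):
    name: str
    birth_date: datetime

    class Config:
        # Strip leading and trailing whitespace from str and bytes fields
        anystr_strip_whitespace = True
        # Serialize datetime values as ISO 8601 strings in .json()
        json_encoders = {datetime: lambda dt: dt.isoformat()}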

We can then create instances of the Person model and validate the data that we pass in:

data = {"name": "Alice", "birth_date": "1990-01-01T00:00:00"}
person = Person.parse_obj(data)
print(person)

# Output:
# name='Alice' birth_date=datetime.datetime(1990, 1, 1, 0, 0)

We can see that the birth date has been parsed as a datetime object. Because of the json_encoders setting, the birth date is written as an ISO 8601 string when the model is serialized to JSON:

json_str = person.json()
print(json_str)

# Output:
# {"name": "Alice", "birth_date": "1990-01-01T00:00:00"}

Alternatively, we can configure Pydantic to format the birth date as a Unix timestamp:

class Person(BaseModel):
    name: str
    birth_date: datetime

    class Config:
        json_encoders = {datetime: lambda dt: dt.timestamp()}

data = {"name": "Alice", "birth_date": datetime(1990, 1, 1)}
person = Person.parse_obj(data)
json_str = person.json()
print(json_str)

# Output (when the local time zone is UTC; a naive datetime is interpreted
# as local time, so the number differs elsewhere):
# {"name": "Alice", "birth_date": 631152000.0}
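
The timestamp in that output is only reproducible when the datetime carries a time zone. The short sketch below uses just the standard library (no Pydantic) to show the difference between a naive and a UTC-aware datetime; the variable names are illustrative.

```python
from datetime import datetime, timezone

naive = datetime(1990, 1, 1)                       # no time zone attached
aware = datetime(1990, 1, 1, tzinfo=timezone.utc)  # explicitly UTC

# An aware datetime maps to the same timestamp on every machine
utc_timestamp = aware.timestamp()

# A naive datetime is treated as local time, so the gap between the two
# values equals the machine's UTC offset on that date (0 when running in UTC)
local_offset_seconds = utc_timestamp - naive.timestamp()

print(utc_timestamp)
print(local_offset_seconds)
```

If the serialized value needs to be stable across machines, passing an aware datetime into the model is one way to get there.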

Example 7: Using inheritance

Pydantic allows you to define models that inherit from other models. In the following example, we define a Pydantic model for a user and a Pydantic model for an admin that inherits from the user model:

class User(BaseModel):
    name: str
    email: str

class Admin(User):
    is_admin: bool

We can then create instances of the User and Admin models:

user = User(name="Alice", email="alice@example.com")
admin = Admin(name="Bob", email="bob@example.com", is_admin=True)

print(user)
print(admin)

# Output:
# name='Alice' email='alice@example.com'
# name='Bob' email='bob@example.com' is_admin=True
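
Inheritance carries over validation as well as field definitions. As a small sketch using the same two models, the code below checks that an Admin is also a User and that the inherited email field is still required on the subclass. The name missing is illustrative.

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    email: str

class Admin(User):
    is_admin: bool

admin = Admin(name="Bob", email="bob@example.com", is_admin=True)

# Inherited fields stay required on the subclass
try:
    Admin(name="Bob", is_admin=True)
    missing = []
except ValidationError as exc:
    missing = [error["loc"] for error in exc.errors()]

print(isinstance(admin, User))
print(missing)
```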

Example 8: Using nested models

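Pydantic allows you to define models that contain other models as fields. In the following example, we define a Pydantic model for a store that contains a list of products. Product here is the model from Example 3 (name, price, quantity), not the UUID variant from Example 5:

from typing import List

class Store(BaseModel):
    name: str
    products: List[Product]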

We can then create instances of the Store model and validate the data that we pass in:

data = {
    "name": "My Store",
    "products": [
        {"name": "Apple", "price": 0.5, "quantity": 100},
        {"name": "Banana", "price": 0.25, "quantity": 200},
        {"name": "Orange", "price": 0.75, "quantity": 50},
    ],
}

store = Store.parse_obj(data)
print(store)

# Output (wrapped for readability):
# name='My Store' products=[
#     Product(name='Apple', price=0.5, quantity=100),
#     Product(name='Banana', price=0.25, quantity=200),
#     Product(name='Orange', price=0.75, quantity=50)
# ]
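
When an item inside the list is invalid, the error location includes the item's position, which makes it easy to point at the offending record. The sketch below redefines the Example 3 Product model so that it runs on its own; the bad price value and the name error_locations are made up for illustration.

```python
from typing import List

from pydantic import BaseModel, ValidationError

class Product(BaseModel):
    name: str
    price: float
    quantity: int

class Store(BaseModel):
    name: str
    products: List[Product]

data = {
    "name": "My Store",
    "products": [
        {"name": "Apple", "price": 0.5, "quantity": 100},
        {"name": "Banana", "price": "cheap", "quantity": 200},
    ],
}

try:
    Store(**data)
    error_locations = []
except ValidationError as exc:
    # Path: field name, index within the list, field on the nested model
    error_locations = [error["loc"] for error in exc.errors()]

print(error_locations)
```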

Example 9: Using validators with multiple rules

As in Example 4, Pydantic allows you to define validators for your model fields. A validator is a function that takes a value and returns it (optionally transformed) if it's valid, or raises a ValueError if it's invalid. A single validator can enforce several rules; the first rule that fails determines the error message. In the following example, we define a Pydantic model for a bank account that validates the account number:

class BankAccount(BaseModel):
    account_number: str

    @validator("account_number")
    def validate_account_number(cls, v):
        if not v.isdigit():
            raise ValueError("Account number must be all digits")
        if len(v) != 10:
            raise ValueError("Account number must be 10 digits")
        return v

We can then create instances of the BankAccount model and validate the data that we pass in:

data = {"account_number": "1234567890"}
account = BankAccount.parse_obj(data)
print(account)

# Output:
# account_number='1234567890'

data = {"account_number": "abcdefghij"}
try:
    account = BankAccount.parse_obj(data)
except ValueError as e:  # ValidationError is a subclass of ValueError
    print(e)

# Output:
# 1 validation error for BankAccount
# account_number
#   Account number must be all digits (type=value_error)
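
To use the validator's message in application code (for example, to show it in a form), it can be pulled out of the exception instead of printed. The helper check below is a hypothetical name introduced only for this sketch; it returns None for a valid number and the list of messages otherwise. The exact message text may carry a prefix depending on the Pydantic version, so it is best matched loosely.

```python
from pydantic import BaseModel, ValidationError, validator

class BankAccount(BaseModel):
    account_number: str

    @validator("account_number")
    def validate_account_number(cls, v):
        if not v.isdigit():
            raise ValueError("Account number must be all digits")
        if len(v) != 10:
            raise ValueError("Account number must be 10 digits")
        return v

def check(account_number):
    """Return None if the number is valid, else the list of error messages."""
    try:
        BankAccount(account_number=account_number)
        return None
    except ValidationError as exc:
        return [error["msg"] for error in exc.errors()]

print(check("1234567890"))
print(check("12345"))
```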

Example 10: Using custom root types

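Pydantic allows any model to serve as the root object for parsing and validation, so a whole nested payload can be checked in a single call. (Strictly speaking, a "custom root type" in Pydantic 1.x is a model that declares a single __root__ field; the example here uses ordinary nested models.) Field types must be something Pydantic knows how to validate, so we define a point in 2D space as a model rather than as a plain Python class:

class Point(BaseModel):
    x: float
    y: float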

We can then define a Pydantic model for a line segment that contains two points:

class LineSegment(BaseModel):
    start: Point
    end: Point

We can then create instances of the LineSegment model and validate the data that we pass in:

data = {
    "start": {"x": 0.0, "y": 0.0},
    "end": {"x": 1.0, "y": 1.0},
}
line_segment = LineSegment.parse_obj(data)
print(line_segment)

# Output:
# start=Point(x=0.0, y=0.0) end=Point(x=1.0, y=1.0)

Conclusion

Pydantic is a powerful Python package that allows you to easily define and validate data models. With Pydantic, you can write concise and readable code that is also robust and maintainable. By using Pydantic, you can reduce the time and effort required to validate and sanitize user input, and make your code more resilient to errors and bugs. Whether you're working on a web application, a data processing pipeline, or any other type of software, Pydantic is a valuable tool to have in your toolkit.



Daniel Wu

I have more than 7 years of experience as a researcher in data science and the financial industry, with good programming skills in C++ and Python.