Pydantic 2.0 just released! An overview of the most popular data validation Python package

Learn what has changed in the recent 2.0 release of Pydantic

Logan Kilpatrick
Around the Prompt
5 min read · Jul 3, 2023


Image by Author

Chances are you have used an application built on Pydantic, the world's most popular data validation framework, without even knowing it. In this post, we will cover the basics of Pydantic, what it is used for, and what has changed in the recent 2.0 release. If you are already a Pydantic user, you can skip the intro sections and head to the later discussion of what has changed.

For full transparency, the main reason I became aware Pydantic even existed is because function calling in the OpenAI ChatCompletions API takes in JSON Schema which Pydantic outputs. This article was written as an overview for myself as I explored the framework.

What is Pydantic? A 30,000-foot overview ✈️

Pydantic was originally created in 2017 by Samuel Colvin and didn’t hit its 1.0 release until late 2019. Today, the package is being downloaded more than 70 million times a month (which makes it one of the most popular open source repos in the world) and is being used in more than 200,000 repositories on GitHub (which is the highest I have seen for any project personally).

The basic use case for Pydantic is validating data against a model that you define. This is particularly useful if your program expects some specific structured input or output.

Key Pydantic features include:

  • Type Hint Power: Pydantic leverages type hints for controlling schema validation and serialization, reducing the learning curve and the amount of code required.
  • Speed: The core validation logic of Pydantic, written in Rust, ranks it amongst the fastest Python data validation libraries.
  • JSON Schema: The capacity of Pydantic models to produce JSON Schema simplifies integration with a variety of tools.
  • Flexible Modes: Pydantic can operate in a strict mode, which avoids data conversion, or a lax mode, which tries to coerce data to the appropriate type.
  • Support for Standard Library Types: Pydantic is compatible with a multitude of standard library types, including dataclass and TypedDict, offering extensive validation support.
  • Customization: With custom validators and serializers, Pydantic allows users to modify how data is processed in multiple powerful ways.
  • Vibrant Ecosystem: Pydantic’s ecosystem includes around 8,000 PyPI packages, and popular libraries such as FastAPI, huggingface/transformers, Django Ninja, SQLModel, and LangChain.
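
To make the "Flexible Modes" point a bit more concrete, here is a minimal sketch of the difference between lax and strict validation. The `Item` model and its `quantity` field are made up for illustration; the `strict=True` argument to `model_validate` is the Pydantic 2.x switch as I understand it.

```python
from pydantic import BaseModel, ValidationError


class Item(BaseModel):  # hypothetical model, not from the Pydantic docs
    quantity: int


# Lax mode (the default): the string '3' is coerced to the integer 3.
lax_item = Item.model_validate({'quantity': '3'})
print(lax_item.quantity)
#> 3

# Strict mode: the same input is rejected because '3' is not already an int.
try:
    Item.model_validate({'quantity': '3'}, strict=True)
    strict_rejected = False
except ValidationError as exc:
    strict_rejected = True
    print(exc.errors()[0]['msg'])
```

Lax mode is handy for data coming from forms or query strings, where everything arrives as text. Strict mode is the safer choice when a wrong type most likely signals a bug upstream.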


Pydantic in action 🤺

Now that we know the basics of Pydantic, let’s look at a simple example:

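from datetime import datetime

from pydantic import BaseModel, PositiveInt


class User(BaseModel):
    id: int                         # required
    name: str = 'John Doe'          # optional, falls back to the default
    signup_ts: datetime | None      # required, but may be None
    tastes: dict[str, PositiveInt]  # required, values must be positive ints


external_data = {
    'id': 123,
    'signup_ts': '2019-06-01 12:22',  # string is parsed into a datetime
    'tastes': {
        'wine': 9,
        b'cheese': 7,    # bytes key is coerced to the str 'cheese'
        'cabbage': '1',  # numeric string is coerced to the int 1
    },
}

user = User(**external_data)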

In this example, we start by creating a User class which defines the different attributes we care about: in this case, an id, a name, a signup timestamp, and tastes, which is a dictionary. After we create the class, we create a dictionary object (similar to a JSON object) which we pass to the User class. If all goes well, we should see something like the following:

print(user.id)
#> 123
print(user.model_dump())
"""
{
    'id': 123,
    'name': 'John Doe',
    'signup_ts': datetime.datetime(2019, 6, 1, 12, 22),
    'tastes': {'wine': 9, 'cheese': 7, 'cabbage': 1},
}
"""

In the event that there is a validation error, we will get something like the following:

from pydantic import ValidationError

# Reuses the User model defined in the previous snippet
external_data = {'id': 'not an int', 'tastes': {}}

try:
    User(**external_data)
except ValidationError as e:
    print(e.errors())
    """
    [
        {
            'type': 'int_parsing',
            'loc': ('id',),
            'msg': 'Input should be a valid integer, unable to parse string as an integer',
            'input': 'not an int',
            'url': 'https://errors.pydantic.dev/2/v/int_parsing',
        },
        {
            'type': 'missing',
            'loc': ('signup_ts',),
            'msg': 'Field required',
            'input': {'id': 'not an int', 'tastes': {}},
            'url': 'https://errors.pydantic.dev/2/v/missing',
        },
    ]
    """

This is where we can start to see the power of Pydantic, since if things always went according to plan, everyone's code would be pretty simple. First, we see the error type, which tells us very clearly that we are having trouble parsing an integer field. Then, we are given the location of the error, the input that caused it, and a URL to learn more about the error. After that, we have a second error in the list, which comes from a missing field.
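
Since `errors()` returns a plain list of dictionaries, it is easy to turn it into whatever shape your application needs. Here is a small sketch that collapses the errors into a field-to-error-type mapping. The `Account` model and the `summary` variable are hypothetical names for illustration only.

```python
from pydantic import BaseModel, ValidationError


class Account(BaseModel):  # hypothetical model for illustration
    id: int
    email: str


summary = {}
try:
    # 'id' cannot be parsed as an int and 'email' is missing entirely
    Account(id='not an int')
except ValidationError as exc:
    for err in exc.errors():
        # 'loc' is a tuple path to the offending field, e.g. ('id',)
        field = '.'.join(str(part) for part in err['loc'])
        summary[field] = err['type']
        print(f"{field}: {err['msg']}")

print(summary)
#> {'id': 'int_parsing', 'email': 'missing'}
```

A mapping like this is one way to feed validation problems back to an API client or a form, without exposing the full error payload.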

JSON Schema 🏗️

The thing that makes Pydantic useful to me is its JSON Schema output, which is the required format for the function input in the updated OpenAI ChatCompletions API. As of the time of this writing, the JSON Schema page has not been updated for the 2.0 release. I might try to step in and help if I can get some guidance since it is good for the ecosystem. In the meantime, you can check out the docs on JSON Schema to see if they have been updated yet.

In general, you don’t need to do anything special to make the JSON Schema feature work. Just define a model the way you normally would and then call MyModel.model_json_schema(), which returns the schema as a Python dictionary that you can serialize to JSON.
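
Here is a rough sketch of what that looks like in practice. The `GetWeather` model, its fields, and the `function_definition` dictionary are all made up for illustration. In particular, the `name` / `description` / `parameters` layout is my assumption about the shape a function-calling API expects, so check the OpenAI docs before relying on it.

```python
import json

from pydantic import BaseModel, Field


class GetWeather(BaseModel):  # hypothetical function arguments
    """Look up the current weather for a city."""

    city: str = Field(description='City name, e.g. "Paris"')
    unit: str = 'celsius'  # has a default, so it is not listed as required


# model_json_schema() returns a plain dict describing the model
schema = GetWeather.model_json_schema()
print(json.dumps(schema, indent=2))

# Assumed wrapper layout for a function-calling request (not verified here)
function_definition = {
    'name': 'get_weather',
    'description': schema.get('description', ''),
    'parameters': schema,
}
```

The nice part of this approach is that the model doubles as the validator for whatever arguments come back, so the schema and the validation logic cannot drift apart.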

What is new in V2.0? 🤔

The main thing that is new is a huge under-the-hood refactor which makes use of Rust for the internals of the package. This change requires some user-facing code changes, which are covered in the very comprehensive Pydantic migration guide.

You can also read the Pydantic 2.0 plan blog post to get a better sense of the goals of the release.

At a high level, some of the new features in 2.0 include:

  • a strict mode switch which can be set at either the model level or the field level
  • a formalized conversion table to make it clear how data is converted in different situations
  • built-in JSON support (yay)
  • validation with no model
  • and much more!
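
A few of the items above are easier to appreciate with code, so here is a short sketch covering field-level strictness, the built-in JSON parsing, and validation without a model. The `Order` model and the variable names are hypothetical; `Field(strict=True)`, `model_validate_json`, `model_dump_json`, and `TypeAdapter` are the 2.x APIs as I understand them from the docs.

```python
from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class Order(BaseModel):  # hypothetical model for illustration
    order_id: int = Field(strict=True)  # strict on this field only
    quantity: int                       # lax, so '2' is coerced to 2


# Field-level strict switch: a string order_id is rejected
order = Order(order_id=1, quantity='2')
try:
    Order(order_id='1', quantity=2)
    strict_field_rejected = False
except ValidationError:
    strict_field_rejected = True

# Built-in JSON support: parse and dump without going through json.loads
parsed = Order.model_validate_json('{"order_id": 7, "quantity": "4"}')
as_json = parsed.model_dump_json()
print(as_json)

# Validation with no model: TypeAdapter validates an arbitrary type
int_list = TypeAdapter(list[int]).validate_python(['1', 2, '3'])
print(int_list)
#> [1, 2, 3]
```

`TypeAdapter` is especially handy when all you need is to check a list or a dictionary and a full model class would be overkill.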

Closing thoughts 🤖

Pydantic is seriously useful. I am glad I took the time to explore it, and I am still surprised by how widely it is used given how little I had heard of it. This sets me up nicely for creating some helpful content around Pydantic and OpenAI ChatCompletions Functions. You can look forward to seeing that in our docs or a standalone tutorial soon!
