Using Pydantic models for settings management

Yoeri van Bruchem
Sopra Steria NL Data & AI
5 min read · Nov 5, 2023

Abstract: This article discusses the use of Pydantic models for efficient settings management. It emphasizes the importance of separating sensitive environment settings from code and highlights Pydantic’s role in validating and securely managing these settings, ensuring data quality and security across development and deployment stages.

Harnessing Pydantic models for streamlined settings management

The previous article in the Pydantic series explored the benefits and applications of Pydantic models in data projects. Pydantic allows for data validation, improved readability, typing and automatic conversion, documentation generation, and integration with frameworks. The article also provides examples of creating Pydantic models and parsing API responses with Pydantic. It highlights the importance of using Pydantic models for data quality and efficiency in data projects.

Python environments

Python environments are a crucial part of Python development. They provide a way to isolate and manage Python projects and their dependencies, ensuring that different projects can have their own sets of libraries and packages without interfering with each other. This is particularly important when working on multiple projects or collaborating with others, as it helps avoid version conflicts and simplifies dependency management.

Environment and settings management

When publishing your Python code to other deployment stages in your DTAP (Development, Testing, Acceptance, Production) ecosystem, settings management and validation become more important. With settings management, you make sure that the code in a specific stage references the right settings.

Assume you created the following script for loading some data from an Azure Postgres database:

from sqlalchemy import URL, text
from sqlalchemy import create_engine

url_object = URL.create(
    drivername="postgresql",
    username="username@dev_database",
    password="pass123!",
    host="dev_database.postgres.database.azure.com",
    database="azure_sys",
    port=5432,
)

engine = create_engine(url_object)

query = text("""
    SELECT *
    FROM pg_catalog.pg_tables
    WHERE schemaname != 'pg_catalog' AND
          schemaname != 'information_schema';""")

# Iterate over the result inside the connection context: once the
# connection closes, the cursor result can no longer be consumed.
with engine.connect() as conn:
    with conn.begin():
        result = conn.execute(query)
        for row in result:
            print(row)

# Result:
# ('query_store', 'query_texts', 'azure_superuser', None, True, False, False, False)
# ('query_store', 'runtime_stats_intervals', 'azure_superuser', None, True, False, False, False)
# ('query_store', 'runtime_stats_ws_entries', 'azure_superuser', None, True, False, False, False)
# ('query_store', 'runtime_stats_ws_values', 'azure_superuser', None, False, False, False, False)
# ('query_store', 'runtime_stats_definitions', 'azure_superuser', None, True, False, False, False)
# ('query_store', 'runtime_stats_entries', 'azure_superuser', None, True, False, False, False)
# ('query_store', 'runtime_stats_values', 'azure_superuser', None, True, False, False, False)

Step by step, this script executes the following tasks:

  1. Create a URL object containing the connection settings for the current database.
  2. Create an engine with the URL provided in the previous step.
  3. Create a query for loading some tables from the database.
  4. Using the engine, create a connection and execute the query.
  5. Print the result (all tables in the database).

As seen above, the script contains the settings (credentials) for connecting to the database. While this approach may do for an initial version of your code, it is bad practice for storing and managing environment settings once you publish (package) and distribute your code to other ecosystems like Test, Acceptance, or Production. Anyone who (accidentally) obtains your script can immediately access, and potentially alter, your data. To prevent this security risk, make sure, preferably from the beginning of your development stage, that these environment settings are stored safely, and never anywhere within your codebase.

Managing environments

The first step in improving your settings management is to separate your project settings from your project code. This enables you to exclude this information from your codebase.
Another advantage of creating separate settings files is that you'll be able to differentiate your project settings between your DTAP ecosystems.
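One minimal way to sketch this differentiation, using a hypothetical STAGE variable and hypothetical file names (neither appears elsewhere in this article), is to derive the settings file name from a single environment variable:

```python
import os

# Hypothetical convention: one settings file per DTAP stage
# (.env.dev, .env.test, .env.acc, .env.prod), selected through a
# single STAGE environment variable set in each ecosystem.
stage = os.environ.get("STAGE", "dev")
env_file = f".env.{stage}"  # e.g. ".env.dev" when STAGE is unset
```

Each ecosystem then only needs to define STAGE; the rest of the configuration lives in the stage-specific file.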

Let's review the following snippet from the example above:

url_object = URL.create(
    drivername="postgresql",
    username="username@dev_database",
    password="pass123!",
    host="dev_database.postgres.database.azure.com",
    database="azure_sys",
    port=5432,
)

engine = create_engine(url_object)

In our 'DTAP-ready' project setup, we want to exclude this information from our code.

One way to achieve this is by creating environment variables. These variables can be added to a .env file or by specifying them in your target environment (for example within your Kubernetes cluster).
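For reference, environment variables can be read directly with the standard library. The variable names below are hypothetical, and the sketch shows the gap Pydantic will fill: every value arrives as a string, nothing is validated, and type conversion is manual.

```python
import os

# For illustration only: in a real deployment these variables would be
# set by the shell, a .env loader, or the orchestrator (e.g. a
# Kubernetes manifest). setdefault only fills them in if absent.
os.environ.setdefault("DB_HOST", "dev_database.postgres.database.azure.com")
os.environ.setdefault("DB_PORT", "5432")

db_host = os.environ["DB_HOST"]                   # KeyError if missing
db_port = int(os.environ.get("DB_PORT", "5432"))  # conversion is manual
```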

Creating an environment file

Let's say we want to store our project settings in a file that is included in our various DTAP ecosystems. This file (save it as a .env file within your project directory) would look something like this:

DRIVERNAME=postgresql
USERNAME=username@dev_database
PASSWORD=pass123!
HOST=dev_database.postgres.database.azure.com
DATABASE=azure_sys
PORT=5432

Importing and validating environment file

Next, we would like to import this environment file. This is where the magic of Pydantic comes in.

First, we have to create a settings class. This class inherits from the BaseSettings class (provided by the separate pydantic-settings package in Pydantic v2) and contains all the settings as attributes. Make sure you set the types of these attributes correctly.

from pydantic_settings import BaseSettings
from pydantic.types import SecretStr

class DatabaseSettings(BaseSettings):
    drivername: str
    username: str
    password: SecretStr
    host: str
    database: str
    port: int

my_database_settings = DatabaseSettings(_env_file=".env")

The code provided in the example above:

  • creates a DatabaseSettings class,
  • loads the .env file into a corresponding DatabaseSettings object,
  • and stores it into the my_database_settings variable.

When loading (parsing) the information from the .env file, all variables will be validated for the correct data types. Whenever we change the value of a variable to an invalid data type, Pydantic will show a ValidationError:

.env

...
PORT=5432a

Console output:

ValidationError: 1 validation error for DatabaseSettings
port
Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='5432a', input_type=str]
For further information visit https://errors.pydantic.dev/2.4/v/int_parsing

Extracting settings from Pydantic object

Now that we've successfully split our environment settings and secrets from our code base, it's time to extract them from our new Pydantic settings object.

url_object = URL.create(
    drivername=my_database_settings.drivername,
    username=my_database_settings.username,
    password=my_database_settings.password.get_secret_value(),
    host=my_database_settings.host,
    database=my_database_settings.database,
    port=my_database_settings.port,
)

engine = create_engine(url_object)

Notice that for our password, we have to use the get_secret_value() method in order to obtain the value of this attribute.
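SecretStr masks its value everywhere except through get_secret_value(), which is what makes accidental exposure unlikely. A small illustration, assuming Pydantic v2:

```python
from pydantic import SecretStr

secret = SecretStr("pass123!")

print(secret)                     # masked: **********
print(secret.get_secret_value())  # actual value: pass123!
```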

Wrapping up

When you combine all the steps above, you should end up with something like the following:

from pydantic_settings import BaseSettings
from pydantic.types import SecretStr

from sqlalchemy import URL, text
from sqlalchemy import create_engine

class DatabaseSettings(BaseSettings):
    drivername: str
    username: str
    password: SecretStr
    host: str
    database: str
    port: int

my_database_settings = DatabaseSettings(_env_file=".env")

url_object = URL.create(
    drivername=my_database_settings.drivername,
    username=my_database_settings.username,
    password=my_database_settings.password.get_secret_value(),
    host=my_database_settings.host,
    database=my_database_settings.database,
    port=my_database_settings.port,
)

engine = create_engine(url_object)

query = text("""
    SELECT *
    FROM pg_catalog.pg_tables
    WHERE schemaname != 'pg_catalog' AND
          schemaname != 'information_schema';""")

# Consume the result inside the connection context, before it closes.
with engine.connect() as conn:
    with conn.begin():
        result = conn.execute(query)
        for row in result:
            print(row)

If you compare this example with the example provided at the beginning of this article, you'll see that we were able to remove the sensitive information from our code and move it into a separate environment file. When you package and distribute your code, these environment settings won't be included in your code base. Later on, you can add the right settings to each of your DTAP ecosystems by defining the corresponding environment variables. Using Pydantic for managing those environment variables has some major advantages:

  • Pydantic validates all environment variables for the correct data types.
  • Pydantic checks whether all environment variables are available (at the creation of the Pydantic settings object).
  • Pydantic handles secrets securely: whenever you serialize your model or (accidentally) print secret attributes, those values show up as **********.
  • Pydantic enables you to add other validation constraints to your settings attributes.


Yoeri van Bruchem is a data professional in the field of data engineering. As an up-and-coming data scientist, Yoeri aims to unlock more value from data.