Improved data quality in your dbt models with dbt-teradata 1.7.1

Daniel Herrera
Teradata
Published Jan 29, 2024

Image: illustrative of data contracts (https://unsplash.com/photos/shallow-focus-photography-of-computer-codes-BfrQnKBulYQ)

Data engineering is a dynamic and rapidly changing field. New tools and enhancements to existing ones are frequently introduced, sometimes as often as weekly, to aid data engineers in providing value more swiftly and effectively.

Teradata is committed to equipping our customers and developer community with tools that integrate seamlessly with the modern data stack and with our data platform, Teradata Vantage™.

Since introducing dbt-teradata, our Teradata dbt connector, our team has worked to ensure compatibility with all dbt features, and last year the connector achieved trusted adapter status from dbt Labs. Responding to popular demand from our community, we've added support for two features:

- dbt data contracts, a key data quality assurance tool.
- Teradata session mode (TERA), the most commonly used session mode in Teradata environments, for dbt database connections.

Both enhancements are included in dbt-teradata version 1.7.1, released on January 11, 2024.

Data contracts in dbt

Data contracts were introduced in dbt version 1.5 in April 2023 and are a crucial element for maintaining data quality. Before contracts, tests were the primary means of assuring data quality in dbt. Tests are effective at verifying that a model materialized as anticipated. However, they run after the model is materialized, which makes them retrospective measures.

In contrast, contracts let data engineers define an interface (column names, data types, and constraints) that the model must adhere to. This interface is validated before the model is materialized. That prevents disruptions in data pipelines caused by models that do not produce the required data structure.
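To make the idea concrete, here is a minimal Python sketch of the kind of pre-flight comparison a contract implies. The column names and types that a model's query would produce are compared with the declared contract, and any mismatch is reported before anything is built. The function `check_contract` and its dictionary inputs are hypothetical illustrations. dbt performs its own check internally, and its implementation and error messages differ.

```python
from typing import Dict, List


def check_contract(declared: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Compare a declared contract with the columns a model's query would produce.

    Both arguments map column name -> data type. In this sketch, names and
    types are compared case-insensitively, which is a simplifying assumption.
    Returns a list of problems; an empty list means the contract is satisfied.
    """
    decl = {name.lower(): dtype.lower() for name, dtype in declared.items()}
    act = {name.lower(): dtype.lower() for name, dtype in actual.items()}
    problems: List[str] = []
    for name, dtype in decl.items():
        if name not in act:
            problems.append(f"missing column: {name}")
        elif act[name] != dtype:
            problems.append(f"type mismatch on {name}: expected {dtype}, got {act[name]}")
    for name in act:
        if name not in decl:
            problems.append(f"unexpected column: {name}")
    return problems


if __name__ == "__main__":
    contract = {"order_id": "smallint", "customer_id": "smallint", "order_date": "date"}
    # Hypothetical query output in which order_id came back as INTEGER.
    query_output = {"order_id": "integer", "customer_id": "smallint", "order_date": "date"}
    issues = check_contract(contract, query_output)
    if issues:
        # In a real build, this is where processing would stop, before any table is created.
        print("contract violated:", issues)
```

The key design point is timing. Because the comparison only needs the shape of the query result, it can run before materialization rather than after it, as a test would.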

From a technical standpoint, contracts, like tests, are configured in the schema.yml file associated with the model. In the following example, stg_customers relies on tests, while stg_orders enforces a contract:

version: 2

models:
  - name: stg_customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
  - name: stg_orders
    config:
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: smallint
        constraints:
          - type: not_null
      - name: customer_id
        data_type: smallint
        constraints:
          - type: not_null
      - name: order_date
        data_type: date
      - name: order_status
        data_type: char
      - name: order_duration
        data_type: PERIOD(TIMESTAMP WITH TIME ZONE)
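With an enforced contract, the column list, data types, and constraints are known before the table exists, so they can be expressed directly in the table's DDL. The Python sketch below renders an approximate CREATE TABLE statement from the stg_orders contract, transcribed as Python data. The helper `render_ddl` is hypothetical, and it translates only the not_null constraint. The statement dbt-teradata actually generates (table kind, qualification, clause order) may differ.

```python
from typing import Any, Dict, List

# The stg_orders contract from the schema.yml example, expressed as Python data.
STG_ORDERS_CONTRACT: List[Dict[str, Any]] = [
    {"name": "order_id", "data_type": "smallint", "constraints": [{"type": "not_null"}]},
    {"name": "customer_id", "data_type": "smallint", "constraints": [{"type": "not_null"}]},
    {"name": "order_date", "data_type": "date"},
    {"name": "order_status", "data_type": "char"},
    {"name": "order_duration", "data_type": "PERIOD(TIMESTAMP WITH TIME ZONE)"},
]


def render_ddl(table: str, columns: List[Dict[str, Any]]) -> str:
    """Render a simplified CREATE TABLE statement from contract column specs.

    Only the not_null constraint is translated (to NOT NULL); any other
    constraint types are ignored in this sketch.
    """
    col_lines = []
    for col in columns:
        constraint_types = {c["type"] for c in col.get("constraints", [])}
        line = f"{col['name']} {col['data_type']}"
        if "not_null" in constraint_types:
            line += " NOT NULL"
        col_lines.append(line)
    body = ",\n    ".join(col_lines)
    return f"CREATE TABLE {table} (\n    {body}\n);"


if __name__ == "__main__":
    print(render_ddl("teddy_retailers.stg_orders", STG_ORDERS_CONTRACT))
```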

Data contracts play a crucial role in large-scale dbt implementations, particularly when adopting strategies like data mesh to foster the separation of concerns across various data domains. We will delve deeper into data mesh strategies in an upcoming blog post.

In a data mesh, distinct segments of the data pipeline are often managed in separate dbt projects, and the dependencies between these projects require well-defined interfaces. Clear interfaces between dependent models ensure seamless integration and protect the integrity of the overall data architecture.
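Once models have contracts, a change to an upstream model can be classified as safe or breaking for the projects that depend on it. The sketch below is a simplified, hypothetical classifier; `breaking_changes` is not a dbt API, and dbt has its own, more detailed rules for detecting breaking changes to contracted models. Here, removing a column, changing its data type, or dropping a constraint counts as breaking, while adding a column does not.

```python
from typing import Dict, List, Set, TypedDict


class ColumnSpec(TypedDict):
    data_type: str
    constraints: Set[str]


def breaking_changes(old: Dict[str, ColumnSpec], new: Dict[str, ColumnSpec]) -> List[str]:
    """List changes between two contract versions that could break consumers.

    Simplified rules for this sketch: removing a column, changing its data
    type, or dropping a constraint is breaking; adding a column is not.
    """
    changes: List[str] = []
    for name, spec in old.items():
        if name not in new:
            changes.append(f"removed column {name}")
            continue
        if new[name]["data_type"] != spec["data_type"]:
            changes.append(
                f"changed type of {name}: {spec['data_type']} -> {new[name]['data_type']}"
            )
        for constraint in sorted(spec["constraints"] - new[name]["constraints"]):
            changes.append(f"dropped {constraint} constraint on {name}")
    return changes


if __name__ == "__main__":
    v1: Dict[str, ColumnSpec] = {
        "order_id": {"data_type": "smallint", "constraints": {"not_null"}},
        "order_status": {"data_type": "char", "constraints": set()},
    }
    # Hypothetical next version: order_id widened and made nullable, new column added.
    v2: Dict[str, ColumnSpec] = {
        "order_id": {"data_type": "integer", "constraints": set()},
        "order_status": {"data_type": "char", "constraints": set()},
        "order_channel": {"data_type": "varchar(20)", "constraints": set()},
    }
    for change in breaking_changes(v1, v2):
        print(change)
```

A downstream team could run a check like this before accepting a new version of an upstream model. The added order_channel column, a hypothetical example, is deliberately ignored, because consumers that do not read it are unaffected.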

Teradata session mode in dbt database connections

Teradata session mode is distinguished by its own approach to transaction control, locking behavior, and error handling, all of which are very familiar to existing Teradata customers. Supporting this mode in the dbt-teradata connector lets our customers incorporate their tried-and-tested stored procedures into their dbt projects.
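Error handling inside an explicit transaction is one example of these differences. In Teradata session mode, an error rolls back the whole transaction. In ANSI mode, most statement errors roll back only the failing request, and the transaction stays open until it is committed or rolled back. The Python snippet below is a deliberately simplified, hypothetical model of that behavior; the function `effects_after_transaction` is illustrative only. It ignores locking, failures such as deadlocks, and other details, and it only shows why a procedure written for one mode can behave differently in the other.

```python
from typing import List, Tuple


def effects_after_transaction(statements: List[Tuple[str, bool]], mode: str) -> List[str]:
    """Toy model of which statements in one explicit transaction keep their effects.

    statements: (label, succeeds) pairs, executed in order.
    mode: "TERA" (Teradata session mode) or "ANSI".

    Simplifications: in TERA mode the first error rolls back the whole
    transaction and the rest of the block is treated as abandoned; in ANSI
    mode only the failing request is rolled back and the transaction
    continues until it is committed.
    """
    if mode not in ("TERA", "ANSI"):
        raise ValueError(f"unknown session mode: {mode!r}")
    kept: List[str] = []
    for label, succeeds in statements:
        if succeeds:
            kept.append(label)
        elif mode == "TERA":
            return []  # the whole transaction is rolled back
        # ANSI mode: only the failed request is undone, so keep going.
    return kept


if __name__ == "__main__":
    block = [
        ("insert step", True),
        ("update step", False),  # hypothetical failing statement
        ("delete step", True),
    ]
    print("TERA:", effects_after_transaction(block, "TERA"))
    print("ANSI:", effects_after_transaction(block, "ANSI"))
```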

Prioritizing the integration of Teradata session mode into our connector was a top objective for our team. We are thrilled to announce the successful implementation of this feature, marking a significant milestone in our ongoing commitment to enhance user experience and compatibility within the Teradata ecosystem.

Enabling Teradata session mode is simply a matter of setting tmode to TERA in the dbt project's profile:

teddy_retailers_2:
  outputs:
    dev:
      type: teradata
      host: daniel-test-wteuh1djm0q9u4x5.env.clearscape.teradata.com
      user: demo_user
      password: danielTest#01
      logmech: TD2
      schema: teddy_retailers
      tmode: TERA
      threads: 1
      timeout_seconds: 300
      priority: interactive
      retries: 1
  target: dev
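dbt-teradata connects through the teradatasql driver, which has its own tmode connection parameter. The sketch below shows how the profile's tmode value might be validated and mapped onto driver parameters. The helper `driver_params` is hypothetical, and it rests on two assumptions: tmode falls back to ANSI when omitted, and the value is passed to the driver unchanged. It leaves the password out on purpose, and "example-host" is a placeholder.

```python
from typing import Any, Dict

# Session modes assumed to be valid for the profile's tmode field.
ALLOWED_TMODES = {"ANSI", "TERA"}


def driver_params(output: Dict[str, Any]) -> Dict[str, str]:
    """Build teradatasql-style connection parameters from a dbt profile output.

    Hypothetical helper: assumes tmode defaults to ANSI when omitted and is
    passed through unchanged. The password is deliberately excluded so it
    can be supplied separately, for example from an environment variable.
    """
    tmode = str(output.get("tmode", "ANSI")).upper()
    if tmode not in ALLOWED_TMODES:
        raise ValueError(f"unsupported tmode: {tmode!r}")
    params = {"host": str(output["host"]), "user": str(output["user"]), "tmode": tmode}
    if "logmech" in output:
        params["logmech"] = str(output["logmech"])
    return params


if __name__ == "__main__":
    dev = {"type": "teradata", "host": "example-host", "user": "demo_user",
           "logmech": "TD2", "schema": "teddy_retailers", "tmode": "TERA"}
    print(driver_params(dev))
    # With the driver installed, the parameters could be passed on, e.g.:
    # teradatasql.connect(password=..., **driver_params(dev))
```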

Feedback & questions

We value your insights and perspective! Share your thoughts, feedback, and ideas in the comments below. Feel free to also explore the wealth of resources available on the Teradata Developer Portal and Teradata Developer Community.

About the author

Daniel is a builder and a problem solver, constantly fueled by the opportunity to create tools that aid individuals in extracting valuable insights from data.

Daniel has held the position of Technical Product Manager, specializing in Data Ingestion and ETL (Extract, Transform, and Load) for enterprise applications. Over the past three years, he has actively contributed as a Developer, Developer Advocate, and Open-Source Contributor in the Data Engineering space for traditional and decentralized applications.

Daniel holds a certification as a Cloud Solutions Architect in Microsoft Azure. His proficiency extends to a range of programming languages including SQL, Python, JavaScript, and Solidity.

Connect with Daniel on LinkedIn.
