Data transformation with dbt and Teradata Vantage

Adam Tworkiewicz · Teradata · Jan 16, 2023

dbt (Data Build Tool) is a data transformation tool that is the cornerstone of the Modern Data Stack. In this post, we will explore how to use dbt to transform data in the Teradata Vantage data warehouse.

dbt takes care of the T in ELT (Extract, Load, Transform). The assumption is that some other process brings raw data into your data warehouse or lake. This data then needs to be transformed into the internal representation of your choice, be it Third Normal Form, Data Vault, or any other implementation you choose. Finally, the internal representation needs to be translated into the consumable formats required by the tools that data consumers use: a dimensional model for BI tools, or some other custom format such as features for AI/ML processing. With dbt, you can do all these transformations within your data warehouse or lake, using SQL to manipulate the data, git to version the transformations, and dbt data tests to verify the results of your data pipelines.
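To make this concrete before we dive in: a dbt model is nothing more than a SELECT statement saved as a .sql file in your project. dbt compiles it and materializes the result as a table or view. A minimal sketch (the model, columns, and upstream name here are hypothetical, not part of the project we will look at):

-- models/active_customers.sql (hypothetical example)
select
    customer_id,
    email
from {{ ref('raw_users') }}  -- ref() resolves to an upstream model or seed that dbt manages
where status = 'active'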

dbt has a sample project called Jaffle Shop. The project shows how three raw input data sets (customers, orders, and payments) are transformed into a dimensional model that can be consumed by a BI tool.

Here is the starting point:

Input data sets

And here is the output model that we need:

Required resultant dimensional model

As you can see, the customers dimension requires aggregation (first_order, most_recent_order, number_of_orders, customer_lifetime_value), and the fact table requires a join between the orders and payments tables.

The original Jaffle Shop project uses Google BigQuery. I’ve forked the project to https://github.com/Teradata/jaffle_shop-dev and made a couple of changes to adjust the SQL to Teradata syntax. I’m now going to walk you through the models (the successive transformations) that the sample project applies.
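If you want to follow along, dbt talks to Vantage through the dbt-teradata adapter, configured in profiles.yml. Here is a minimal sketch, assuming the adapter is installed; the host and credentials are placeholders, and the adapter documentation is the authoritative reference for the full list of fields:

jaffle_shop:
  target: dev
  outputs:
    dev:
      type: teradata
      host: <vantage-host>   # placeholder: your Vantage system
      user: <user>           # placeholder
      password: <password>   # placeholder
      schema: jaffle_shop    # database where dbt will create its objects
      tmode: ANSI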

We first build staging tables based on the source tables. In our example, this step is trivial because the raw data sets are simple. In real life, this step would usually take care of normalizing data formats, aligning naming conventions, and converting data units. In our case, we simply select from the raw tables, e.g.:

with source as (

    {#-
    Normally we would select from the table here, but we are
    using seeds to load our data in this project
    #}
    select * from {{ ref('raw_customers') }}

),

renamed as (

    select
        id as customer_id,
        first_name,
        last_name,
        email

    from source

)

select * from renamed

Notice how this transformation, or model in dbt parlance, relies on common table expression (CTE) syntax. CTEs offer a clean way to decompose complex SQL. If you need a refresher on CTEs, see the Teradata Vantage — SQL Data Manipulation Language documentation.
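Also note that the Jinja constructs, like {{ ref('raw_customers') }} and the {#- ... #} comment, disappear at compile time: dbt resolves each ref() to the relation it manages and sends plain SQL to Vantage. Assuming dbt builds everything in a database named jaffle_shop (an assumption for illustration), the compiled model would look roughly like this:

with source as (

    select * from jaffle_shop.raw_customers

),

renamed as (

    select
        id as customer_id,
        first_name,
        last_name,
        email

    from source

)

select * from renamed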

Now that we have our input data sets staged, it’s time to do joins and aggregations.

We first aggregate order data by customer (we will refer to this model as customer_orders in later steps):

with orders as (

    select * from {{ ref('stg_orders') }}

),

final as (

    select
        customer_id,

        min(order_date) as first_order,
        max(order_date) as most_recent_order,
        count(order_id) as number_of_orders

    from orders

    group by 1

)

select * from final

Next, we join payments with orders and total the payment amounts per customer (customer_payments):

with payments as (

    select * from {{ ref('stg_payments') }}

),

orders as (

    select * from {{ ref('stg_orders') }}

),

final as (

    select
        orders.customer_id,
        sum(amount) as total_amount

    from payments

    left join orders on payments.order_id = orders.order_id

    group by 1

)

select * from final

And finally, we aggregate payments by order, pivoting each payment method into its own column (order_payments):

{% set payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}

with payments as (

    select * from {{ ref('stg_payments') }}

),

final as (

    select
        order_id,

        {% for payment_method in payment_methods -%}
        sum(case when payment_method = '{{ payment_method }}' then amount else 0 end) as {{ payment_method }}_amount,
        {% endfor -%}

        sum(amount) as total_amount

    from payments

    group by 1

)

select * from final
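The {% set %} and {% for %} blocks are Jinja: dbt renders them before any SQL reaches Vantage, so the loop expands into one pivoted aggregate per payment method. The final CTE above compiles to roughly:

select
    order_id,

    sum(case when payment_method = 'credit_card' then amount else 0 end) as credit_card_amount,
    sum(case when payment_method = 'coupon' then amount else 0 end) as coupon_amount,
    sum(case when payment_method = 'bank_transfer' then amount else 0 end) as bank_transfer_amount,
    sum(case when payment_method = 'gift_card' then amount else 0 end) as gift_card_amount,

    sum(amount) as total_amount

from payments

group by 1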

With these intermediate models, we begin constructing the output dimensional model.

We take customers, customer_orders, and customer_payments models and construct the customers dimension:

with customers as (

    select * from {{ ref('stg_customers') }}

),

customer_orders as (

    select * from {{ ref('customer_orders') }}

),

customer_payments as (

    select * from {{ ref('customer_payments') }}

),

final as (

    select
        customers.customer_id,
        customer_orders.first_order,
        customer_orders.most_recent_order,
        customer_orders.number_of_orders,
        customer_payments.total_amount as customer_lifetime_value

    from customers

    left join customer_orders on customers.customer_id = customer_orders.customer_id
    left join customer_payments on customers.customer_id = customer_payments.customer_id

)

select * from final

Finally, we take the orders and order_payments models to construct the fact table:

{% set payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}

with orders as (

    select * from {{ ref('stg_orders') }}

),

order_payments as (

    select * from {{ ref('order_payments') }}

),

final as (

    select
        orders.order_id,
        orders.customer_id,
        orders.order_date,
        orders.status,

        {% for payment_method in payment_methods -%}
        order_payments.{{ payment_method }}_amount,
        {% endfor -%}

        order_payments.total_amount as amount

    from orders

    left join order_payments on orders.order_id = order_payments.order_id

)

select * from final

With our models configured, let’s run dbt.

We start by loading the sample data from CSV files into the input tables:

dbt seed

We are now ready to execute the transformations:

dbt run
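Behind the scenes, dbt wraps each model’s compiled SELECT in the DDL that matches the model’s materialization. On Teradata, a model materialized as a table is built with a CTAS statement of roughly this shape (my_model and the database name are placeholders for illustration; the exact DDL the adapter emits may differ):

-- illustrative shape only; dbt generates this for you
create table jaffle_shop.my_model as (
    select * from jaffle_shop.stg_customers  -- the model's compiled SELECT is inlined here
) with data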

The sample project contains example tests that validate data at each step of the transformation. You can run the tests with:

dbt test
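Each generic test compiles to a query that selects the rows violating the assertion; the test passes when no rows come back. For instance, a not_null test on customers.customer_id compiles to something like (database name again assumed for illustration):

select *
from jaffle_shop.customers
where customer_id is null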

Since dbt knows about all data transformations, it can produce clean lineage documentation. Once you run these commands:

dbt docs generate
dbt docs serve

You will get documentation for your models with the following lineage graph:

data lineage diagram produced by dbt

In this post, we explored what dbt is used for. We walked through a sample project that takes raw input data and produces a dimensional model that is useful for BI tools.

Next steps:
