Data transformation with dbt and Teradata Vantage

Published in

Teradata

5 min readJan 16, 2023

dbt (Data Build Tool) is a data transformation tool that is the cornerstone of the Modern Data Stack. In this post, we will explore how to use dbt to transform data in the Teradata Vantage data warehouse.

dbt takes care of the T in ELT (Extract Load Transform). The assumption is that some other process brings raw data into your data warehouse or lake. This data then needs to be transformed to the internal representation of your choice, be it the 3rd Normal Form, Data Vault, or any other implementation you chose. Finally, the internal representation needs to be translated to the consumable format required by tools used by data consumers. It could be the dimensional model for BI tools or some other custom format, like features for AI/ML processing. With dbt, you can do all these transformations within your data warehouse or lake, using SQL to manipulate the data, git to version the transformations, and dbt data tests to verify the results of your data pipelines.

dbt has a sample project called Jaffle Shop. The project shows how raw input data: customer set, order set, and payment set, are transformed into a dimensional model that can be consumed by a BI tool.

Here is the starting point:

And here is the output model that we need:

As you can see, the customers dimension requires that we do aggregation (first_order, most_recent_order, number_of_orders, total_order_amount) and the fact table requires a join between orders and payments tables.

The original Jaffle Shop project uses Google BigQuery. I’ve forked the project to https://github.com/Teradata/jaffle_shop-dev and made a couple of changes to adjust the SQL to the Teradata syntax. I’m now going to walk you through the models (subsequent transformations) that the sample project applies.

We first build staging tables based on the source tables. In our example, this step is trivial as the raw data sets are simple. In real life, this step would usually take care of normalizing data formats, aligning naming conventions, changing data units. In our case, we select from the raw tables, e.g.:

with source as (

    {#-
    Normally we would select from the table here, but we are 
    using seeds to load our data in this project
    #}
    select * from {{ ref('raw_customers') }}

),

renamed as (

    select
        id as customer_id,
        first_name,
        last_name,
        email

    from source

)

select * from renamed

Notice, how this transformation, or model in dbt parlance, relies on common table expressions (CTE) syntax. CTE offers a clean way to decompose complex SQL. If you need a refresher on CTE see the Teradata Vantage — SQL Data Manipulation Language documentation.

Now, that we have our input data sets staged, it’s time to do joins and aggregates.

We first aggregate order data by user (we will call this model customer_orders in later steps):

with orders as (

    select * from {{ ref('stg_orders') }}

),

final as (

    select
        customer_id,

        min(order_date) as first_order,
        max(order_date) as most_recent_order,
        count(order_id) as number_of_orders
    from orders

    group by 1

)

select * from final

Join orders with payments (customer_payments):

with payments as (

    select * from {{ ref('stg_payments') }}

),

orders as (

    select * from {{ ref('stg_orders') }}

),

final as (

    select
        orders.customer_id,
        sum(amount) as total_amount

    from payments

    left join orders on payments.order_id = orders.order_id

    group by 1

)

select * from final

And finally aggregate payments by order (order_payments):



{% set payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}

with payments as (

    select * from {{ ref('stg_payments') }}

),

final as (

    select
        order_id,

        {% for payment_method in payment_methods -%}
        sum(case when payment_method = '{{payment_method}}' then amount else 0 end) as {{payment_method}}_amount,
        {% endfor -%}

        sum(amount) as total_amount

    from payments

    group by 1

)

select * from final

With these intermediate models, we begin constructing the output dimensional model.

We take customers, customer_orders, and customer_payments models and construct the customers dimension:

with customers as (

    select * from {{ ref('stg_customers') }}

),

customer_orders as (

    select * from {{ ref('customer_orders') }}

),

customer_payments as (

    select * from {{ ref('customer_payments') }}

),

final as (

    select
        customers.customer_id,
        customer_orders.first_order,
        customer_orders.most_recent_order,
        customer_orders.number_of_orders,
        customer_payments.total_amount as customer_lifetime_value

    from customers

    left join customer_orders on customers.customer_id = customer_orders.customer_id
    left join customer_payments on customers.customer_id = customer_payments.customer_id

)

select * from final

Finally, we take orders and order_payments models to construct the fact table:

{% set payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}

with orders as (

    select * from {{ ref('stg_orders') }}

),

order_payments as (

    select * from {{ ref('order_payments') }}

),

final as (

    select
        orders.order_id,
        orders.customer_id,
        orders.order_date,
        orders.status,

        {% for payment_method in payment_methods -%}

        order_payments.{{payment_method}}_amount,

        {% endfor -%}

        order_payments.total_amount as amount

    from orders

    left join order_payments on orders.order_id = order_payments.order_id

)

select * from final

With our models configured, let’s run dbt.

We start by loading sample data from CSV files to input tables:

dbt seed

We are now ready to execute the transformations:

dbt run

The sample project contains example tests that validate data at each step of the transformation. You can run the tests with:

dbt test

Since dbt knows about all data transformations, it can produce clean lineage documentation. Once you run these commands:

dbt docs generate
dbt docs serve

You will get documentation for your models with the following lineage graph:

In this post, we have explored what dbt is used for. We went through a sample project, that took raw input data and produced a dimensional model that is useful for BI tools.

Next steps:

For instructions on installing dbt, clone the sample project and run the steps, see the dbt with Teradata Vantage tutorial.
To learn more about the dbt Teradata adapter check out dbt-teradata plugin documentation.

Data transformation with dbt and Teradata Vantage

Written by Adam Tworkiewicz