dbt Materialization Types and Strategies — Part 2

8 min readDec 12, 2023

Photo by Ricardo Gomez Angel on Unsplash

Materializations involve the creation and storage of intermediate tables, views, or snapshots that serve as building blocks for analytical queries. By strategically leveraging these techniques, dbt significantly enhances query performance, minimizes redundant computations, and accelerates data transformation processes.

In this article, we will continue our “dbt Materialization Types and Strategies” mini series with incremental models and materialised views. For my previous materialization and snapshot articles please check the links below;

dbt Materialization Types and Strategies — Part 1

Data Build Tool (dbt) materialization techniques play a pivotal role in optimizing databases and data warehouses by…

medium.com

Slowly Changing Dimension (SCD) Management in dbt

In the dynamic landscape of data warehousing and business intelligence, the concept of Slowly Changing Dimensions (SCD)…

medium.com

What are dbt Materializations?

dbt models are stored as materializations in databases and data warehouses. There are a total of five types of materialization that you can use in dbt;

View
Table
Ephemeral
Incremental
Materialized View

Materialization strategies can be defined in model or project level. The best practise is to define at project level and override it at the folder or model level if needed.

Incremental

Incremental materialization models allow you to insert or update new records into the specified table. With this method you can update the records since the last time that your table was updated.

The first time a model is run, the table is built by transforming all rows of source data. On subsequent runs, dbt transforms only the rows in your source data that you tell dbt to filter for, inserting them into the target table which is the table that has already been built.

Often, the rows you filter for on an incremental run will be the rows in your source data that have been created or updated since the last time dbt ran. As such, on each dbt run, your model gets built incrementally.

Using an incremental model limits the amount of data that needs to be transformed, vastly reducing the runtime of your transformations. This improves warehouse performance and reduces compute costs.

Pros:

You can significantly reduce the build time by just transforming new records
Efficient solution for streaming data loads and large tables

Cons:

Incremental models require extra configuration and are an advanced usage of dbt

When to use:

When you are streaming data and need to store the differences in an efficient way
When the table is too large to drop and create at every run

Materialized View

The materialized view concept is quite new compared to other techniques. The materialization allows the creation and maintenance of materialized views in the target database, if the database support this kind of objects.

Materialization makes use of the on_configuration_change config, which aligns with the incremental nature of the namesake database object. This setting tells dbt to attempt to make configuration changes directly to the object when possible, as opposed to completely recreating the object to implement the updated configuration.

The on_configuration_change config has three settings:

apply (default) — attempt to update the existing database object if possible, avoiding a complete rebuild. If any individual configuration change requires a full refresh, a full refresh be performed in lieu of individual alter statements
continue — allow runs to continue while also providing a warning that the object was left untouched. This could result in downstream failures as those models may depend on these unimplemented changes
fail — force the entire run to fail if a change is detected

Pros:

Materialized views combine the query performance of a table with the data freshness of a view
Materialized views operate much like incremental materializations, however they are usually able to be refreshed without manual interference on a regular cadence (depending on the database), forgoing the regular dbt batch refresh required with incremental materializations

Cons:

Not all databases or data warehouses support materialized views
Due to the fact that materialized views are more complex database objects

When to use:

Consider materialized views for use cases where incremental models are sufficient, but you would like the data platform to manage the incremental logic and refresh.

Demo

In the demo part, I will use a retail data set which contains payment transactions of different stores with different payment devices.

Image by author

Establishing Postgres Connection

“profiles” refer to configuration files that define the connection settings for your data warehouse and allow dbt to connect and interact with it. dbt uses these profiles to establish a connection to your database and execute SQL queries.

Below I updated my profiles file with the credentials according to my local Postgres instance;

dbt_demo:
  outputs:
    dev:
          type: postgres
          threads: 1
          host: localhost
          port: 5432
          user: postgres
          pass: password
          dbname: postgres
          schema: public
      target: dev

Creating the Model

Let’s imagine that we are tasked to find out the top 10 most transacted amount stores. At the tests I will use the model below. The model below generates id and name columns. id column is primary key used for the device table and name represents the name of the store.

In these examples I will add the configurations directly in the model yml files. Below I will apply different types of materialization methods and check how they are populated;

Incremental

To use incremental models, you also need to tell dbt:

How to filter the rows on an incremental run
The unique key of the model (if any)

You’ll want to filter for “new” rows, as in, rows that have been created since the last time dbt ran this model. The best way to find the timestamp of the most recent run of this model is by checking the most recent timestamp in your target table. dbt makes it easy to query your target table by using the “{{ this }}” variable.

A unique_key enables updating existing rows instead of just appending new rows. If new information arrives for an existing unique_key, that new information can replace the current information instead of being appended to the table. If a duplicate row arrives, it can be ignored.

{{ 
config(materialized='incremental', unique_key='updated_at'
) 
}}

select
    x.name,
    y.id
from
    {{ source ('public', 'store_demo') }} x -- Source usage
    left join {{ source ('public', 'device') }} y on (x.id = y.store_id)

{% if is_incremental() %} 

where
    y.id is not null and updated_at >= (select max(updated_at) from {{ this }})

{% endif %}

I am executing the command below just to run the intended model;

dbt run — models stores_with_devices_incremental.sql

Model run is successful and a tableobject is created the in database.

Without making a change in the database if we run the same model again, there will be 0 records will be inserted into the database. Let’s try;

If we update one of the rows in the database and run the model, we should expect a different behaviour.

update store_demo  
set address = 'Still Wrong Address!', updated_at = '2025-01-01'  
where address = 'Wrong Address!'

Let’s check what is the max updated_at column in our table;

select max(updated_at)
from stores_with_devices_incremental; --2025-01-01 00:00:00.000

As you can see, the incremental model output table is also updated with the second run.

Materialized View

Materialized view is quite new to dbt. If your dbt driver is outdated, you need update your driver. For Postgres, I used the syntax below;

python3 -m pip install — upgrade dbt-postgres

I am adding {{ config(materialized=’materialized_view’, on_configuration_change = ‘apply’) }} command on top of the model yml file.

{{ 
config(materialized='materialized_view', on_configuration_change = 'apply'
) 
}}

select
    x.name,
    y.id
from
    {{ source ('public', 'store') }} x -- Source usage
    left join {{ source ('public', 'device') }} y on (x.id = y.store_id)
where
    y.id is not null

I am executing the command below just to run the intended model;

dbt run — models stores_with_devices

Model run is successful and commenting out that a materialized view object is the in database.

Without doing any changes in the tables, I am going to run the models and we will get a different output from dbt;

Materialized views are implemented following this “drop through” life cycle:

If an object does not exist, create a materialized view
If an object exists, other than a materialized view, that object is dropped and replaced with a materialized view
If --full-refresh is supplied, replace the materialized view regardless of changes and the on_configuration_change setting
If there are no configuration changes, refresh the materialized view
At this point there are configuration changes, proceed according to the on_configuration_change setting

According to Postgres documentation, materialized view cannot subsequently be directly updated and that the query used to create the materialized view is stored in exactly the same way that a view’s query is stored, so that fresh data can be generated for the materialized view with:

REFRESH MATERIALIZED VIEW mymatview;

Conclusion

At this article I walked you through Materialized View andIncrementalmaterialization techniques. Each of the methods has their pros and cons and special use cases.

If you are interested in dbt, I recommend you check out my other dbt articles.

Introduction to Data Build Tool (dbt)

Once upon a time, in the mystical realm of data management, two powerful forces clashed in an epic battle: ELT and ETL…

medium.com

Minimise Upstream Dependencies with dbt Sources

Raw data comes in different forms, frequencies and sizes from various resources into our databases and data warehouses…

medium.com

How to Overcome Firefighting with dbt Core Testing Features

In the dynamic landscape of data analytics and warehousing, the success of any project hinges on the reliability and…

medium.com

Turbocharge Company-Wide Modelling Evolution with dbt Documentation

In the ever-evolving data management landscape, technical documentation has often been the unsung hero, residing in the…

medium.com

Slowly Changing Dimension (SCD) Management in dbt

In the dynamic landscape of data warehousing and business intelligence, the concept of Slowly Changing Dimensions (SCD)…

medium.com

dbt Materialization Types and Strategies — Part 1

Data Build Tool (dbt) materialization techniques play a pivotal role in optimizing databases and data warehouses by…

medium.com

dbt Materialization Types and Strategies — Part 2

dbt Materialization Types and Strategies — Part 1

Data Build Tool (dbt) materialization techniques play a pivotal role in optimizing databases and data warehouses by…

Slowly Changing Dimension (SCD) Management in dbt

In the dynamic landscape of data warehousing and business intelligence, the concept of Slowly Changing Dimensions (SCD)…

What are dbt Materializations?

Incremental

Materialized View

Demo

Establishing Postgres Connection

Creating the Model

Incremental

Materialized View

Conclusion

Introduction to Data Build Tool (dbt)

Once upon a time, in the mystical realm of data management, two powerful forces clashed in an epic battle: ELT and ETL…

Minimise Upstream Dependencies with dbt Sources

Raw data comes in different forms, frequencies and sizes from various resources into our databases and data warehouses…

How to Overcome Firefighting with dbt Core Testing Features

In the dynamic landscape of data analytics and warehousing, the success of any project hinges on the reliability and…

Turbocharge Company-Wide Modelling Evolution with dbt Documentation

In the ever-evolving data management landscape, technical documentation has often been the unsung hero, residing in the…

Slowly Changing Dimension (SCD) Management in dbt

In the dynamic landscape of data warehousing and business intelligence, the concept of Slowly Changing Dimensions (SCD)…

dbt Materialization Types and Strategies — Part 1

Data Build Tool (dbt) materialization techniques play a pivotal role in optimizing databases and data warehouses by…

Written by Seckin Dinc