dbt Materialization Types and Strategies — Part 2

Seckin Dinc
8 min readDec 12, 2023

--

Photo by Ricardo Gomez Angel on Unsplash

Materializations involve the creation and storage of intermediate tables, views, or snapshots that serve as building blocks for analytical queries. By strategically leveraging these techniques, dbt significantly enhances query performance, minimizes redundant computations, and accelerates data transformation processes.

In this article, we will continue our “dbt Materialization Types and Strategies” mini series with incremental models and materialised views. For my previous materialization and snapshot articles please check the links below;

What are dbt Materializations?

dbt models are stored as materializations in databases and data warehouses. There are a total of five types of materialization that you can use in dbt;

  • View
  • Table
  • Ephemeral
  • Incremental
  • Materialized View

Materialization strategies can be defined in model or project level. The best practise is to define at project level and override it at the folder or model level if needed.

Incremental

Incremental materialization models allow you to insert or update new records into the specified table. With this method you can update the records since the last time that your table was updated.

The first time a model is run, the table is built by transforming all rows of source data. On subsequent runs, dbt transforms only the rows in your source data that you tell dbt to filter for, inserting them into the target table which is the table that has already been built.

Often, the rows you filter for on an incremental run will be the rows in your source data that have been created or updated since the last time dbt ran. As such, on each dbt run, your model gets built incrementally.

Using an incremental model limits the amount of data that needs to be transformed, vastly reducing the runtime of your transformations. This improves warehouse performance and reduces compute costs.

Pros:

  • You can significantly reduce the build time by just transforming new records
  • Efficient solution for streaming data loads and large tables

Cons:

  • Incremental models require extra configuration and are an advanced usage of dbt

When to use:

  • When you are streaming data and need to store the differences in an efficient way
  • When the table is too large to drop and create at every run

Materialized View

The materialized view concept is quite new compared to other techniques. The materialization allows the creation and maintenance of materialized views in the target database, if the database support this kind of objects.

Materialization makes use of the on_configuration_change config, which aligns with the incremental nature of the namesake database object. This setting tells dbt to attempt to make configuration changes directly to the object when possible, as opposed to completely recreating the object to implement the updated configuration.

The on_configuration_change config has three settings:

  • apply (default) — attempt to update the existing database object if possible, avoiding a complete rebuild. If any individual configuration change requires a full refresh, a full refresh be performed in lieu of individual alter statements
  • continue — allow runs to continue while also providing a warning that the object was left untouched. This could result in downstream failures as those models may depend on these unimplemented changes
  • fail — force the entire run to fail if a change is detected

Pros:

  • Materialized views combine the query performance of a table with the data freshness of a view
  • Materialized views operate much like incremental materializations, however they are usually able to be refreshed without manual interference on a regular cadence (depending on the database), forgoing the regular dbt batch refresh required with incremental materializations

Cons:

  • Not all databases or data warehouses support materialized views
  • Due to the fact that materialized views are more complex database objects

When to use:

  • Consider materialized views for use cases where incremental models are sufficient, but you would like the data platform to manage the incremental logic and refresh.

Demo

In the demo part, I will use a retail data set which contains payment transactions of different stores with different payment devices.

Image by author

Establishing Postgres Connection

“profiles” refer to configuration files that define the connection settings for your data warehouse and allow dbt to connect and interact with it. dbt uses these profiles to establish a connection to your database and execute SQL queries.

Below I updated my profiles file with the credentials according to my local Postgres instance;

dbt_demo:
outputs:
dev:
type: postgres
threads: 1
host: localhost
port: 5432
user: postgres
pass: password
dbname: postgres
schema: public
target: dev

Creating the Model

Let’s imagine that we are tasked to find out the top 10 most transacted amount stores. At the tests I will use the model below. The model below generates id and name columns. id column is primary key used for the device table and name represents the name of the store.

In these examples I will add the configurations directly in the model yml files. Below I will apply different types of materialization methods and check how they are populated;

Incremental

To use incremental models, you also need to tell dbt:

  • How to filter the rows on an incremental run
  • The unique key of the model (if any)

You’ll want to filter for “new” rows, as in, rows that have been created since the last time dbt ran this model. The best way to find the timestamp of the most recent run of this model is by checking the most recent timestamp in your target table. dbt makes it easy to query your target table by using the “{{ this }}” variable.

A unique_key enables updating existing rows instead of just appending new rows. If new information arrives for an existing unique_key, that new information can replace the current information instead of being appended to the table. If a duplicate row arrives, it can be ignored.

{{ 
config(materialized='incremental', unique_key='updated_at'
)
}}

select
x.name,
y.id
from
{{ source ('public', 'store_demo') }} x -- Source usage
left join {{ source ('public', 'device') }} y on (x.id = y.store_id)

{% if is_incremental() %}

where
y.id is not null and updated_at >= (select max(updated_at) from {{ this }})

{% endif %}

I am executing the command below just to run the intended model;

dbt run — models stores_with_devices_incremental.sql

Image by author

Model run is successful and a tableobject is created the in database.

Image by author

Without making a change in the database if we run the same model again, there will be 0 records will be inserted into the database. Let’s try;

Image by author

If we update one of the rows in the database and run the model, we should expect a different behaviour.

update store_demo  
set address = 'Still Wrong Address!', updated_at = '2025-01-01'
where address = 'Wrong Address!'
Image by author

Let’s check what is the max updated_at column in our table;

select max(updated_at)
from stores_with_devices_incremental; --2025-01-01 00:00:00.000

As you can see, the incremental model output table is also updated with the second run.

Materialized View

Materialized view is quite new to dbt. If your dbt driver is outdated, you need update your driver. For Postgres, I used the syntax below;

python3 -m pip install — upgrade dbt-postgres

I am adding {{ config(materialized=’materialized_view’, on_configuration_change = ‘apply’) }} command on top of the model yml file.

{{ 
config(materialized='materialized_view', on_configuration_change = 'apply'
)
}}

select
x.name,
y.id
from
{{ source ('public', 'store') }} x -- Source usage
left join {{ source ('public', 'device') }} y on (x.id = y.store_id)
where
y.id is not null

I am executing the command below just to run the intended model;

dbt run — models stores_with_devices

Image by author

Model run is successful and commenting out that a materialized view object is the in database.

Image by author

Without doing any changes in the tables, I am going to run the models and we will get a different output from dbt;

Image by author

Materialized views are implemented following this “drop through” life cycle:

  1. If an object does not exist, create a materialized view
  2. If an object exists, other than a materialized view, that object is dropped and replaced with a materialized view
  3. If --full-refresh is supplied, replace the materialized view regardless of changes and the on_configuration_change setting
  4. If there are no configuration changes, refresh the materialized view
  5. At this point there are configuration changes, proceed according to the on_configuration_change setting

According to Postgres documentation, materialized view cannot subsequently be directly updated and that the query used to create the materialized view is stored in exactly the same way that a view’s query is stored, so that fresh data can be generated for the materialized view with:

REFRESH MATERIALIZED VIEW mymatview;

Conclusion

At this article I walked you through Materialized View andIncrementalmaterialization techniques. Each of the methods has their pros and cons and special use cases.

If you are interested in dbt, I recommend you check out my other dbt articles.

--

--

Seckin Dinc

Building successful data teams to develop great data products