Don’t Do Analytics Engineering in Snowflake Until You Read This (Hint: dbt)

Using dbt To Create Tables Using Custom Materialization


by Venkatesh Sekar

Imagine an Analytics Engineering solution (think CI/CD for database objects) that works with the Snowflake Cloud Data Warehouse and is…

  • Open-source
  • Easy to understand and learn if you are SQL savvy (roughly 3 days)
  • Git versionable
  • Designed with visual lineage in mind
  • A great way for your analytics teams to get better visibility into data pipelines

Well…it’s here and it’s called dbt!

As I have continued to work with numerous customers across our Snowflake client base, the need for a SQL-centric data transformation and data ops solution has come up time and again, and dbt has really stepped up to take on the challenge.

I’m going to take you through a great use case for dbt and show you how to create tables using custom materialization with Snowflake’s Cloud Data Warehouse.

Notably, if you haven’t tried out Snowflake yet and are looking to build your data warehouse in the cloud, or to move an existing on-premises data warehouse (Netezza, Teradata, Exadata, etc.) to the cloud, you should hands-down try out Snowflake: it just works.

So let’s first dive into dbt and then focus on our use case.

What is dbt?

dbt is a command line tool based on SQL and is primarily used by analysts to do data transformations. In other words, it does the ‘T’ in ELT.

It facilitates writing modular SELECT statements and takes care of dependencies, compilation, and materialization at run time.
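
For example, a dbt model is just a SELECT statement saved in a .sql file; dbt resolves the ref() calls, builds the dependency graph, and materializes the result. A minimal illustrative model (the table and column names here are made up, not from this post's repo) might look like:

-- models/customer_orders.sql (illustrative example)
SELECT
    c.customer_id,
    c.customer_name,
    COUNT(o.order_id) AS order_count
FROM {{ ref('customers') }} AS c
LEFT JOIN {{ ref('orders') }} AS o
    ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.customer_name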

If you are interested in learning more about dbt, the documentation and tutorials on the getdbt.com site are a good place to start.

Source tables in dbt

In dbt, a “Source Table” holds the data on which transformations are performed. The transformations are SELECT statements, which are chained together and then materialized into tables.

When developing models with dbt, you write only SELECT statements; model files contain no CREATE OR REPLACE statements. This means that, out of the box, dbt does not offer a way to issue the CREATE TABLE statements needed for source tables; it is up to the user to define those outside of dbt.

Macros in dbt

But if you look closely at how dbt lets you build customizations and enhancements using macros, you will realize that macros are essentially Jinja templates.

And if you search around the dbt code base on GitHub, you will come across common macros such as:

  • create_view_as
  • get_columns_in_relation
  • drop_relation_if_exists
  • alter_column_type
  • truncate_relation
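
These macros are just named Jinja blocks that render SQL. To give a feel for the shape, here is a minimal sketch in the same spirit (this exact macro is my own illustration, not one of dbt's built-ins):

{% macro drop_table_if_exists(database, schema, table) -%}
    {#- Illustrative helper: renders a DROP TABLE IF EXISTS statement for the named Snowflake table -#}
    DROP TABLE IF EXISTS "{{ database }}"."{{ schema }}"."{{ table }}"
{%- endmacro %}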

Materialization in dbt

Materializations are strategies for persisting dbt models in a warehouse such as Snowflake. There are four types of materializations built into dbt. They are:

  • table
  • view
  • incremental
  • ephemeral

dbt also offers the capability to develop custom materializations. Knowing this, I investigated whether I could use that functionality to develop a custom materialization that would do the following:

  • Process a model file which contains a “CREATE TABLE” statement
  • Identify if a column has been added/updated/dropped in the definition and issue an alter statement accordingly
  • Perform a complete, full refresh of the data when requested
  • Backup the table before doing any modifications
  • Migrate the data after the table has been modified

As mentioned, I’m using Snowflake as the database of choice for this example, but dbt does work with other databases as well — so let’s get going on our example.

Defined Macros

I have defined the following macros in snowflake_helper_macros.
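
The full macro file is available in the git repo linked at the end of this post. As a rough idea of the kind of helper it contains, a backup-name builder might look something like this (the macro name and exact timestamp format are my own illustration, not necessarily the repo's code):

{% macro make_backup_table_name(table_name) -%}
    {#- Illustrative: appends a timestamp suffix, e.g. CONTACT_DBT_BACKUP_20191006125145387106 -#}
    {{ table_name }}_DBT_BACKUP_{{ run_started_at.strftime('%Y%m%d%H%M%S%f') }}
{%- endmacro %}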

Persistent Table Materialization

I have defined the custom materialization persistent_table_materialization to handle the above-defined needs. In short, the implementation has the following logic:
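
A custom materialization is itself a Jinja block. The sketch below shows only its rough shape and, in comments, where the steps enumerated earlier fit in; it is a simplified illustration under my own assumptions, not the full implementation from the repo:

{% materialization persistent_table, adapter='snowflake' %}

    {#- Flags passed in from the model's config block -#}
    {%- set retain_previous_version_flg = config.get('retain_previous_version_flg', default=false) -%}
    {%- set migrate_data_over_flg = config.get('migrate_data_over_flg', default=false) -%}

    {{ run_hooks(pre_hooks) }}

    {#- The real implementation does roughly the following at this point:
        1. back up the existing table when retain_previous_version_flg is true
        2. on --full-refresh, run the model's CREATE OR REPLACE TABLE statement as-is
        3. otherwise, diff the new definition against the existing columns and issue
           the matching ALTER TABLE ... ADD/DROP COLUMN statements
        4. copy data back from the backup when migrate_data_over_flg is true -#}

    {% call statement('main') -%}
        {{ sql }}
    {%- endcall %}

    {{ run_hooks(post_hooks) }}

    {{ return({'relations': [this]}) }}

{% endmaterialization %}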

Our dbt Model

Below is an example of a model file that can now be materialized by dbt. This example defines the CONTACT table:

{{ config(materialized='persistent_table',
          retain_previous_version_flg=false,
          migrate_data_over_flg=true) }}

CREATE OR REPLACE TABLE "{{ database }}"."{{ schema }}"."CONTACT" (
    FIRST_NAME VARCHAR(100),
    LAST_NAME VARCHAR(100),
    EMAIL VARCHAR(100),
    STREETADDRESS VARCHAR(100),
    CITY VARCHAR(100)
);

Walking Through dbt Execution Examples

To see this materialization in action, here is a walkthrough with screenshots on the various facilities.

full-refresh

Let's start off with no tables defined in Snowflake. The '--full-refresh' flag tells dbt to create the table as if nothing existed; should the table already exist, it will be recreated (due to the 'CREATE OR REPLACE' in the model).

dbt -d run -m CONTACT --full-refresh

The table is now created and I have inserted some sample records manually. Here is the screenshot:

full-refresh with Migrate Data Enabled

Let's do a full-refresh with 'migrate_data_over_flg' set to true:

config(materialized='persistent_table', retain_previous_version_flg=false, migrate_data_over_flg=true)

Here is the command you want to issue:

dbt -d run -m CONTACT --full-refresh

Again, the table is recreated and I have inserted some sample records manually.

Backup Previous Version

Let’s go through an example of how to retain the previous copy and see what happens after migration.

The screenshot below reflects the CONTACT table as in INFORMATION_SCHEMA.TABLES before refresh:

For this, we set the flag 'retain_previous_version_flg' to true:

config(materialized='persistent_table', retain_previous_version_flg=true, migrate_data_over_flg=true)

We issue the command to do a full-refresh as usual:

dbt -d run -m CONTACT --full-refresh

The screenshot below reflects the various CONTACT tables in Snowflake as in INFORMATION_SCHEMA.TABLES after refresh:

dbt backed up the table as 'CONTACT_DBT_BACKUP_20191006125145387106' and retained its rows (look at the row count). Because 'migrate_data_over_flg' is true, it has also migrated the previous set of data over into the recreated table.

Add column

Now I want to add a column ‘LAST_UPDATED’ to the definition.

CREATE OR REPLACE TABLE "{{ database }}"."{{ schema }}"."CONTACT" (
    FIRST_NAME VARCHAR(100),
    LAST_NAME VARCHAR(100),
    EMAIL VARCHAR(100),
    STREETADDRESS VARCHAR(100),
    CITY VARCHAR(100),
    LAST_UPDATED DATE
)

Notice that I am not writing an 'ALTER TABLE' statement myself; dbt will take care of that. Issue the command without the full-refresh flag:

dbt -d run -m CONTACT

This results in dbt issuing an ‘ALTER TABLE’, as in the log below:
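
The generated statement is along these lines (an illustration; the exact SQL dbt emits may differ slightly):

ALTER TABLE "{{ database }}"."{{ schema }}"."CONTACT" ADD COLUMN LAST_UPDATED DATE;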

The screenshot reflects the Snowflake table structure after this update:

Also, note that the existing records are not deleted, since this was an ALTER statement.

Drop column

Now let’s remove the ‘LAST_UPDATED’ column.

CREATE OR REPLACE TABLE "{{ database }}"."{{ schema }}"."CONTACT" (
    FIRST_NAME VARCHAR(100),
    LAST_NAME VARCHAR(100),
    EMAIL VARCHAR(100),
    STREETADDRESS VARCHAR(100),
    CITY VARCHAR(100)
)

As before, I am not writing an 'ALTER TABLE' statement myself.

dbt -d run -m CONTACT

As before, dbt issues an ‘ALTER TABLE’, as you can see in the log below:
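
The generated statement is roughly (again, an illustration):

ALTER TABLE "{{ database }}"."{{ schema }}"."CONTACT" DROP COLUMN LAST_UPDATED;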

The screenshot below shows the resulting table structure in Snowflake:

Are There Any Limitations To Be Aware Of?

If you are using this approach with dbt, keep the following points in mind:

  • You cannot use this model in a ‘source’ or a ‘ref’ call
  • Do not ask dbt to do a 'run' across the entire project, as this could result in accidentally recreating the tables (although you would have a backup if the flags were set)

Where Should You Go From Here?

With the capability that dbt brings to the table for creating tables using custom materialization, I feel very good recommending dbt for managing the database objects in your CI/CD pipeline. You can access everything in my git repo here.

You should also check out a recent blog post by John Aven (a fellow Hashmapper) on Using DBT to Execute ELT Pipelines in Snowflake.

If you use Snowflake today, it would be great to hear about the approaches that you have taken for Data Transformation and DataOps along with the challenges that you are addressing.

Need Snowflake Cloud Data Warehousing and Migration Assistance?

If you'd like additional assistance in this area, Hashmap offers a range of enablement workshops and consulting service packages, and we would be glad to work through your specifics.

How does Snowflake compare to other data warehouses? Our technical experts have implemented over 250 cloud/data projects in the last 3 years and conducted unbiased, detailed analyses across 34 business and technical dimensions, ranking each cloud data warehouse.

To listen in on a casual conversation about all things data engineering and the cloud, check out Hashmap's podcast Hashmap on Tap on Spotify, Apple, Google, and other popular streaming apps.

Other Tools and Content You Might Like

Feel free to share on other channels, and be sure to keep up with all new content from Hashmap here.

Venkat Sekar is Regional Director for Hashmap Canada and is an architect and consultant providing Data, Cloud, IoT, and AI/ML solutions and expertise across industries with a group of innovative technologists and domain experts accelerating high value business outcomes for our customers.
