In this article, we will focus on staging data from the external table to a local destination table. If you have stumbled upon this article for the first time: it is part of an ongoing series, and I strongly suggest you catch up with the previous tutorials to get acquainted with the earlier tasks.

  1. Basics for the project
  2. Implementation of DBT

That said, let's jump into this tutorial and get started!

Instructions

53) Create a packages.yml file in your working directory. Keep in mind this file must live in the exact same location as dbt_project.yml; they should both be in the same folder, otherwise it won't work.
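For reference, the layout should look roughly like this (the project name is just an example; only dbt_project.yml, packages.yml and models/ matter here):

my_dbt_project/
├── dbt_project.yml
├── packages.yml      <- create this file here, next to dbt_project.yml
└── models/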

54) In your packages.yml, add the following code. This is the equivalent of importing libraries.

packages:
  - package: dbt-labs/dbt_external_tables
    version: 0.8.0 # version number
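If you prefer not to pin an exact version, dbt's packages.yml also accepts a version range. A quick sketch of what that would look like:

packages:
  - package: dbt-labs/dbt_external_tables
    version: [">=0.8.0", "<0.9.0"]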

55) Once that is done, run the following command:

dbt deps

56) You should get the following output.

57) This means that the packages were installed successfully.
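Another quick way to confirm the install: dbt deps pulls the package source into your project, so a new packages folder should appear at the project root (named dbt_packages on recent dbt versions, dbt_modules on older ones). For example:

ls dbt_packages   # or dbt_modules, depending on your dbt version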

Pitstop

Next, we need to bring the data from the external stage table into our local table.

In order to do that, we will need to create a new SQL file that is executed at run time and pulls all the necessary files from the external stage into the local destination table. A small snippet of knowledge: you see the models folder here?

Within the models folder you should see an example folder by default, inside which you will find two sample SQL files.

What you need to know is that dbt will execute ALL SQL files within the “models” folder.

dbt checks for these SQL files recursively and executes them one after another.

The schema file that you see in here?

Inside that, there are details about where you want each query's output to be stored. For example, in this case, the output of the first model will be saved in a table called my_first_dbt_model (after execution).

Want to check that out yourself? Give it a test run: run the command “dbt run”, then go to your Snowflake web app and check the Tables section. You should see that a new table has been created.
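For context, the default example model that ships with a freshly initialized dbt project looks roughly like this (simplified sketch; your generated file may differ slightly):

{{ config(materialized='table') }}

-- build a tiny dummy dataset and materialize it as the table my_first_dbt_model
with source_data as (
    select 1 as id
    union all
    select null as id
)

select * from source_data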

I hope you have a better understanding of how dbt works now. Let's shift our attention back to the instructions.

58) We want to create another model that will allow us to fetch the data from the external table into the local destination database.

In order to do that, create another folder within the models folder.

Right-click and create a new folder within the models folder.

I named my folder snowflake_models
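If you prefer the terminal over right-clicking, the same thing can be done with (assuming your shell is open at the project root):

mkdir models/snowflake_models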

59) Once you have created the new folder, create a file inside it called schema.yml. (This provides the basic configuration the script needs to connect properly.)

version: 2

sources:
  - name: snowplow
    database: AWS_S3_DB
    schema: PUBLIC
    loader: S3
    loaded_at_field: collector_hour

    tables:
      - name: event_ext_tbl
        description: "External table of Snowplow events stored as JSON files"
        external:
          location: "@AWS_STG_SNOWFLAKE" # reference an existing external stage
          file_format: "( type = csv )" # fully specified here, or reference an existing file format
          auto_refresh: true # requires configuring an event notification from Amazon S3 or Azure

        # all Snowflake external tables natively include a `metadata$filename` pseudocolumn
        # and a `value` column (JSON blob-ified version of file contents), so there is no need to specify
        # them here. you may optionally specify columns to unnest or parse from the file:
        columns:
          - name: date
            data_type: varchar(255)
            description: "Application ID"
          - name: total_price
            data_type: varchar(255)
            description: "A visit / session index"

Pitstop

Understanding schema for staging operations

This is very important for you to understand.

  • “database” is the destination database that we want the data to be transferred to.
  • “schema” is PUBLIC by default. You can refer to step 48 for further reference.
  • tables > name: “event_ext_tbl”. This is the table that will be created in the destination DB after you run more commands later. (This image is for demonstration purposes; you won't see EVENT_EXT_TBL in the structure now. You will only see this table as you execute more steps down the line.)
  • tables > external > location: @AWS_STG_SNOWFLAKE. This is the external stage that I'm referring to.
  • sources > tables > columns: this section describes the column structure that the table EVENT_EXT_TBL should have. (The following image is for demonstration purposes only; you won't see the table reflected until you execute more steps down the line.)

Back to instructions

60) In schema.yml, make sure to change the 5 attributes I described above according to your own details. In my case, I chose to make all the columns varchar, so under the columns section you would have something like:

columns:
  - name: date
    data_type: varchar(255)
    description: "Application ID"
  - name: total_price
    data_type: varchar(255)
    description: "A visit / session index"

Apply the same to as many columns as you have in your CSV file. It won't work as you intend if you don't provide the appropriate details for the column structure.
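Once the external table exists (after step 62 below), a quick way to sanity-check that your column mapping is correct is to query it directly in Snowflake. A sketch, assuming the database and schema names from the schema.yml above:

select date, total_price
from AWS_S3_DB.PUBLIC.EVENT_EXT_TBL
limit 10;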

61) (Optional) Just to understand how SQL query files work within dbt, create a file called my_querry.sql under snowflake_models.

Once you have created the my_querry.sql file, type the following code in it:

{{ config(
    materialized='table'
) }}

select * from event_ext_tbl

This will ensure that all the data from event_ext_tbl is copied into another local Snowflake table called MY_QUERRY (the output of each SQL model is stored in a table named after the model file!).

Once again, you won't see this until you execute the next step.
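One note on that select statement: since we declared the external table as a dbt source in schema.yml, the more idiomatic way to reference it is through dbt's source() function, which also works if your dbt target schema is different from the one holding EVENT_EXT_TBL. A sketch, assuming the source name snowplow from the schema.yml above:

{{ config(
    materialized='table'
) }}

select * from {{ source('snowplow', 'event_ext_tbl') }}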

62) Now that this is done, let's run the command that stages the files from the external stage into the table EVENT_EXT_TBL in the local database.

dbt run-operation stage_external_sources
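Under the hood, this run-operation reads schema.yml and issues CREATE EXTERNAL TABLE statements against Snowflake. A simplified sketch of the kind of DDL it generates for our source (the package's actual SQL adds null handling and other details):

create or replace external table AWS_S3_DB.PUBLIC.EVENT_EXT_TBL (
    date        varchar(255) as (value:c1::varchar(255)),  -- 1st column of the CSV
    total_price varchar(255) as (value:c2::varchar(255))   -- 2nd column of the CSV
)
location = @AWS_STG_SNOWFLAKE
file_format = ( type = csv )
auto_refresh = true;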

63) (Optional) This step transfers the data from EVENT_EXT_TBL into MY_QUERRY:

dbt run
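For reference, with materialized='table' this dbt run boils down to a create-table-as-select, roughly like the sketch below (the actual table is created in the target database and schema configured in your profiles.yml):

-- rough equivalent of what dbt builds for my_querry.sql
create or replace table MY_QUERRY as
select * from event_ext_tbl;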

65) Now all the changes should be reflected. Go to your Snowflake web app and check under Tables, and verify that you can see the EVENT_EXT_TBL and MY_QUERRY tables.

Conclusion

If everything went as expected, congratulations 🎉🎉! All the data has now been moved from the external table to your local Snowflake database using dbt!

Great work! In the next part of this tutorial we will focus on the AWS tools: orchestrating Docker image containerization and task execution using Step Functions, ECR and ECS. So stay tuned for Part 4!

As always, I hope this article was informative. If you have found it helpful, share and follow for more such content, and reach out to me if you are stuck at any point. Until then

… To infinity and beyond.
