Dynamic Table & SCD

Dynamic Table: A declarative approach with CDC capabilities

What is a dynamic table ?

A dynamic table allows us to specify a query and has the result materialized. It can monitor modifications to the query data that we provide and incrementally refresh the materialized results, hence it has a lot of potential for solving declarative method of data transformation process.

One key feature of this table which differentiates from features like streams/tasks is that the dynamic table eliminates the additional step of identifying and merging changes from the base table, as the entire process is automatically performed within the dynamic table. This eliminates the need of creating a separate target table and writing code to transform and update the data in that table.

How to create the dynamic table ?

Below is how the dynamic tables are created.

CREATE OR REPLACE DYNAMIC TABLE <name>
TARGET_LAG = { '<num> { seconds | minutes | hours | days }' | DOWNSTREAM }
WAREHOUSE = <warehouse_name>
AS
<query>

CDC with Dynamic table ?? Let us see them in action now !!

Use Case 1 → We have a table in the RAW layer that is going for a change incrementally i.e., operations like ‘Inserts/Updates/Deletes” are happening on that table, now using the “Dynamic Table” which is present in Data integration layer we are going capture the changes & refresh it incrementally for consumption.

STEP 1 --> CREATE A RAW TABLE.

CREATE TABLE DEMO_DB.DEMO_SCHEMA.CUSTOMER_RAW ---This is the raw table.
AS SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF10.CUSTOMER LIMIT 100;



STEP 2 --> CREATE A DYNAMIC TABLE.

CREATE OR REPLACE DYNAMIC TABLE DEMO_DB.DEMO_SCHEMA.CUSTOMER_MAIN
TARGET_LAG = '10 minute'
WAREHOUSE = COMPUTE_WH
AS
SELECT * FROM DEMO_DB.DEMO_SCHEMA.CUSTOMER_RAW;



STEP 3 --> DO SOME DML CHANGES IN THE RAW TABLE.

(A): DELETE OPERATION
DELETE FROM DEMO_DB.DEMO_SCHEMA.CUSTOMER_RAW WHERE C_CUSTKEY='255010'; --> No. of rows that got deleted is 1.

(B): UPDATE OPERATION
UPDATE DEMO_DB.DEMO_SCHEMA.CUSTOMER_RAW
SET C_ADDRESS='California'
WHERE C_CUSTKEY='255001';


STEP 4 --> CHECK THE CDC DATA REFLECTED IN THE DYNAMIC TABLE.

SELECT * FROM DEMO_DB.DEMO_SCHEMA.CUSTOMER_MAIN WHERE C_CUSTKEY='255001';
SELECT * FROM DEMO_DB.DEMO_SCHEMA.CUSTOMER_MAIN WHERE C_CUSTKEY='255010';

--Look at the snaphots given in the following sections.
Dynamic table snaps post Update & Deletes
Post Update Execution !!

If we see over here all the changes which was made in the RAW table has been captured efficiently by the dynamic table be it updates OR inserts. For any “update” kind of transaction as you can see from “statistics” attribute in the backend it is always a delete & an insert operation. An important thing to note over here the changes are always snap based and it doesn’t track the history. Like in the main table if we want to query and try getting the output what was the older value of the updated record we would not get it.

Use Case 2→ This is about illustrating a method where we can manage SCD-Type2 kind of transaction. SCD type 2 primarily deals with row versioning. This meaning the previous versions of the records has to stay while the newer snap remains active. Over here I have explained how the transaction of type “UPDATE” would work. Similar approach can be applied with DELETEs & INSERTs as well.


/*
The base table: DEMO_DB.DEMO_SCHEMA.CUSTOMER_STG
The dynamic table: DEMO_DB.DEMO_SCHEMA.CUSTOMER_MAIN
*/

CREATE OR REPLACE TABLE DEMO_DB.DEMO_SCHEMA.CUSTOMER_STG (
CUSTOMER_ID INT,
FIRST_NAME VARCHAR(50),
LAST_NAME VARCHAR(50),
EMAIL VARCHAR(100),
PHONE_NUMBER VARCHAR(15),
FULL_ADDRESS VARCHAR(365),
LOAD_DATE DATE
);
INSERT INTO DEMO_DB.DEMO_SCHEMA.CUSTOMER_STG
VALUES
(1, 'Michael', 'Doe', 'mike@example.com', '1234567890', '123 Main St, New York, NY, 10001', '2024-01-01'),
(2, 'Rahul', 'Jain', 'rk@example.com', '0987654321', 'Bangalore, India', '2024-01-01'),
(3, 'John', 'Doe', 'john.doe2@example.com', '1234567890', '789 Broadway St, New York, NY, 10002', '2024-01-01'),
(4, 'Josh', 'Brown', 'josh@example.com', '1122334455', '321 Elm St, Chicago, IL, 60601', '2024-01-01'),
(5, 'Kim', 'kate', 'kate@example.com', '0987654322', '654 Oak St, San Francisco, CA, 94102', '2024-01-01');

-- This is the main logic & a point of view on how we can manage the SCD-Type2 data sets.
CREATE OR REPLACE DYNAMIC TABLE DEMO_DB.DEMO_SCHEMA.CUSTOMER_MAIN
TARGET_LAG='1 MINUTE'
WAREHOUSE=COMPUTE_WH
AS
SELECT
CUSTOMER_ID,
FIRST_NAME,
LAST_NAME,
EMAIL,
PHONE_NUMBER,
FULL_ADDRESS,
LOAD_DATE,
ROW_NUMBER() OVER (PARTITION BY CUSTOMER_ID ORDER BY LOAD_DATE DESC) rnm,
CASE WHEN rnm=1 THEN 'Y' ELSE 'N' END ACTION_CD,
CASE WHEN rnm=1 THEN NULL ELSE CURRENT_DATE() END LOAD_END_DATE
FROM DEMO_DB.DEMO_SCHEMA.CUSTOMER_STG;

ALTER DYNAMIC TABLE DEMO_DB.DEMO_SCHEMA.CUSTOMER_MAIN REFRESH;
SELECT * EXCLUDE (FULL_ADDRESS) FROM DEMO_DB.DEMO_SCHEMA.CUSTOMER_MAIN;
SELECT * EXCLUDE (FULL_ADDRESS) FROM DEMO_DB.DEMO_SCHEMA.CUSTOMER_MAIN WHERE CUSTOMER_ID=2;
The first run of the dynamic table
This is prior to the update of the same record in main table “DEMO_DB.DEMO_SCHEMA.CUSTOMER_MAIN”

Below are the scripts that are used for data preparation for the same record that has been highlighted as it would go for an update.

--- Use case to demonstrate the kind of transaction with UPDATE --
SELECT * FROM DEMO_DB.DEMO_SCHEMA.CUSTOMER_STG WHERE CUSTOMER_ID=2;

-- The base table
INSERT INTO DEMO_DB.DEMO_SCHEMA.CUSTOMER_STG
VALUES
(2, 'Rahul', 'Jain', 'newrk@example.com', '0987654321', 'Bangalore, India', '2024-01-02'); --> This is the new insert in staging table.
SELECT * FROM DEMO_DB.DEMO_SCHEMA.CUSTOMER_STG WHERE CUSTOMER_ID=2;
ALTER DYNAMIC TABLE DEMO_DB.DEMO_SCHEMA.CUSTOMER_MAIN REFRESH;
Data preparation for SCD-Type2 in Staging area

→ Now we run the dynamic table refresh. And below is how the data would like in the final table.

See the post run stats.
Final snap in the table.

As we can see over here post the changes that has come in the base layer i.e., DEMO_DB.DEMO_SCHEMA.CUSTOMER_STG, the dynamic table i.e., DEMO_DB.DEMO_SCHEMA.CUSTOMER_MAIN was refreshed and we can pick up the latest snap of CUSTOMER_ID=’2' with ACTION_CD=’Y’, where the older snap can also be found with Action_Cd=’N’

DYNAMIC TABLE GOVERNANCE:

There is also another way to track the “dynamic table refresh” which gives a fair idea that in which run of the dynamic table refresh the records got affected. Below is how it can be done:

use schema information_schema;
SELECT
name,
state,
state_code,
state_message,
query_id,
STATISTICS, ----> This is the key column.
data_timestamp,
refresh_start_time,
refresh_end_time
FROM
TABLE (
INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY (
NAME_PREFIX => 'DEMO_DB.DEMO_SCHEMA.CUSTOMER_MAIN', ERROR_ONLY => FALSE
)
)
ORDER BY
name,
data_timestamp;
The o/p from Snowsight.

Snowsight & Dynamic table together they empower visualization

Within Snowsight there is a separate console all together that shows details about the dynamic table like:

  • How many refreshes has happened between a time range.
  • Lag metrics that gives details about dynamic table schedule.
  • Refresh duration, Rows affected, Pointers to Query profile.
Snowsight & Dynamic table capability

Dynamic table management features

  • Grouped differently: Dynamic tables are always kept in different container within Snowsight console which makes it easier to find & differentiate between usual tables.
Grouping of dynamic table.
  • Listing of the dynamic tables: Below commands are helpful when it comes to see the tables.
SHOW DYNAMIC TABLES;
SHOW DYNAMIC TABLES LIKE '%CUST%';
SHOW DYNAMIC TABLES LIKE '%CUST%' IN SCHEMA demo_db.demo_schema;
  • Details about fields: Below commands are helpful when it comes to see details about the columns & table properties.
DESCRIBE DYNAMIC TABLE demo_db.demo_schema.customer_main;
DESC DYNAMIC TABLE demo_db.demo_schema.customer_main;
  • Dynamic table refresh management: Below commands are helpful when it comes to managing the dynamic table refresh:
SUSPEND action:
ALTER DYNAMIC TABLE demo_db.demo_schema.customer_main SUSPEND;

RESUME action:
ALTER DYNAMIC TABLE demo_db.demo_schema.customer_main RESUME;

REFRESH manually action:
ALTER DYNAMIC TABLE demo_db.demo_schema.customer_main REFRESH;
  • Tracking the dynamic table runs: It can be done by using the table function “DYNAMIC_TABLE_REFRESH_HISTORY”.
use schema information_schema;
SELECT
name,
state,
state_code,
state_message,
query_id,
STATISTICS, ----> This is the key column.
data_timestamp,
refresh_start_time,
refresh_end_time
FROM
TABLE (
INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY (
NAME_PREFIX => 'DEMO_DB.DEMO_SCHEMA.CUSTOMER_MAIN', ERROR_ONLY => FALSE
)
)
ORDER BY
name,
data_timestamp;

SUMMARY:

Dynamic tables possess immense capabilities around solving a lot of use cases where there is a need to improve operational efficiency, faster availability, managing CDC, etc. This blog was intended to give a jump start along with some use case demonstrations on how to use dynamic tables.

Please keep reading my blogs it is only going to encourage me in posting more such content. You can find me on LinkedIn by clicking here and on Medium here. Happy Learning :)

Awarded as “Data Superhero by Snowflake for year 2023”, click here for more details.

Disclaimer: The views expressed here are mine alone and do not necessarily reflect the view of my current, former, or future employers.

--

--

Somen Swain

Snowflake Data Superhero 2024 & 2023 | 4XSnowpro Certified | AWS Solution Architect Associate | Cloud Computing| Principal-Data Engineering at LTIMindtree