Using Snowflake Dynamic Tables in Lieu of Traditional Materialized Views

My use case is essentially that of a traditional materialized view, but Dynamic Tables handle it more efficiently, and go beyond what Snowflake itself calls materialized views.

To simplify my case, let’s say I’m simply looking for the lifetime sales of all of my customers, with some basic aggregations: sum of sales, count of orders, average order size. That is a full table scan and aggregation across my entire transaction table.

But Wait, Doesn’t Snowflake Already Have Materialized Views?

Mmm, yes and no. Here’s the doc for a traditional MV in Oracle. You can see it is the result of a query, and that query can be arbitrarily complex. The view is costly in that it is essentially a static table until it is refreshed, and the refresh recomputes the entire table. If I have 100 million customers and one of them places a new order, the MV is now out of date, and if I refresh it, it is going to recalculate the aggregates for all of my customers. Quite costly!
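To make that contrast concrete, here is a rough, Oracle-flavored sketch of the traditional pattern. The object names are illustrative, not from my environment, and every refresh rebuilds the whole result set:

CREATE MATERIALIZED VIEW customer_lifetime_sales
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT o_custkey
, SUM(o_totalprice) AS lifetime_sum
FROM orders
GROUP BY o_custkey;

-- a complete refresh recomputes every customer, not just the changed ones
EXEC DBMS_MVIEW.REFRESH('customer_lifetime_sales', 'C');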

Here we can see the doc for Snowflake’s materialized views. However, you will note it is not simply any query, as it is limited to a single table. It is a bit more of a performance nuance around usage of a table. It can contain a subset of columns and already-filtered data from the table it is based on, and it may include aggregates as well. This MV can be queried directly for results. Snowflake does seem to do a good job of processing changed data incrementally for these.

However, it can be used even when not directly referenced in a query: the projected and restricted rows, along with a potentially different clustering approach, may lead the SQL optimizer to decide the MV satisfies a query better than the base table itself. Thus it need not be referenced to be used for performance gains. I might refer to such a table as more of a “Materialized Columnar Projection” than a traditional MV, because it is functioning as an alternate version of the single base table that may be optimal for some query patterns. A rough example follows below.
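As a minimal sketch of that idea (the table and column names are hypothetical, borrowed from TPC-H, and not the example I build later), a Snowflake MV is just a projected, filtered slice of one base table that the optimizer can substitute for it:

CREATE MATERIALIZED VIEW mv_open_orders AS
SELECT o_orderkey
, o_custkey
, o_totalprice
, o_orderdate
FROM orders
WHERE o_orderstatus = 'O';  -- only open orders: a restricted, projected slice of the base table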

So What’s New With Dynamic Tables?

With Dynamic Tables, we now have the ability for the query to be any arbitrary SQL. Note that ideally our query is fully deterministic and can be updated incrementally. See the doc here for refresh modes; doing a lifetime aggregation, we should be able to take advantage of incremental refreshes of only the changed customers as their orders are brought into our source table. Note this would be different if we were trying to find the same metrics for the last 30 days, with a filter on something like current_date - interval '30 days', which would require a full refresh! Sample:

CREATE OR REPLACE DYNAMIC TABLE miket_data.public.dynamic_table_lifetime_aggs 
TARGET_LAG = '1 day'  -- results may lag the source by up to a day
WAREHOUSE = MY_WH     -- warehouse that runs the refreshes
AS
SELECT o_custkey
, sum(o_totalprice) as lifetime_sum
, count(1) as lifetime_count
, avg(o_totalprice) as lifetime_avg
, max(o_orderdate) as o_lastorder
from miket_data.public.orders_1
group by o_custkey
;

Note: because Snowflake adds change tracking to the underlying tables/views, I was unable to reference the shared dataset directly due to limited privileges against Snowflake’s snowflake_sample_data.tpch_sf1.orders. Instead I made a view on it in my own schema, which could be watched for changes. In my case this is an aggregation done in a bulk load after a day’s transaction loading is complete, so there is no need to refresh it more than daily.
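For completeness, here is a minimal sketch of that workaround; I’m assuming the orders_1 object referenced in the DDL above is this view over the shared sample data:

create or replace view miket_data.public.orders_1 as
select * from snowflake_sample_data.tpch_sf1.orders;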

Given Snowflake is simply letting me set a TARGET_LAG here, likely resulting in a sliding sub-24-hour schedule at some point of the day outside of my control, I’ll always just run my own manual refresh of the table anyway. After the source has the prior day’s data inserted via ETL/ELT, I’ll simply call the refresh command.

alter dynamic table miket_data.public.dynamic_table_lifetime_aggs refresh;

This will go ahead and incrementally update only those customers whose data has changed via new orders coming in for them. It doesn’t really matter when Snowflake also decides to run its 1-day target job; it will find there’s nothing to do, as no new data has arrived and none will be arriving until the next day’s ETL/ELT job from the operational system of record.
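If you want to confirm the refreshes really are incremental (or skipped entirely when nothing changed), Snowflake exposes a refresh history table function. This is a hedged sketch; double-check the argument and column names against the current docs:

select data_timestamp
, refresh_action   -- e.g. INCREMENTAL, FULL, or NO_DATA when there was nothing to do
, state
from table(
  miket_data.information_schema.dynamic_table_refresh_history(
    name => 'miket_data.public.dynamic_table_lifetime_aggs'
  )
)
order by data_timestamp desc;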

Best of Both Worlds!

So what do we end up with? An object that has all the advantages of a traditional materialized view, plus the ability to incrementally update only changed records, so that it is maintained in the most efficient way. Awesome!

--

Mike Taveirne
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Consultant & Solution Architect. Enjoy DW/BI platforms Netezza and Snowflake, AutoML platform DataRobot, and currently at Feature Store category creator Tecton.