Snowflake Dynamic Tables are now Generally Available: What you need to know to get started

Arindam Sengupta
10 min read · May 1, 2024

Snowflake’s Dynamic Tables, now Generally Available as of April 29, 2024, represent a significant advancement in data transformation and management within the Snowflake ecosystem. This feature simplifies the data engineering process by automating data transformation tasks that traditionally required extensive manual coding and management.

In this article, we will explore the key features, benefits, and operational insights of Snowflake Dynamic Tables, and provide a quick-start guide for practitioners looking to leverage this powerful feature.

What is a Declarative Data Pipeline?

A declarative data pipeline is an approach to defining data workflows where the focus is on the desired outcome rather than the specific steps to achieve it. This approach specifies what should be done without detailing how to do it. The system interprets the declarations and manages the execution, optimizing the process and handling dependencies automatically.

Declarative data pipelines are gaining popularity because they significantly simplify the creation and management of complex data workflows. They allow data engineers to define transformations at a high level, making pipelines easier to understand, maintain, and scale. As data environments become more dynamic, the ability to abstract execution details and automate dependency management is increasingly valuable.

dbt (Data Build Tool) is arguably the most popular tool today for building and managing declarative data transformation pipelines in Snowflake (and other data warehouses).

Introduction to Dynamic Tables in Snowflake

Dynamic Tables provide a declarative approach to data transformations directly within Snowflake, streamlining pipeline development by automating the management of refreshes and dependencies. Unlike existing methods that involve manually writing and scheduling tasks, dynamic tables let users define the desired end state of their data using declarative SQL, and Snowflake handles the rest.

Here’s a simple example of a Dynamic Table:

CREATE DYNAMIC TABLE sales_summary
  TARGET_LAG = '5 minutes'
  WAREHOUSE = etl_wh
AS
SELECT
    date,
    store_id,
    SUM(amount) AS total_sales
FROM sales_data
GROUP BY date, store_id;

Let’s break this down:

  • The CREATE DYNAMIC TABLE statement defines a new Dynamic Table named sales_summary.
  • The TARGET_LAG parameter specifies how “fresh” the data in the table should be. Here, we’re asking Snowflake to keep the table within roughly 5 minutes of the source data.
  • The WAREHOUSE parameter specifies the virtual warehouse to use for table refreshes.
  • The AS clause contains a standard SQL SELECT statement that defines the contents of the Dynamic Table. Here, we're aggregating sales data by date and store.

That’s it! With this simple declaration, Snowflake will automatically keep the sales_summary table up-to-date as new data arrives in the sales_data source table. No complex scheduling or orchestration required.

Let’s walk through a more detailed example.

Suppose we have an e-commerce system with the following tables:

  • products: Product catalog data, updated daily
  • customers: Customer profile data, updated hourly
  • orders: Raw order data, streamed in real-time

Our goal is to build a real-time dashboard showing key order metrics, enriched with product and customer data. With Dynamic Tables, we can express this as follows:

-- Base table scripts for reference
CREATE OR REPLACE TABLE products (
    product_id INT,
    product_name VARCHAR(255),
    category VARCHAR(255)
);

CREATE TABLE customers (
    customer_id INT,
    customer_name VARCHAR
);

CREATE OR REPLACE TABLE orders (
    order_id INT,
    customer_id INT,
    product_id INT,
    quantity INT,
    price DECIMAL(10, 2),
    order_timestamp TIMESTAMP
);

-- Create a dynamic table for enriched order summary
CREATE OR REPLACE DYNAMIC TABLE order_summary
  TARGET_LAG = '1 minute'
  WAREHOUSE = dashboard_wh
AS
SELECT
    o.order_id,
    o.customer_id,
    c.customer_name,
    o.product_id,
    p.product_name,
    p.category,
    o.quantity,
    o.price,
    o.quantity * o.price AS total,
    o.order_timestamp
FROM orders o
JOIN products p ON o.product_id = p.product_id
JOIN customers c ON o.customer_id = c.customer_id;

-- Create a dynamic table for real-time sales aggregation
CREATE OR REPLACE DYNAMIC TABLE sales_aggregation
TARGET_LAG = '5 minutes'
WAREHOUSE = 'reporting_wh'
AS
SELECT
DATE_TRUNC('hour', order_timestamp) AS order_hour,
category,
SUM(quantity) AS total_quantity,
SUM(total) AS total_sales
FROM order_summary
GROUP BY 1, 2;

-- Create a dynamic table for top-selling products
CREATE OR REPLACE DYNAMIC TABLE top_products
TARGET_LAG = '10 minutes'
WAREHOUSE = 'analytics_wh'
AS
SELECT
product_id,
product_name,
SUM(quantity) AS total_quantity,
SUM(total) AS total_sales
FROM order_summary
GROUP BY 1, 2
ORDER BY 4 DESC
LIMIT 10;

In this example:

  1. We start by creating the base tables: orders, products, and customers. These tables store the raw data that will be streamed in real-time or updated periodically.
  2. We create a dynamic table called order_summary that joins data from the orders, products, and customers tables to produce an enriched order summary. The TARGET_LAG is set to 1 minute, so the summary stays within roughly a minute of the source data. As new orders stream in and the products and customers tables are updated, order_summary will automatically refresh to reflect the latest data.
  3. Next, we create another dynamic table called sales_aggregation that aggregates the data from order_summary to provide real-time sales metrics. It groups the data by hour and category, calculating the total quantity and total sales for each combination. The TARGET_LAG is set to 5 minutes, allowing for a slightly longer refresh interval.
  4. Finally, we create a dynamic table called top_products that identifies the top-selling products based on the data from order_summary. It aggregates the quantity and sales for each product, orders the results by total sales in descending order, and limits the output to the top 10 products. The TARGET_LAG is set to 10 minutes, providing a regularly updated view of the best-selling items.

This example demonstrates how dynamic tables can be chained together to create a DAG (Directed Acyclic Graph) for more complex data pipelines.

The order_summary table serves as the foundation, and subsequent dynamic tables build upon it to provide various aggregations and insights.

By leveraging dynamic tables, we have simplified the data transformation process and enabled near real-time analytics with a few simple SQL statements!

Snowflake will automatically handle the incremental updates and refreshes based on the specified TARGET_LAG, ensuring the data remains fresh and up-to-date.

We can query these dynamic tables just like any other table in Snowflake, and the results will reflect the most recent data based on the defined refresh intervals. This allows for real-time dashboards, alerts, and analytics without the need for complex manual data pipeline management.
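
For example, a dashboard or ad hoc analysis can read from these tables directly. The queries below are a small sketch against the tables defined above (names and columns as in the example):

-- Hourly sales by category over the last 24 hours
SELECT
    order_hour,
    category,
    total_quantity,
    total_sales
FROM sales_aggregation
WHERE order_hour >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
ORDER BY order_hour DESC, total_sales DESC;

-- Current top-selling products
SELECT product_name, total_quantity, total_sales
FROM top_products
ORDER BY total_sales DESC;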

Let’s do a quick summary of what we have learnt so far -

  • A dynamic table is created using a declarative approach. The desired result is specified using a SELECT statement that joins the base tables and calculates the total amount for each order.
  • The dynamic table is configured with a TARGET_LAG, indicating that it should be refreshed automatically by Snowflake to keep the data up to date within the specified lag time.
  • When new data is inserted into the base tables, the dynamic table incrementally updates itself during the next refresh cycle, incorporating the new data without recomputing the entire result (see the short sketch after this list).
  • The dynamic table performs complex transformations, including joins and aggregations, to combine data from multiple tables and calculate the total amount for each order.
  • The dynamic table seamlessly handles both batch updates and streaming updates to the base tables without any change to its definition.
  • Finally, the dynamic table can be queried to retrieve the order summary data, which reflects the latest state of the base tables.
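
To see the incremental behavior end to end, here is a small, hypothetical sequence against the example schema: insert a new order, optionally force an immediate refresh rather than waiting for the target lag, and query the dynamic table. The IDs are made up and assume matching rows exist in customers and products.

-- Insert a new order into the base table (IDs are hypothetical)
INSERT INTO orders (order_id, customer_id, product_id, quantity, price, order_timestamp)
VALUES (1001, 42, 7, 3, 19.99, CURRENT_TIMESTAMP());

-- Optionally trigger an immediate refresh instead of waiting for the target lag
ALTER DYNAMIC TABLE order_summary REFRESH;

-- The enriched summary now includes the new order
SELECT * FROM order_summary WHERE order_id = 1001;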

Refer to Snowflake documentation for more details (link).

Dynamic Tables Performance

Dynamic tables are designed to provide efficient data transformation and management. The performance of dynamic tables is influenced by several factors:

  1. Refresh Mode: Incremental refresh can significantly improve performance compared to full refresh, especially for large datasets with frequent, small changes (see the sketch after this list).
  2. Query Complexity: The complexity of the SQL query defining the dynamic table can impact refresh performance. Since Snowflake has no traditional indexes, optimizing the query itself and clustering the underlying tables on frequently joined or grouped keys can improve refresh times.
  3. Warehouse Size: The compute resources associated with the dynamic table’s warehouse directly affect refresh performance. Choosing an appropriately sized warehouse is crucial for optimal performance.
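
The refresh mode can also be set explicitly when a dynamic table is created; by default Snowflake chooses between incremental and full refresh automatically. A minimal sketch, reusing the earlier sales_summary example:

CREATE OR REPLACE DYNAMIC TABLE sales_summary
  TARGET_LAG = '5 minutes'
  WAREHOUSE = etl_wh
  REFRESH_MODE = INCREMENTAL  -- AUTO (default) | FULL | INCREMENTAL
AS
SELECT
    date,
    store_id,
    SUM(amount) AS total_sales
FROM sales_data
GROUP BY date, store_id;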

You can refer to the Snowflake documentation for more details on performance optimization of dynamic tables (link).

Compute and Storage Cost for Dynamic Tables

Understanding the cost implications of using dynamic tables is essential for effective resource management. The cost of running dynamic tables is influenced by:

  1. Compute Cost: Dynamic table refreshes consume compute resources from the associated warehouse. The cost is based on the warehouse size and the duration of the refresh operation. Optimizing refresh frequency and query efficiency can help manage compute costs.
  2. Storage Cost: Dynamic tables consume storage space like regular tables. The storage cost depends on the size of the materialized data and the retention period. Regularly reviewing and optimizing storage usage can help control costs.
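
One practical way to keep an eye on compute spend is to dedicate a warehouse to dynamic table refreshes and track its credit usage. A sketch using the ACCOUNT_USAGE views (warehouse name from the earlier example; these views have some reporting latency):

-- Daily credits consumed by the warehouse used for dynamic table refreshes
SELECT
    warehouse_name,
    DATE_TRUNC('day', start_time) AS usage_day,
    SUM(credits_used) AS credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE warehouse_name = 'DASHBOARD_WH'
  AND start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY usage_day;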

Monitoring Dynamic Tables

Snowflake provides comprehensive monitoring capabilities for dynamic tables through Snowsight and SQL commands.

  1. Snowsight: The web interface offers detailed insights into dynamic table performance, refresh history, and dependencies. Practitioners can use Snowsight to identify bottlenecks, track refresh progress, and analyze resource utilization.
  2. SQL Commands: Snowflake provides various SQL commands and functions for monitoring dynamic tables, such as DESCRIBE DYNAMIC TABLE, SHOW DYNAMIC TABLES, DYNAMIC_TABLE_REFRESH_HISTORY, and DYNAMIC_TABLE_GRAPH_HISTORY. These commands enable programmatic access to dynamic table metadata and history.
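
For reference, here is roughly what those commands look like against the earlier example (the table name passed to the history function may need to be fully qualified):

-- List dynamic tables and inspect one of them
SHOW DYNAMIC TABLES;
DESCRIBE DYNAMIC TABLE order_summary;

-- Recent refresh history for a specific dynamic table
SELECT *
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(NAME => 'ORDER_SUMMARY'));

-- Dependency graph (DAG) information for dynamic tables
SELECT *
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_GRAPH_HISTORY());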

Best Practices for Using Dynamic Tables

  1. Define Clear Refresh Targets: Setting appropriate TARGET_LAG values is crucial for balancing performance and cost. Practitioners should carefully consider how fresh the data needs to be to meet business requirements (see the sketch after this list).
  2. Use Incremental Refresh Where Possible: Leveraging incremental refresh can significantly reduce compute resources and improve performance, especially for large datasets with frequent, small changes.
  3. Monitor and Adjust: Regularly review the performance metrics provided by Snowsight and adjust configurations as needed to optimize data processing pipelines.
  4. Security and Access Control: Manage access to dynamic tables using Snowflake’s robust security framework to ensure that only authorized users can perform operations like creating, querying, and modifying dynamic tables.
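
A couple of these practices translate directly into SQL. Target lag can be adjusted after creation, or set to DOWNSTREAM so a table refreshes only as often as the tables that depend on it require, and access is managed with standard grants. A sketch using names from the earlier example and a hypothetical analyst_role:

-- Relax the freshness target if 1 minute turns out to be unnecessarily aggressive
ALTER DYNAMIC TABLE order_summary SET TARGET_LAG = '15 minutes';

-- Or let downstream dynamic tables drive the refresh cadence
ALTER DYNAMIC TABLE order_summary SET TARGET_LAG = DOWNSTREAM;

-- Grant read-only access to a reporting role (hypothetical role name)
GRANT SELECT ON DYNAMIC TABLE order_summary TO ROLE analyst_role;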

For detailed guidance, please refer to Snowflake documentation (link).

Known Limitations for Dynamic Tables

While dynamic tables offer significant benefits, there are some limitations that exist today and should be considered:

  1. Unsupported Query Constructs: Certain query constructs, such as stored procedures, non-deterministic functions (except those listed in the documentation), and external functions, are not supported in dynamic table definitions.
  2. Unsupported Data Sources: Dynamic tables cannot directly reference external tables, streams, or materialized views as data sources.
  3. DML Operations: Since dynamic table content is based on the query results, direct DML operations (INSERT, UPDATE, DELETE) on dynamic tables are not supported.
  4. Refresh Scheduling: Dynamic tables rely on the specified target lag for refresh scheduling. Fine-grained control over refresh schedules is not available.

Many of these limitations are expected to be addressed in upcoming Snowflake releases. You can watch this page for updates on Dynamic Tables limitations (link).

Common Pitfalls to Avoid

While dynamic tables offer a powerful and efficient way to manage data pipelines in Snowflake, there are a few common pitfalls to be aware of:

  1. Overcomplicating Queries: It can be tempting to create a single, complex dynamic table that handles multiple transformations and aggregations. However, this can lead to performance issues and make troubleshooting more difficult. Instead, consider breaking down your pipeline into smaller, more manageable dynamic tables that each handle a specific task.
  2. Neglecting Data Locality: Data locality plays a crucial role in the performance of dynamic tables, especially when using incremental refresh. Ensure that your tables are clustered based on the keys used in joins, GROUP BY, and PARTITION BY clauses; failing to optimize data locality can result in suboptimal performance (a sketch follows this list).
  3. Setting Unrealistic Target Lag: While it’s desirable to have near real-time data, setting an overly aggressive target lag can lead to increased costs and potential performance issues. Be realistic about your data freshness requirements and adjust the target lag accordingly.
  4. Overlooking Monitoring and Optimization: Dynamic tables are not a “set it and forget it” solution. Regularly monitor the performance of your dynamic tables using the tools provided by Snowflake, such as Snowsight and SQL commands. Adjust your configurations and optimize your queries based on the insights gained from monitoring.
  5. Ignoring Cost Implications: Dynamic tables consume compute resources and storage, which can impact costs. Be mindful of the refresh frequency, warehouse size, and storage requirements of your dynamic tables. Regularly review and optimize these factors to ensure cost-efficiency.
  6. Inadequate Testing: Not thoroughly testing Dynamic Tables before moving them into production can lead to unexpected behavior and errors. It’s advisable to conduct comprehensive testing, especially when Dynamic Tables depend on multiple data sources or when used in complex data transformation pipelines.
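
On pitfall 2, data locality is typically improved by clustering large base tables on the keys used in the dynamic table’s joins and aggregations. A sketch against the example schema; whether clustering pays off depends on table size and change patterns:

-- Cluster the large, frequently changing base table on its join keys
ALTER TABLE orders CLUSTER BY (product_id, customer_id);

-- Check how well the table is clustered on those keys
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(product_id, customer_id)');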

Comparison with dbt in the context of Snowflake

Both Snowflake Dynamic Tables and dbt offer powerful capabilities for creating declarative data pipelines for Snowflake.

Some key points to consider -

  • Functionality: Dynamic Tables are deeply integrated into Snowflake and provide a more seamless experience for Snowflake users looking to automate data refreshes directly within the database. dbt, while highly compatible with Snowflake, operates as a separate tool/service.
  • Ease of Use: Dynamic Tables are easier to use for those already familiar with Snowflake’s environment, as they require less setup than dbt, which may need additional integration and configuration effort.
  • Flexibility: As of now, dbt offers greater flexibility in terms of transformation logic, testing, and version control, making it suitable for complex data engineering tasks that require rigorous testing and documentation. Dynamic Tables are expected to close much of this gap over time.

The choice between the two would depend on factors such as the complexity of transformations, the need for real-time updates, and the team’s familiarity with SQL and software engineering practices. In some cases, using both together can provide a robust solution, with dbt handling complex transformations and Dynamic Tables enabling real-time data access.
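
For comparison, the first Dynamic Table example would look roughly like the following as a dbt model. This is only a sketch: it assumes a dbt project where sales_data is declared as a source named raw, and the model lives in models/sales_summary.sql. Refreshes would then be driven by scheduled dbt runs rather than by a TARGET_LAG inside Snowflake.

-- models/sales_summary.sql (hypothetical dbt project layout)
{{ config(materialized='table') }}

SELECT
    date,
    store_id,
    SUM(amount) AS total_sales
FROM {{ source('raw', 'sales_data') }}
GROUP BY date, store_id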

Dynamic Tables vs Materialized Views

Although materialized views offer similar capabilities, such as storing pre-computed results, being defined by a SQL query, and staying up to date automatically, the two features target different use cases. Materialized views are best suited for optimizing query performance on complex, infrequently updated datasets, while dynamic tables are designed to simplify and automate the creation and management of data transformation pipelines for both batch and streaming data. If you need to build a complex, multi-stage data transformation pipeline that involves joining multiple tables, handling real-time streaming data, and managing dependencies between transformation steps, dynamic tables are the better fit.
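
The contrast is easiest to see side by side: in Snowflake, a materialized view is defined over a single table and cannot contain joins, while a dynamic table can express the multi-table, chained transformations shown earlier. A minimal sketch of the materialized-view side, reusing the orders table:

-- A materialized view: single source table, no joins, no TARGET_LAG to manage
CREATE MATERIALIZED VIEW orders_by_product AS
SELECT
    product_id,
    COUNT(*) AS order_count,
    SUM(quantity) AS total_quantity
FROM orders
GROUP BY product_id;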

Conclusion

Snowflake Dynamic Tables are a game-changer in the Snowflake ecosystem. They offer a declarative approach to data pipeline management, significantly reducing the complexity and overhead associated with traditional ETL processes. By automating data transformations and leveraging declarative SQL, Snowflake practitioners can now focus more on strategic data analysis and less on pipeline maintenance. As this feature continues to evolve, it promises to become an essential tool in the data engineer’s toolkit for building efficient, scalable, and manageable data architectures in the Snowflake Data Cloud.

