❄️ Snowflake in a Nutshell — The Snowpipe Streaming API & Dynamic Tables

Welcome to the latest edition of Snowflake in a Nutshell, where we explain various Snowflake topics in straightforward terms.

Today’s topics are “The Snowpipe Streaming API & Dynamic Tables”.

These features were originally announced at Snowflake Summit 2022 (with Dynamic Tables originally labelled Materialized Tables), and in some ways they have remained “under the radar”. This is mainly because they are in Private Preview, i.e. still under development.

Nonetheless, they are in the Snowflake release pipeline, and this blog post provides a heads-up on what’s to come. We’ll also explain how they are positioned relative to similar-sounding, yet different, Snowflake features.

But first things first…

…What is streaming?

Streaming (a.k.a. “event streaming”) is defined by Kafka.apache.org as:

The practice of capturing data in real-time from event sources like databases, sensors, mobile devices, cloud services, and software applications in the form of streams of events. Event streaming ensures a continuous flow and interpretation of data.

What is the Snowflake Snowpipe Streaming API?

The Snowflake Snowpipe Streaming API. Credit: Snowflake

We can use the Snowflake Snowpipe Streaming API to load streaming data rowsets into Snowflake database tables. The streaming data rowsets can be consumed from various types of data stream publishers.

Data streams can be published as topics. A topic is a log of events (e.g. website clicks), and provides a one-way data flow from the publisher to one or many subscribers:

  • Streaming data rowsets can be automatically processed by calling the Snowpipe Streaming API.
  • This allows us to ingest & process streaming data using Snowflake’s native capabilities. API calls are made using the Java-based Snowflake Ingest SDK (see the sketch after this list).
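
Here’s a minimal sketch of what this might look like with the Java Snowflake Ingest SDK. The connection properties file, client, channel, table, and column names below are all illustrative, and since the API is in Private Preview, details are subject to change:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import net.snowflake.ingest.streaming.InsertValidationResponse;
import net.snowflake.ingest.streaming.OpenChannelRequest;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestChannel;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClientFactory;

public class StreamingIngestSketch {
    public static void main(String[] args) throws Exception {
        // Illustrative: account URL, user, key-pair credentials, etc.
        Properties props = new Properties();
        props.load(StreamingIngestSketch.class.getResourceAsStream("/profile.properties"));

        // Build a streaming ingest client. Note: no pipe object is required --
        // the channel writes rowsets directly to the target table.
        try (SnowflakeStreamingIngestClient client =
                SnowflakeStreamingIngestClientFactory.builder("MY_CLIENT")
                        .setProperties(props)
                        .build()) {

            OpenChannelRequest request = OpenChannelRequest.builder("MY_CHANNEL")
                    .setDBName("MY_DB")
                    .setSchemaName("RAW")
                    .setTableName("CLICK_EVENTS")
                    .setOnErrorOption(OpenChannelRequest.OnErrorOption.CONTINUE)
                    .build();
            SnowflakeStreamingIngestChannel channel = client.openChannel(request);

            // Insert a single row; the offset token lets the publisher track
            // which events have been durably committed.
            Map<String, Object> row = new HashMap<>();
            row.put("EVENT_TYPE", "page_view");
            row.put("EVENT_TS", System.currentTimeMillis());
            InsertValidationResponse response = channel.insertRow(row, "offset-0");
            if (response.hasErrors()) {
                throw response.getInsertErrors().get(0).getException();
            }

            channel.close().get(); // flush pending rows and close the channel
        }
    }
}
```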

Don’t we already have Snowpipe for that?

Yes & no.

Of course, Snowflake Snowpipe is used for streaming data. However:

  • Snowpipe processes data from staged files. The diagram below illustrates Snowpipe consuming files from an external AWS S3 stage.
Snowflake Snowpipe processes data from staged files (in this case external AWS S3 stages). Credit: Snowflake.

On the other hand, as mentioned in the previous section, the Snowpipe Streaming API processes data from rowsets.
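
For contrast, here’s a minimal sketch of a classic Snowpipe setup over an external S3 stage (all object names are illustrative):

```sql
-- Illustrative names throughout. A pipe object copies newly arrived files
-- from an external S3 stage into the target table.
CREATE OR REPLACE PIPE raw.click_events_pipe
  AUTO_INGEST = TRUE  -- load on S3 event notifications
AS
  COPY INTO raw.click_events
  FROM @raw.click_events_s3_stage
  FILE_FORMAT = (TYPE = 'JSON');
```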

Additional key differences:

  • The Snowpipe Streaming API does not require a Snowflake pipe object. Instead, the API writes directly to the target table.
  • The Snowpipe Streaming API is positioned to deliver lower load latencies & costs than Snowpipe for similar volumes of data.

What are Dynamic Tables?

Dynamic Tables (formerly known as Materialized Tables) are used to join and transform streaming data as it flows into and through the Snowflake Data Cloud. We can use Dynamic Tables to:

  • Join & aggregate data as it streams into, and through, Snowflake, e.g. with reference data which is already on the platform

A few points to note:

  • The “LAG” parameter in the Dynamic Table definition tells Snowflake how often to incrementally refresh the Dynamic Table with the latest streamed data
  • The aggregates automatically update as new data streams in and through
  • Note the CTAS (Create Table As SELECT) style syntax illustrated below
Syntax for Dynamic Table creation. Credit: Jeremiah Hansen, Principal Data Platform Architect, and Saras Nowak, Senior Product Manager, at Snowflake.
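
As a minimal sketch (Private Preview syntax, so the details may change; the table, warehouse, and column names are illustrative), a Dynamic Table definition might look like this:

```sql
-- Illustrative Private Preview syntax; names are made up.
CREATE OR REPLACE DYNAMIC TABLE sales_by_product
  LAG = '1 minute'          -- refresh incrementally, at most one minute behind
  WAREHOUSE = transform_wh  -- warehouse that performs the refreshes
AS
  SELECT p.product_name,
         SUM(s.amount) AS total_sales
  FROM   raw.sales    s  -- data streamed in, e.g. via the Snowpipe Streaming API
  JOIN   ref.products p  -- reference data already on the platform
         ON p.product_id = s.product_id
  GROUP BY p.product_name;
```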

Are Dynamic Tables related to Materialized Views?

Materialized Views are a separate feature with a different use case, namely:

  • To help boost query performance

The materialized view resultset is pre-computed, and whenever the underlying data changes, the view’s resultset is automatically updated. The trade-offs include faster queries vs. the additional cost incurred to keep the resultset up to date, as well as certain limitations on how a materialized view can be defined.
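
For example, a minimal sketch with illustrative names:

```sql
-- A materialized view that pre-computes an aggregate so repeated queries
-- avoid re-scanning the base table. Snowflake keeps the resultset up to date.
CREATE OR REPLACE MATERIALIZED VIEW sales_by_region_mv AS
  SELECT region,
         SUM(amount) AS total_sales
  FROM   raw.sales
  GROUP BY region;
```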

What about Snowflake Table Streams?

Even though the Snowpipe Streaming API and Snowflake Table Streams have the word “Stream” in their names, they are different features intended for different use cases. Specifically:

  • Snowflake Table Streams provide change tracking for database objects such as tables and views. They describe the state of a row before and after a data change.

In other words:

Table streams track Data Manipulation Language (DML) changes such as INSERTs, UPDATEs, and DELETEs, along with metadata related to those changes.

This process is also referred to as Change Data Capture (CDC).
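
A minimal sketch of the idea, with illustrative names:

```sql
-- A stream that records DML changes (INSERTs, UPDATEs, DELETEs) on a table.
CREATE OR REPLACE STREAM sales_changes ON TABLE raw.sales;

-- Querying the stream returns the changed rows together with CDC metadata
-- columns such as METADATA$ACTION and METADATA$ISUPDATE.
SELECT * FROM sales_changes;
```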

Interestingly, as Saras Nowak, Senior Product Manager, and Jeremiah Hansen, Principal Data Platform Architect at Snowflake, illustrate here, Dynamic Tables can be used as an alternative to Snowflake Table Streams + Tasks.
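
For context, here’s a minimal sketch (illustrative names, reusing the stream from the previous section) of the Table Stream + Task pattern that a Dynamic Table can replace:

```sql
-- A scheduled task that consumes the stream's change rows and maintains a
-- derived table: the kind of plumbing a Dynamic Table handles for you.
CREATE OR REPLACE TASK merge_sales_changes
  WAREHOUSE = transform_wh
  SCHEDULE  = '1 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('sales_changes')
AS
  INSERT INTO curated.sales_by_region
  SELECT region, SUM(amount)
  FROM   sales_changes
  GROUP BY region;

-- Tasks are created suspended; resume to start the schedule.
ALTER TASK merge_sales_changes RESUME;
```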

Summary

As we have seen, the upcoming Snowpipe Streaming API and Dynamic Tables will make it easier to consume and query streaming data. The Snowpipe Streaming API pushes streaming data directly to Snowflake tables, and Dynamic Tables can then be used to join and aggregate the data as it streams into Snowflake.

Note: The Snowpipe Streaming API and Dynamic Tables features are in Private Preview, so they are not yet generally available and are subject to change.

It’s also important to be aware of similar-sounding, yet different, features such as Materialized Views (query performance enhancement) and Table Streams (Change Data Capture).

Tip: Likewise, it’s important to know when data streaming is a good fit, and when standard approaches such as batch processing might suffice. This often boils down to business requirements & costs.

If the requirements do point to real-time data processing, then Snowpipe (for file streaming), or the Snowpipe Streaming API (for rowset streaming) + Dynamic Tables may be just the candidates!

Exciting times!

© Dan Galavan. Dan is an independent data architecture consultant working professionally with data for 23 years. He is a member of the Snowflake Data Super Hero program, a data modeling expert, and Snowflake trainer. www.galavan.com.
