Snowflake Collaboration & Dynamic Tables: Managing Provider Costs When Sharing Multi-Tenant Data

How data providers can simplify consumer data sharing and save money with Snowflake Dynamic Tables.


Data providers are always looking to cut costs. Consider the following scenario: Acme Industries, a data provider, wants to share data with its consumers using the Snowflake collaboration framework. Cost is a concern, though, for the following reasons:

  1. Acme shares data with their consumers cross-cloud and cross-region, so replication and egress costs are a factor.
  2. Acme has a multi-tenant data model in which data related to multiple consumers is co-located in tables.
  3. Acme’s data model is not purely analytic. In other words, they have multiple tables containing granular data that needs to be pieced together to gain insight.

Acme’s concerns can be summarized as follows:

  • More tables increase replication and egress costs. Because Acme has to provision more tables so that consumers can gain insight from its data product, it is worried about excessive replication and egress costs eating into profits.

As for Acme’s consumers, the complexity of this fragmented model has two downsides:

  • Increased compute costs. Querying fragmented tables can drive up compute costs.
  • Delayed time-to-insight. Putting together an analytic model from the shared tables will take longer because of the complexity.

How can Acme optimize costs and add value for their consumers without compromising on providing insights with their shared data product? Furthermore, how can Acme accelerate their consumers’ time to insight from their shared data? And lastly, how can Acme accomplish these goals without revamping their entire data model on Snowflake?

The answer is Snowflake dynamic tables.

What are dynamic tables?

Snowflake dynamic tables are a special type of table that simplifies data transformation through a declarative approach: you define the query that produces the result, and Snowflake keeps the table refreshed. Dynamic tables are rapidly becoming a core component of our customers’ data engineering use cases, and their adoption is growing quickly. In this article, we will explore how providers can leverage the power of dynamic tables to provision data to their consumers via the Snowflake collaboration framework.

What is the Snowflake collaboration framework?

The Snowflake collaboration framework makes it easy to share data products securely with inter- and intra-organization partners by enabling access to live, ready-to-query data across clouds and regions without the need for extract, transform, and load (ETL). A provider can share data through tables and views, expose logic through user-defined functions (UDFs), and even distribute full data applications with the Snowflake Native App Framework.
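
To make this concrete, here is a minimal sketch of the direct-sharing primitives that listings build on. All object and account names are hypothetical, and orders_v is assumed to be a secure view (only secure views can be added to a share):

```sql
-- Package a database, schema, and secure view into a share (hypothetical names).
CREATE SHARE acme_orders_share;

GRANT USAGE  ON DATABASE acme_db                 TO SHARE acme_orders_share;
GRANT USAGE  ON SCHEMA   acme_db.shared          TO SHARE acme_orders_share;
GRANT SELECT ON VIEW     acme_db.shared.orders_v TO SHARE acme_orders_share;

-- For direct sharing within the same region and cloud, add consumer accounts;
-- listings and Cross-Cloud Auto-Fulfillment handle broader distribution.
ALTER SHARE acme_orders_share ADD ACCOUNTS = consumer_org.consumer_account;
```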

What Providers Want

Data providers can implement any type of data model in Snowflake. These can include:

  • Operational models, where one may need to join more than one table or view to provide analytic insights
  • Multi-tenancy models, where data related to more than one entity is co-located in the same tables
  • Simple analytic models, which may require only light joins with other tables to build prescriptive insights

When providers share data products with their consumers, they look for ways to optimize the cost of provisioning those products. This may result in the following requirements:

  1. The ability to partition multi-tenant data models. For example, providers may want to provision only the data specific to a particular customer or region. They will also likely want to keep the partitioned data in sync with the base tables on an automated schedule.
  2. The ability to build on-the-fly analytic models. Providers may want to simplify data products by creating analytic models as needed with minimal effort.

Let’s dive into how dynamic tables are able to meet these requirements.

How to use dynamic tables to optimize data sharing

Partition multi-tenant data models

Dynamic tables offer the ability to create a materialized subset of data needed to provision data for a listing without a lot of maintenance.

In the past, this would have been accomplished by creating table(s) to hold the partitioned data and having an ETL or ELT workflow populate the tables and keep them in sync with the base tables. Another approach we have observed in the field is to create streams on the base tables to capture incremental data and process those streams via a Snowflake task to keep the partitioned data in sync. While these approaches work well, they require setup and ongoing maintenance, which is typical of an imperative approach to processing data and building pipelines.
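
For contrast, here is a rough sketch of that imperative pattern using hypothetical names (acme_db.raw.orders as the multi-tenant base table, share_wh as the warehouse); it ignores deletes for brevity:

```sql
-- A stream captures changes on the base table, and a scheduled task merges
-- one tenant's changes into the partitioned copy (simplified: no delete handling).
CREATE OR REPLACE STREAM orders_changes ON TABLE acme_db.raw.orders;

CREATE OR REPLACE TASK sync_consumer_a_orders
  WAREHOUSE = share_wh
  SCHEDULE  = '15 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_CHANGES')
AS
  MERGE INTO acme_db.shared.consumer_a_orders t
  USING (SELECT * FROM orders_changes WHERE tenant_id = 'CONSUMER_A') s
    ON t.order_id = s.order_id
  WHEN MATCHED THEN UPDATE SET t.status = s.status, t.amount = s.amount
  WHEN NOT MATCHED THEN INSERT (order_id, tenant_id, status, amount)
    VALUES (s.order_id, s.tenant_id, s.status, s.amount);

ALTER TASK sync_consumer_a_orders RESUME;
```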

Dynamic tables strive to eliminate this overhead. A TARGET_LAG parameter, expressed in minutes, hours, or days, can be set on a dynamic table; it specifies how far the dynamic table’s contents are allowed to lag behind changes in the base tables, and Snowflake schedules the refresh process accordingly. The refresh can be either full or incremental. A WAREHOUSE parameter assigns a user-defined warehouse to provide compute for the refresh process. Once appropriate values are set for these parameters, the refresh process keeps the dynamic table(s) in sync with the base tables automatically and with minimal effort.
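
Here is a minimal declarative sketch of the same idea, again with hypothetical names: a dynamic table holding only one tenant’s slice of the multi-tenant base table, which Snowflake keeps refreshed within the target lag using the assigned warehouse:

```sql
-- Declarative equivalent of the pipeline above: no stream, no task, no MERGE.
CREATE OR REPLACE DYNAMIC TABLE acme_db.shared.consumer_a_orders
  TARGET_LAG = '15 minutes'
  WAREHOUSE  = share_wh
AS
  SELECT order_id, tenant_id, status, amount, updated_at
  FROM acme_db.raw.orders
  WHERE tenant_id = 'CONSUMER_A';
```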

Once Acme has partitioned the data and populated the dynamic table(s), those tables can be included in a listing and shared with consumers. Additionally, if the listing targets consumers spread across different regions and clouds, Acme can leverage Cross-Cloud Auto-Fulfillment to provision the dynamic table(s) to each consumer’s Snowflake deployment.
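
Assuming dynamic tables can be granted to a share in the same way as regular tables (worth confirming against the current Snowflake documentation for your edition and region), exposing the partitioned table might look like this; the share can then back a listing, with Cross-Cloud Auto-Fulfillment enabled on the listing itself:

```sql
-- Hypothetical: add the partitioned dynamic table to the share created earlier.
GRANT USAGE  ON SCHEMA acme_db.shared TO SHARE acme_orders_share;
GRANT SELECT ON DYNAMIC TABLE acme_db.shared.consumer_a_orders
  TO SHARE acme_orders_share;
```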

Since the partitioned data can be much smaller than the data in the base tables, the cost of provisioning data to consumers can be reduced accordingly.

Build on-the-fly analytic models

Quite often I see cases where data providers have implemented data models in their Snowflake deployment that closely resemble operational or semi-operational models. A few providers even attempt to share data from their less curated data layers with their consumers.

There are some drawbacks to this approach:

  • Consumers may have to spend some time understanding how the different shared tables relate to each other. This can slow down time to insight, even if helpful SQL samples are included as part of the listing metadata.
  • Sharing a non-analytic model can mean sharing granular-level data, which increases the provider’s cost of provisioning the data product to consumers.

Dynamic tables solve this issue by supporting joins across different base tables. Acme can leverage this capability to build dynamic table(s) that represent aggregates or a more succinct data model to share with consumers without revamping their entire data model on Snowflake. This will also optimize Acme’s costs for sharing data with their consumers.
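
Continuing the hypothetical schema from earlier, such an analytic rollup might look like the following: a dynamic table that joins the granular orders and products tables into a consumer-ready daily aggregate:

```sql
-- Join granular base tables into an aggregate model, scoped to one tenant.
CREATE OR REPLACE DYNAMIC TABLE acme_db.shared.consumer_a_daily_sales
  TARGET_LAG = '60 minutes'
  WAREHOUSE  = share_wh
AS
  SELECT
    o.tenant_id,
    p.product_category,
    DATE_TRUNC('day', o.order_date)  AS order_day,
    SUM(o.amount)                    AS total_sales,
    COUNT(*)                         AS order_count
  FROM acme_db.raw.orders   o
  JOIN acme_db.raw.products p ON p.product_id = o.product_id
  WHERE o.tenant_id = 'CONSUMER_A'
  GROUP BY o.tenant_id, p.product_category, DATE_TRUNC('day', o.order_date);
```

Sharing the aggregate instead of the granular base tables keeps the shared footprint (and therefore replication and egress) smaller, while giving consumers a model they can query directly.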

Other capabilities offered by dynamic tables

  1. Change Data Capture (CDC): A consumer may want to understand the changes to the data that have occurred since they last consumed it. To do so, they can create streams on the shared dynamic table(s); see the sketch after this list. These streams need to be created in a separate database from the shared database on the consumer side, as the shared database is read-only.
  2. Data governance & security: With dynamic tables, a data provider can use tags, dynamic column masking policies, row access policies, and even secure views (if the data in the dynamic tables is partitioned by a higher-level attribute, such as a region, that can span multiple consumers).
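
For the CDC case, a consumer-side sketch might look like the following, assuming the shared dynamic table is visible to the consumer as shared_acme_db (a hypothetical name) and that streams can be created on it, which typically requires change tracking to be available on the provider side:

```sql
-- The stream lives in the consumer's own database because the shared database is read-only.
CREATE DATABASE IF NOT EXISTS consumer_analytics;

CREATE OR REPLACE STREAM consumer_analytics.public.orders_changes
  ON DYNAMIC TABLE shared_acme_db.shared.consumer_a_orders;

-- Returns rows changed since the stream's offset was last advanced
-- (the offset advances when the stream is consumed in a DML statement).
SELECT * FROM consumer_analytics.public.orders_changes;
```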

Summary

Dynamic tables provide some of the most sought-after capabilities in Snowflake, especially for data engineering use cases. This powerful capability can be leveraged in Snowflake collaboration use cases to reduce costs for providers while accelerating time to insight for consumers. The resulting value-add for the consumer can lead to increased usage of the shared data products, which is a win for providers and consumers alike.
