Snowflake Data Collaboration and Auto-fulfillment

In recent years, several industries have undergone significant digital transformation. From retail to healthcare to finance, we have seen how service personalization translates into a better customer experience. Organizations of all sizes have been able to innovate by better understanding their users, cross-referencing preferences, user behavior, demographic data, and more.

At the heart of these innovations is data. Companies are increasingly able to obtain vast amounts of information and unlock new use cases and revenue streams, leveraging not only internal data but also third-party data. However, acquiring third-party data products, such as feeds or data services, has traditionally been a long and largely manual process.

In the past, collaborating on data meant moving it from one environment to another. Files were pulled from FTP servers, API scripts were executed, ETL tools were used, or separate data marts were configured to give teams access to data. These data pipelines were expensive and time-consuming to build and maintain. They were also insecure: once the data had been moved, it was out of the company's control. Furthermore, because of how these pipelines were designed, data sometimes reached its users late.

Snowflake addresses these problems through several components of its data management ecosystem.

Snowgrid

The Snowflake Data Cloud is composed of 30 public cloud regions (as of March 2022) connected by the exclusive Snowgrid technology, which eliminates the complexity of data sharing and accelerates collaboration between teams and the ecosystem. Snowgrid enables global features such as data sharing, replication, and failover.

In particular, Snowgrid allows you to:

  • Share data in real time, cross-cloud and cross-region, without any ETL. With on-demand fulfillment, data is available across all clouds and regions.
  • Share not only data but also services and applications, so that the ecosystem has the data and tools it needs to collaborate efficiently.
  • Apply strong data governance: you control who can discover the data and decide whether to share it with a single account, with a group of accounts, or with any company in the Data Cloud through the Snowflake Marketplace.

In addition, global connectivity makes it possible to maintain operational continuity across regions and across clouds, so that the company can operate without interruption and offer cutting-edge customer experiences. It also helps companies keep up with evolving regulations, thanks to region-specific controls and the ability to move entirely from one cloud to another as needs change.

In summary, with Snowgrid, Snowflake eliminates data silos and, with them, the need for ETL pipelines, file transfers, or security protocol negotiations between cloud providers.

Marketplace

In addition to the Snowgrid layer, the Data Cloud includes a platform where customers can discover, evaluate, and purchase external data, data services, and applications.

This platform is the Snowflake Marketplace: essentially an online store that facilitates the purchase and sale of data.

Consumers can easily browse the data products and applications available in the Marketplace and immediately try samples through their Snowflake account. Providers, for their part, can choose whether to list their samples and products for free or for a fee.

Traditional data vendors, SaaS companies, and regular operating companies are already leveraging the Snowflake Marketplace to create new revenue streams, bring new products to market faster, and provide better customer experiences by helping to reduce data integration costs.

Understanding Customer Needs with Snowflake Provider Studio Analytics

As in any B2B or B2C market, providers always seek to better understand their prospects and customers and want answers to questions about their product listings, such as:

  • Who is interested in using my products?
  • How many leads are engaging with the various listings my company has made available?
  • Are my paying customers still finding value in my products?

In the Data tab in Snowflake, you can open Provider Studio Analytics, where providers can find the information they need to answer these types of questions accurately and consistently.

Inside Provider Studio, you can click on the Analytics panel to see how customers use the Listings that you have made available. Additionally, the Detailed Metrics panel lets you analyze specific metrics: the number of views, requests, and installations (also called "mounts"), as well as how many queries have been executed on a particular Listing.

Once inside the Analytics panel, you can apply filters that allow you to see how many queries have been executed in the last 28 days. You can then filter by region, consumer, day, and Listing.

Providers can also obtain aggregated information on the reach and engagement of their products, such as the most viewed or used Listings, the most active consumers, Listing conversions, and more.

In addition to the analytics available in Provider Studio, providers can run SQL queries on the DATA_SHARING_USAGE schema, which includes views that display information on public Listings published in the Snowflake Marketplace or on their Listings that have been privately shared with specific clients or business partners. Here, you can view metrics such as the number of clicks and consumption data, including queries executed by consumers.

Furthermore, the LISTING_ACCESS_HISTORY view provides granular and aggregated information on which consumers have accessed which objects (tables, views, functions, stored procedures) over time.

Cross-Cloud Auto-fulfillment

If you want the data product associated with a Listing to be replicated automatically to other Snowflake regions, you can configure Cross-Cloud Auto-fulfillment.

Essentially, when Auto-fulfillment is enabled for a Listing, Snowflake automatically replicates the data product in consumer regions. The data product includes tables, schemas, UDFs, UDTFs, views, etc. that are part of the Listing.

Additionally, by using auto-fulfillment, you can avoid manually replicating your own data products and approving requests for your Listings, helping consumers access them more quickly.

How does Cross-Cloud Auto-fulfillment work?

First, when you publish a private Listing or when a consumer requests a data product, Snowflake checks whether the product exists in the consumer’s region. If the product already exists in the consumer’s region, the fulfillment of the Listing continues.

If your product does not yet exist in the consumer’s region, the following occurs:

  • The Provider grants consumers (X and Y) access to its Listing and sets the replication frequency.
  • The first Consumer (X) requests access to the Listing. Snowflake automatically creates a secure shared area in the consumer’s region and initiates data and Share replication.
  • The second Consumer (Y) requests access to the Listing and gains direct access without further replications, as do all future consumers in the region.
  • Snowflake performs change-based synchronization to keep the data in the secure shared area synchronized.
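The flow above can be illustrated with a small simulation. This is only a sketch of the "replicate once per region" idea, not Snowflake's implementation: the `Listing` class, its fields, and the region and listing names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    """Hypothetical model of a Listing with auto-fulfillment enabled."""
    name: str
    provider_region: str
    # Regions that already hold a copy of the data product.
    fulfilled_regions: set = field(default_factory=set)
    replication_count: int = 0

    def __post_init__(self):
        # The provider's own region holds the product from the start.
        self.fulfilled_regions.add(self.provider_region)

    def request_access(self, consumer: str, consumer_region: str) -> str:
        """Describe what happens when a consumer requests the Listing."""
        if consumer_region in self.fulfilled_regions:
            return f"{consumer}: product already in {consumer_region}, access granted"
        # First request from this region: replicate once, then grant access.
        self.fulfilled_regions.add(consumer_region)
        self.replication_count += 1
        return f"{consumer}: replicated to {consumer_region}, access granted"


# Hypothetical listing and region names.
listing = Listing(name="weather_feed", provider_region="aws_us_east_1")
print(listing.request_access("X", "azure_westeurope"))  # triggers replication
print(listing.request_access("Y", "azure_westeurope"))  # reuses the existing copy
print(listing.replication_count)
```

In this model, consumer X triggers the only replication into the region, and consumer Y is served from the copy that already exists there, which mirrors the second and third steps above.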

How to set up Auto-fulfillment

Efficient Listing management allows you to monitor the regions where consumers use the Listing, control the cost of replication, and change the Listing update frequency.

To manage or monitor Listing auto-fulfillment, follow these steps:

1. Choose Consumer Accounts

2. Enable Auto-Fulfillment

3. Choose the Update Frequency

It is important to consider the following replication cost factors:

Data Storage

Replicated databases in secure shared areas in other regions incur storage costs. Currently, the cost is around $20-$23/TB per month in US-based regions, and it varies depending on the cloud and region.

Compute

Replication operations use compute resources to copy data and manage data state in secure shared areas in other regions. The observed compute usage is roughly 3–5 credits per TB of replicated data, billed at the remote region's credit price, less any contract discount.

Data Transfer

The initial database replication and subsequent synchronization operations transfer data between regions. Cloud providers charge for data transferred from one region to another within their own network or to a region of another cloud. Approximate rates are ~$20/TB within the same cloud and ~$90–120/TB cross-cloud.
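To see how the three factors combine, here is a rough estimator. It is a sketch under stated assumptions: it uses the upper end of each range quoted above, the function and constant names are hypothetical, and the credit price of $3 in the example is a placeholder, since the actual price depends on your contract, edition, and region.

```python
# Upper-bound rates quoted in the text (USD); real prices vary by cloud and region.
STORAGE_USD_PER_TB_MONTH = 23.0
COMPUTE_CREDITS_PER_TB = 5.0
TRANSFER_USD_PER_TB = {"same_cloud": 20.0, "cross_cloud": 120.0}


def estimate_replication_cost(stored_tb, replicated_tb, credit_price_usd,
                              route="same_cloud"):
    """Rough cost for one month of auto-fulfillment into one remote region.

    stored_tb:        size of the replica kept in the remote region
    replicated_tb:    data copied during the month (initial load plus changes)
    credit_price_usd: assumed price of one credit in the remote region
    route:            "same_cloud" or "cross_cloud"
    """
    storage = stored_tb * STORAGE_USD_PER_TB_MONTH
    compute = replicated_tb * COMPUTE_CREDITS_PER_TB * credit_price_usd
    transfer = replicated_tb * TRANSFER_USD_PER_TB[route]
    return {
        "storage": storage,
        "compute": compute,
        "transfer": transfer,
        "total": storage + compute + transfer,
    }


# Example: a 2 TB product replicated cross-cloud for the first time,
# with a placeholder credit price of $3.
estimate = estimate_replication_cost(
    stored_tb=2.0, replicated_tb=2.0, credit_price_usd=3.0, route="cross_cloud"
)
print(estimate)
```

With these placeholder inputs, data transfer dominates the first month; in later months only changed data is replicated, so `replicated_tb` (and with it the compute and transfer terms) would typically be much smaller than `stored_tb`.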

Conclusion

In general, the Snowflake Data Cloud is a global network where thousands of organizations collaborate on data and data services built on the scale, concurrency, and performance of the Snowflake platform. With Snowflake, customers can share data without any movement or latency: cross-region and cross-cloud sharing are enabled by native replication and account provisioning that can be fully automated.
