Snowflake and the Data Cloud Architecture

How Snowflake is positioned to support and execute on a Data Cloud Architectural Vision

James Anderson
Data Cloud Architecture
13 min read · Feb 9, 2024


Over the last 10 years, as I have helped clients and customers leverage Snowflake to power their data and analytics workloads, one of the main questions I’ve gotten is “What type of architecture works best with Snowflake?” The answer has changed as the platform has evolved: from “traditional Data Warehouse,” to “cloud-based Data Lake,” to “Data Mesh.” The goal of all of these architectures is to break down silos, making data more accessible to a broader audience (data democratization) and driving better decision making for the business. However, while the data industry and the Snowflake platform have shifted dramatically over the last 10 years to support these different architectural frameworks, most organizations don’t move at the same speed and struggle to think beyond their own internal needs. Oftentimes they end up with a little of everything, unable to decide on or commit to one type of architecture over another. Whether that’s on Snowflake or on another platform, they ultimately sacrifice either flexibility or the ability to break down their own internal data silos. They are mandated by their CTO to move to the cloud and take advantage of it, but for the most part they bring their on-prem habits with them. As this technical debt accumulates, it becomes increasingly difficult to standardize across the organization. All of which leads to more silos (cloud capabilities let you build them more rapidly), less data accessibility, a rise in shadow IT, and the bare minimum of value being delivered by the data to the business.

What can an organization do to stop this continued swirl of indecision and inaction? A new architectural paradigm called the “Data Cloud Architecture” was published recently, which strives to drive better collaboration across your ecosystem, along with increased flexibility to accommodate how your organization wants to structure its own data architectures. By adopting the Data Cloud Architectural framework, an organization can maintain its current ways of working while adopting new strategies for collaboration on value-driving data assets, all while maintaining the trust built through the governance and security policies already in place. The Data Cloud Architecture (DCA) relies on four core concepts to accomplish this: Manage Code and Data as Assets, Business Entities Collaborating, Governing Constellations, and being Agnostic and Interoperable.

The DCA can be deployed on many different technical platforms, but we believe that Snowflake is uniquely positioned to enable the DCA for your organization, rapidly driving new levels of collaboration on your data assets, while maintaining the utmost flexibility for your individual business entities to build and develop their assets. The Snowflake platform is cloud native, and designed to enable the DCA at scale, with the following characteristics in mind:

  • Global — Snowflake has created a globally distributed “Snowgrid” to support customers all over the world.
  • Cloud Agnostic — Snowflake runs on all three major clouds (AWS, Azure, GCP)
  • Single Platform — No matter where your Snowflake deployment is, the platform runs exactly the same, allowing for easier distribution of assets
  • Easy to use/manage — SaaS model, with Snowflake managing all storage and compute, allowing our customers to focus on the data and assets they are building
  • Single Governance Model — As the platform works the same no matter where it’s deployed, customers can deploy a single governance model to all Snowflake accounts
  • Infinite Scale — Snowflake takes advantage of public cloud infrastructure storage and compute providing on-demand scalability
  • Resiliency — Built on the public cloud providers, the platform takes advantage of multiple Availability Zones within each region to maximize resiliency, and can be architected for automatic failover and recovery across regions and clouds
  • Democratization — As an industry leader in the data and application sharing space, Snowflake has been built to make digital assets available across account and geographic boundaries

But how does Snowflake apply to the four core concepts of the Data Cloud Architecture framework? Let’s take a deeper dive:

Manage Code and Data as Assets

I’ve had the opportunity to work with many different types of customers and clients, all of whom were at different stages in their overall maturity as a data-driven organization. For those clients who are further along in their journey than others, one core differentiator is that they view their data not merely as a tool that enhances decision making, but rather as an asset, with a measurable ROI associated with how the data is being leveraged. The DCA defines an asset as “any digital representation, logical or physical, which has enough measurable value to a Business Entity that it requires management and enables collaboration”. But what does that mean, and how can Snowflake bring a unique set of capabilities that would allow for better management of an organization’s assets?

Let’s start with how Snowflake might define an asset. As the leader in Data Collaboration over the last 5 years, Snowflake has already cornered the market on the distribution of physical assets (Data Sets, Files, etc.) through the Snowflake Data Marketplace. Data providers like IQVIA, Nielsen, and FactSet have driven millions of dollars of revenue for their companies through the distribution of data on Snowflake’s Marketplace. That has limited the Marketplace to companies who build and deploy physical assets, leaving anyone with logical assets (ML Models, Data Models, Data Applications, etc.) to find other distribution channels. But with the launch of the Snowflake Native App framework, customers can easily build and deploy logical assets, driving new revenue opportunities and better collaboration on ML Models, Web Apps, and other logical frameworks. All of this is possible on the Snowflake platform, which requires zero maintenance, allowing developers to focus on managing the code and data required to build their assets.

Speaking of developers, Snowflake has launched many new features that support the everyday developer, making it much easier to build and deploy these assets. When we first launched the Native App Framework, many developers came back and said: “This is great, but I really need a better CLI, integration with my chosen git repository, better logging and monitoring, and the ability to run other languages besides SQL, Scala, Java, and Python.” Over the last 12 months, we’ve launched all of these features, with Snowpark Container Services being the most critical to the development of better and more flexible apps. And with Snowflake Cortex, there is even more native functionality in Snowflake to build upon, allowing developers to deliver more valuable assets to their consumers.

Defining a Business Entity in Snowflake

One of the core concepts that the Data Cloud Architecture defines is that of a “Business Entity”. The DCA recommends creating a Business Entity at “a level where the assets being developed and/or managed would bring value to a separate entity”. Rather than defining a Supply Chain line of business and a Manufacturing line of business as two separate Business Entities, it might make more sense to define “Operations” as a single Business Entity, whose assets can be developed separately from the assets being collaborated on within the “Commercial” Business Entity. No matter how an organization defines its Business Entities, Snowflake can support a number of deployment models.

One of the most important aspects of the Business Entity concept is control over their own infrastructure, so that each Business Entity has the ability to build and develop their assets in whatever way they require, leveraging whatever data architectural framework they need to support their assets. With Snowflake, we can support multiple types of deployment models based on the needs of the business as a whole. Some Snowflake customers prefer a single-account deployment model, where all data and processing run on a single Snowflake account. From a DevOps perspective, this makes sense because you can leverage features like zero-copy cloning to do new development against a production database without having to maintain multiple copies of the data. And thanks to Snowflake’s robust Role-Based Access Control framework, a Business Entity can be defined inside a single account, with the ability to build and deploy their own databases and warehouses completely independently from other users of the account, and then make collaboration assets available to other roles once they’re deployed.
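As a rough sketch of this single-account pattern (all database, warehouse, and role names below are hypothetical), zero-copy cloning and RBAC might be combined like this:

```sql
-- Zero-copy clone: a full development copy of production,
-- sharing underlying storage until the data diverges
CREATE DATABASE ops_dev CLONE ops_prod;

-- Define a role for the "Operations" Business Entity and scope
-- its access to its own warehouse and database objects
CREATE ROLE operations_be;
GRANT USAGE ON WAREHOUSE ops_wh TO ROLE operations_be;
GRANT USAGE ON DATABASE ops_dev TO ROLE operations_be;
GRANT USAGE ON SCHEMA ops_dev.analytics TO ROLE operations_be;
GRANT SELECT ON ALL TABLES IN SCHEMA ops_dev.analytics TO ROLE operations_be;
```

The clone is writable immediately, so the Business Entity can iterate on production-shaped data without ever touching, or paying again for, the production copy.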

Separately, there are many Snowflake customers who prefer a multi-account architecture. Some want to completely separate their business units from each other; others may want different accounts on different clouds in order to take advantage of cloud-native tooling. This deployment model fits perfectly into the Data Cloud Architectural framework, giving complete account control to each Business Entity, allowing them to choose not only how to leverage Snowflake storage and compute for their asset development, but also whichever cloud they wish to be on. This opens up complete control of the infrastructure required to build their data assets, all while being able to leverage Snowflake features like Native Apps, Private Listings, and Listing Auto-Fulfillment to deploy and collaborate on their assets with other Business Entities.
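In a multi-account model, cross-entity collaboration can be sketched with Snowflake’s secure sharing primitives; in the example below (hypothetical object, organization, and account names), one Business Entity’s account publishes a curated view to another account in the same organization:

```sql
-- Create a share in the producing Business Entity's account
CREATE SHARE ops_assets;

-- Expose only the curated, published objects -- not the raw data
GRANT USAGE ON DATABASE ops_db TO SHARE ops_assets;
GRANT USAGE ON SCHEMA ops_db.published TO SHARE ops_assets;
GRANT SELECT ON VIEW ops_db.published.daily_metrics TO SHARE ops_assets;

-- Make the share available to the Commercial entity's account
ALTER SHARE ops_assets ADD ACCOUNTS = myorg.commercial_acct;
```

Consumers in other regions or clouds would instead be reached through Private Listings with Listing Auto-Fulfillment, which handle replication across Snowgrid automatically.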

Regardless of how Snowflake customers want to run their overall deployment, they can leverage the Data Cloud Architecture without having to change how they run their platform in order to support each Business Entity, allowing them to build and deploy their high-value assets for collaboration. That said, the most important aspect of how Business Entities collaborate is the establishment of constellations and the building of trust relationships.

Driving Collaboration through Constellations

When looking at constellations, it’s important to consider a number of different factors: first, how the different business entities will collaborate with each other, and second, how you will define the trust relationships across the constellation. Each of these considerations will differ depending on whether your constellation is composed of business entities within a single organization or spans multiple organizations. Let’s double-click into these factors.

Trust Relationship

Before we dive into the internal vs external conversation, let’s define what we mean by “trust relationship”. The Data Cloud Architecture defines a trust relationship as “a formal agreement between business entities inside of a single constellation or between two constellations that defines the scope through which assets are accessed and collaborated on”. While this is a somewhat broad definition, it’s important to understand that the Data Cloud Architectural framework is built on the idea of fitting into and not conflicting with how an organization runs their business today. Every industry has different regulatory requirements for how data is accessed. With Snowflake, a constellation can build their trust relationships to cover every method required to safely and securely access the assets being collaborated on.

When a constellation defines its guiding trust relationship, a number of standards need to be considered, and the various security and governance features of Snowflake can help support them. While many of these features have been part of the Snowflake platform since its inception, we’ve recently released our end-to-end governance framework, Snowflake Horizon, and many of its features can be leveraged to create trust relationships.

Standards like column-level security, row-level security, and data masking can be supported with RBAC-driven features like secure views, row access policies, and Dynamic Data Masking. Others, like cell-level encryption, differential privacy, and even more secure constructs like data clean rooms, are supported by various governance functions and native apps that Snowflake has built and deployed. Distributing your applications through our Native App Framework allows an organization to feel comfortable that their IP is not at risk of being compromised by the collaborating business entity, since all the code and data that goes into the app is protected by Snowflake. These features can support contractual standards (DPAs, etc.), regulatory standards (GDPR, CCPA, etc.), and any other standards your business might need to follow, all of which are inputs that ultimately define the trust relationship of the constellation. Now, let’s look at how these trust relationships can be put into action across different types of constellations.
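To make two of these governance features concrete, the sketch below (hypothetical table, policy, and role names) applies Dynamic Data Masking to a sensitive column and row-level security driven by a governance-owned entitlements table:

```sql
-- Mask email addresses for everyone except a privileged role
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('GOVERNANCE_ADMIN') THEN val
       ELSE '*** MASKED ***'
  END;
ALTER TABLE crm.public.customers
  MODIFY COLUMN email SET MASKING POLICY email_mask;

-- Row-level security: a role only sees the business units it is
-- entitled to, based on a mapping table the governance org owns
CREATE ROW ACCESS POLICY unit_rows AS (business_unit STRING) RETURNS BOOLEAN ->
  EXISTS (
    SELECT 1 FROM gov.public.entitlements e
    WHERE e.role_name = CURRENT_ROLE()
      AND e.business_unit = business_unit
  );
ALTER TABLE crm.public.orders
  ADD ROW ACCESS POLICY unit_rows ON (business_unit);
```

Because the policies are attached to the data rather than to any one query path, every business entity in the constellation sees the same enforcement, no matter which tool or account they query from.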

Internal

We expect many of our customers who adopt the Data Cloud Architectural framework will be building constellations that focus on trust relationships for their own business entities. By optimizing their collaboration on internal assets, globally distributed companies can maximize their efficiencies while maintaining a single, global trust relationship. By deploying this architecture on Snowflake and leveraging our Organizations capabilities, customers who have implemented the DCA can drive better collaboration through tools like Private Listings, Native Apps, and Listing Auto-Fulfillment to make their assets available to other accounts in their Org, regardless of the cloud choices that each Business Entity might make. And a central governance org can leverage all the tools in the Snowflake Horizon suite to manage the trust relationship of the constellation.

External

Many of our data and application providers who leverage the Snowflake Marketplace are already relying on it to own and manage their trust relationships with their customers. These companies are driving new revenue streams and profit centers by building external constellations, with Snowflake owning and managing the trust relationships between the Business Entities. By leveraging Snowflake to manage their external constellation, they can focus on building and deploying revenue-generating assets, confident that Snowflake will maintain the trust relationship they’ve entered into with their customer business entities. Not only can Snowflake replicate assets to any cloud or region that Snowflake supports, but we also handle the security of the intellectual property by deploying Native Apps in a way that keeps the code blind to the consuming business entity. And for consuming business entities that are not on Snowflake today, providers can deploy Reader Accounts for their customers, allowing those consumers to still access the assets that bring enormous value to their business.

Ultimately, by adopting the Data Cloud Architecture on Snowflake, organizations can optimize their collaboration on data assets, and focus on what assets ultimately can bring the most value across the organization, all while feeling confident in the trust relationship that has been developed for their constellation. But how much does a constellation need to adapt themselves to fit into the Data Cloud Architecture on Snowflake? Turns out, not a whole lot.

Agnostic and Interoperable

The biggest question I had when I read the original Data Cloud Architecture paper was: do I have to change how I run my business in order to fit into the DCA? The short answer is no. One of the most compelling reasons why the DCA should appeal to Snowflake customers is that this framework fits into how you operate your business, rather than forcing you to change how you run your business to adapt to any other framework. By building a framework that is completely agnostic and interoperable, organizations can quickly get up and running with this architecture on Snowflake and start collaborating and building value-driving assets. But why is Snowflake unique in this setup? Because of how the platform was designed.

At its core, Snowflake was architected using object-based storage and virtualized compute, creating an extremely fast query engine for organizations to scale with. While Snowflake was optimized for columnar data workloads, we have never required any specific data model to operate efficiently and unlock the value of your data, and we never will. This allows for better processing, faster query performance, and support for all data types: structured, semi-structured, and unstructured. It doesn’t matter how you want to model your data, whether it’s as a data warehouse, a data lake, or a data vault; we work in whatever way your business needs. This is what has made Snowflake such a force in the data platform space over the last 10 years. Plus, with the advancements Snowflake has made in the application and GenAI space, customers who have already built their data platform on top of Snowflake can now leverage features like Snowpark, Snowpark Container Services, and Cortex to build even more powerful and impactful applications. The platform is also functionally the same no matter what cloud or region your account is running in, so assets built on Snowflake will work no matter where your collaborating business entities are deployed.
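For instance, semi-structured data lands directly in a VARIANT column and is queried with ordinary SQL, with no upfront modeling required (the table and field names below are hypothetical):

```sql
-- Load raw JSON events without defining a schema first
CREATE TABLE raw_events (payload VARIANT);

-- Query nested fields with path notation and cast on the fly
SELECT payload:customer.id::NUMBER AS customer_id,
       payload:event_type::STRING  AS event_type
FROM raw_events
WHERE payload:event_type::STRING = 'purchase';
```

The same table can later be reshaped into a warehouse, lake, or vault model as the owning business entity sees fit; the platform doesn’t force the choice.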

Snowflake’s success has also led to a vast ecosystem of tools that work well with the Snowflake platform, allowing every business entity to make the technical decisions required to build and deploy their assets. This level of interoperability makes it much easier for constellations to adopt the DCA for their business entities, while allowing each business entity to operate independently when it comes to how they build and deploy their assets.

Summary

Overall, Snowflake has been designed from the ground up to support whatever data architectural pattern your business requirements define. Snowflake is in use as a multi-petabyte-scale Data Lake, a high-concurrency Data Warehouse serving thousands of users, a multi-node Data Mesh, and a semantic-layer-capable Data Fabric. While Snowflake might be the closest implementation of the DCA, it is not perfect (yet). The Snowflake engineering team continues to add key features allowing for the free flow of assets across Snowgrid, Snowflake’s cross-cloud technology layer, which interconnects your business’s ecosystems across regions and clouds. As you build out your data strategy to support newer analytic capabilities such as GenAI, remember that the business drives the requirements, and locking away innovative functionality for use by only a few people will just create frustration. True innovation happens with the democratization of data in a secure manner, and the combination of Snowflake and the DCA gives you the best of both worlds.

The Data Cloud Architectural framework was designed to drive better collaboration across a customer’s ecosystem, while maintaining the utmost flexibility in how each business entity and constellation operates. Snowflake has been a pioneer in this space for many years, standing up industry-specific constellations that are instantly accessible should the business need arise:

Media Data Cloud

Financial Services Data Cloud

Health Care and Life Sciences Data Cloud

Manufacturing Data Cloud

Government and Education Data Cloud

Retail Data Cloud

Telecom Data Cloud

Many organizations have replaced complex pipelines and daily, weekly, or monthly unsecured FTP transfers with DCA trust relationships and near-instant access to terabyte-scale data sets and easily deployable applications. Ultimately, the only thing that needs to change in the minds of everyone who operates inside a defined constellation is that data should not be viewed just as a tool for decision making, but rather as an asset for the organization, with a measurable ROI. Once that mindset shift is made, it becomes an easy question to answer: what platform decisions do we need to make to maximize the ROI on our assets inside the Data Cloud Architecture? The answer is Snowflake.



Sales Engineering Leader @ Snowflake. All opinions expressed are my own.