Core Concept Deep Dive: Manage Data and Code as Assets

The core building block that makes the Data Cloud Architecture so valuable to Snowflake customers: the definition and development of Assets.

James Anderson
Data Cloud Architecture
6 min read · Mar 15, 2024


How can you validate the impact of your KPIs? By defining an ROI for your assets!

This post kicks off a four-part series of deep dives into each of the Data Cloud Architecture's core concepts.

Over the years, organizations have defined their data projects in many different ways: data model, data set, dashboard, data mart, KPI, and so on. Many of these projects have delivered real results for the business, giving it better visibility into how it is performing and enabling better decisions going forward. But just as often, projects never get off the ground because nobody can articulate the value the end result will bring to the organization, or the goals it is meant to accomplish. Prioritization then becomes a contest over who can tell the better story to the people in charge of building and deploying projects, regardless of which project would actually have the biggest impact on the business as a whole. The result: frustration among the parts of the business that don't tell a good story, disconnection from the development teams, and shadow IT projects spinning up to address their own needs.

The Data Cloud Architecture (DCA) Framework was created to drive better collaboration across an organization and to keep the barrier to entry for the free exchange of data and applications low. For organizations that want to be truly data-driven, this free exchange is critical to up-leveling everyone in the organization and making sure everyone is operating against the same core truths. The mindset of a data-driven organization is that data is not merely a tool for decision making, but an ASSET to the business, with a measurable ROI for every project that is undertaken. So when the Data Cloud Architecture was defined, we made sure to define these projects as Assets, each with a clearly defined ROI calculation that applies across the constellation. This puts every project on the same level and maximizes the value of the development effort put into building and deploying each asset. But what is an asset, and how is it built? In the Data Cloud Architecture, assets fall into two categories: physical and logical. Let's take a closer look at each.

Physical Assets

The DCA defines Physical Assets as "digital representations that in its simplest form will be files on a file system or object store": in other words, physical data sets. For years, companies have built elaborate, hard-to-maintain pipelines to move data from one system to another, or out to their customers and partners. The value of these data sets to the consuming system might initially be very high, but because this has historically been a static data set transfer, the value of that data decreases significantly over time. So even more elaborate pipelines are created to constantly push changes and updates out to every system that needs the data, just to maintain its freshness and preserve its value to the consuming systems. The complexity and person-hours it takes to build, maintain, and secure these pipelines can dramatically reduce the ROI associated with the Physical Asset being shared. But using the power of the public cloud and the Data Cloud Architecture, we can architect much better ways to make these assets available to the relevant consumers, maximizing the ROI on the asset being developed.

One of the core concepts of the Data Cloud Architectural Framework is that the architecture is agnostic and interoperable. No matter how a Business Entity creates assets, the way those assets are distributed to the rest of your Data Cloud needs to be agnostic and interoperable, with an emphasis on interoperability. For customers looking to adopt the Data Cloud Architecture, that usually means placing the asset somewhere other Business Entities can come and get it when they need it, bringing whatever tools they need to consume the asset and integrate it with their own solutions. Examples include a Snowflake Data Share, dropping Parquet files in a cloud storage bucket with an Iceberg table format on top for better querying, or making the system available for direct query via an API layer. All of these options give the owning Business Entity continued oversight of the asset's governance and give consuming Business Entities an easier path to integrating the assets they need, both of which can dramatically increase the ROI of the Physical Asset.
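To make the first of those options concrete, here is a minimal, hedged sketch of publishing a Physical Asset as a Snowflake Data Share using the snowflake-connector-python package. The account identifier, credentials, database, schema, table, and share names are all hypothetical placeholders, and a real deployment would handle credentials, roles, and grants far more carefully.

```python
# Hedged sketch: publishing a Physical Asset as a Snowflake Data Share.
# Assumes the snowflake-connector-python package. The account identifier,
# credentials, database, schema, table, and share names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_org-my_account",  # hypothetical account identifier
    user="ASSET_PUBLISHER",       # hypothetical publishing user
    password="********",
    role="ACCOUNTADMIN",
)

statements = [
    "CREATE SHARE IF NOT EXISTS DAILY_ORDERS_SHARE",
    "GRANT USAGE ON DATABASE SALES_ASSETS TO SHARE DAILY_ORDERS_SHARE",
    "GRANT USAGE ON SCHEMA SALES_ASSETS.PUBLIC TO SHARE DAILY_ORDERS_SHARE",
    "GRANT SELECT ON TABLE SALES_ASSETS.PUBLIC.DAILY_ORDERS TO SHARE DAILY_ORDERS_SHARE",
    # Make the share visible to a consuming Business Entity's account.
    "ALTER SHARE DAILY_ORDERS_SHARE ADD ACCOUNTS = my_org.consumer_account",
]

cur = conn.cursor()
try:
    for stmt in statements:
        cur.execute(stmt)
finally:
    cur.close()
    conn.close()
```

The consuming Business Entity can then create a database from the share on their side and query it with whatever tools they already use, while the owning Business Entity retains governance of the underlying data.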

Logical Assets

When thinking about a Logical Asset, the code is the most important component of the asset. A Logical Asset should be defined in much the same way Snowflake defines a Native Application: a piece of code that can read and write data from your system and make changes to the underlying DDL of the database. The universe of Logical Assets is much larger than that of Physical Assets, as is the potential ROI for those assets. Everything from the DDL for a standard data model (e.g. the OMOP Common Data Model or the SFDC standard object model) to a containerized visual application (anything that can run on a Kubernetes cluster), and everything in between, should be considered when identifying and planning the development and deployment of Logical Assets. And these assets can become highly collaborative across a constellation, as long as they are easy to deploy against the consuming Business Entity's data, thus maximizing the ROI of the asset. But how can the DCA help optimize the deployment of these assets? By making sure all the code is packaged up and executed against a simple set of infrastructure requirements.
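To ground that definition, here is a toy, hypothetical sketch of a Logical Asset in that spirit: a small piece of code, not data, that owns its own DDL and reads and writes the consuming Business Entity's data. SQLite stands in for whatever database the consumer actually runs, and the person table only loosely echoes the OMOP Common Data Model; none of this comes from the DCA specification itself.

```python
# Hypothetical sketch of a tiny Logical Asset: code that is handed to a
# consuming Business Entity and run against their own database. It creates
# its own schema objects and writes a derived result back.
import sqlite3  # stand-in for whatever database the consuming entity runs


def install_asset(conn: sqlite3.Connection) -> None:
    """Apply the asset's DDL to the consumer's database (idempotent)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS person (
            person_id INTEGER PRIMARY KEY,
            birth_year INTEGER NOT NULL
        )
        """
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS person_age (person_id INTEGER, age INTEGER)"
    )


def run_asset(conn: sqlite3.Connection, as_of_year: int) -> None:
    """Read the consumer's data and write back a derived result."""
    conn.execute("DELETE FROM person_age")
    conn.execute(
        "INSERT INTO person_age SELECT person_id, ? - birth_year FROM person",
        (as_of_year,),
    )
    conn.commit()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    install_asset(db)
    db.execute("INSERT INTO person VALUES (1, 1980), (2, 1995)")
    run_asset(db, as_of_year=2024)
    print(db.execute("SELECT * FROM person_age").fetchall())
```

The point of the sketch is the shape, not the SQL: the asset carries its DDL and its logic with it, so any Business Entity in the constellation can deploy it against their own data.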

The CI/CD pipeline for applications and AI/ML models really boils down to two main factors: what are the infrastructure requirements for the application or model, and how will it be accessed by the end user? The same approach needs to be taken for Logical Assets in the Data Cloud Architectural Framework. As we build an understanding of the assets being developed across the constellation, we want to architect a deployment process that takes advantage of the resources of the public cloud (containerization, on-demand compute, etc.) while also keeping the consuming Business Entity in mind in terms of how they access and collaborate with the Logical Asset. And all of this should be done with the overall ROI of the Asset in mind, which includes the human cost of supportability as well as infrastructure cost.
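One lightweight way to capture those two factors (a hedged sketch, not part of the DCA specification) is to have every Logical Asset ship with a small descriptor that the deployment pipeline can read; the field names and example values below are illustrative assumptions.

```python
# Hypothetical descriptor answering the two CI/CD questions up front:
# what infrastructure does the asset need, and how will it be accessed?
from dataclasses import dataclass, field


@dataclass
class InfrastructureRequirements:
    container_image: str   # how the code is packaged
    cpu_cores: float       # on-demand compute ask
    memory_gb: float
    needs_gpu: bool = False


@dataclass
class LogicalAssetSpec:
    name: str
    infra: InfrastructureRequirements
    access: str                              # how end users reach the asset
    data_dependencies: list[str] = field(default_factory=list)


churn_model = LogicalAssetSpec(
    name="customer-churn-model",
    infra=InfrastructureRequirements(
        container_image="registry.example.com/churn:1.2.0",
        cpu_cores=2.0,
        memory_gb=8.0,
    ),
    access="REST endpoint behind the consuming entity's API gateway",
    data_dependencies=["SALES_ASSETS.PUBLIC.DAILY_ORDERS"],
)
```

Keeping this declaration simple is what lets the same asset be deployed by many Business Entities without bespoke integration work each time.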

Asset ROI

No matter the type of Asset you are trying to build and deploy, whether Physical, Logical, or a combination of both, the Data Cloud Architectural Framework encourages, if not outright requires, that you attach an estimated ROI metric to the Asset. There are a number of reasons for that. First, it levels the playing field across the constellation (assuming each Business Entity draws on a shared pool of development resources). Prioritization is no longer given to whoever sells the decision maker best; instead, a common set of inputs and a single ROI metric for the organization decide development priority. In a more decentralized model, where development resources are not shared, you may not face the same competition with the rest of the constellation, but by adopting a shared ROI metric your asset is much more likely to be adopted by the rest of the constellation, especially for high-value assets.

When looking at the ROI calculation, the input metrics might include work hours for development, deployment, and maintenance; infrastructure costs; usage metrics; and other adoption-type metrics. The ROI calculation is also dynamic: as assets are built, the ROI of other assets may change based on the resources and data that accumulate over time, and as assets are adopted, their ROI can increase dramatically. But the most important input should be tied to broader organizational or constellation-wide goals: will this asset directly help increase revenue or lower costs, and what is the estimate behind that? That input needs to be backed up by numbers and data, not just a compelling story. This is how a truly data-driven organization looks at all development and deployment of projects, and it's how organizations that adopt the Data Cloud Architecture are able to maximize the value of their data, the most important asset of all.
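As a toy illustration only, a constellation-wide ROI metric might boil those inputs down to a single ratio; the formula, input names, and every number below are hypothetical assumptions rather than a prescribed DCA calculation.

```python
# Hypothetical sketch of a shared Asset ROI metric: (benefit - cost) / cost
# over an agreed evaluation window. All inputs and rates are illustrative.

def asset_roi(
    dev_hours: float,
    maintenance_hours_per_year: float,
    hourly_rate: float,
    infra_cost_per_year: float,
    estimated_annual_benefit: float,  # revenue increase or cost reduction, backed by data
    years: int = 3,
) -> float:
    """Return (benefit - cost) / cost over the window; 0.0 means break even."""
    cost = (
        dev_hours * hourly_rate
        + maintenance_hours_per_year * hourly_rate * years
        + infra_cost_per_year * years
    )
    benefit = estimated_annual_benefit * years
    return (benefit - cost) / cost


# Example: 400 dev hours at $120/hr, 100 maintenance hours/year,
# $20k/year of infrastructure, $250k/year estimated benefit over 3 years.
print(round(asset_roi(400, 100, 120.0, 20_000, 250_000), 2))  # about 4.21
```

An asset that can't clear an agreed-upon threshold on even a simple calculation like this is a signal to revisit its scope before development resources are committed.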

Sales Engineering Leader @ Snowflake. All opinions expressed are my own.