The Data Cloud Architecture Defined

Kevin Bair
Data Cloud Architecture
12 min read · Feb 2, 2024

Data architectures are designed to address different problems based on business objectives and go-to-market models, and they often evolve as a function of IT organizational maturity. Aligning business with IT has been a challenge for the last half century as companies have grown globally, offering more goods and services through multiple channels. It’s no surprise that this leads to larger volumes of data being generated and strains the IT resources that must handle the increased demand. As teams built with the technology available at the time, data silos began to appear throughout organizations, leading the C-suite to clamor for a “single source of truth” so they could run the business more efficiently and effectively.

Over the years, different data architectures have emerged to solve these challenges, collecting data from operational systems and making it available for analysis by different constituencies such as lines of business (LOBs), partners and suppliers, and even customers. Data architecture patterns such as Data Warehousing, Data Lake, Data Mesh, and Data Fabric are all methods of managing data with the intent to democratize it and provide timely access to information throughout the enterprise. All of these have well-known pros and cons, but each is limited in its ability to reach beyond the walls of the company and set the stage for future innovation with partners and customers alike.

The Data Cloud Architecture (DCA) is a top-down, business-centric methodology defining how both internal and external entities collaborate and share assets. DCA focuses on four core concepts that seek to build upon, and not necessarily disrupt, legacy architecture patterns. Business leaders need to recognize the sizable investments made over the years by IT, while IT needs to recognize the pace of change and the huge opportunity cloud computing represents.

DCA is a composition of the following ideas:

  • Manage Code and Data as Assets — AI/ML models, LLMs, any type of code or data can be moved or referenced, bringing the work to the data, or vice versa.
  • Business Entities Collaborate — Internal LOBs, Partners/Suppliers, Customers own and collaborate on assets in near real time.
  • Governing Constellations — Trust Relationships are formed as the foundation for concepts like access control and discoverability.
  • Agnostic and Interoperable — Cloud Agnostic, Underlying Data Architecture Agnostic

[Figure: Key Characteristics of Data Architectures]

Several data architectures have emerged as the volume of data, and the number of systems that manage it, have proliferated. The diagram above places these architectures in context along two dimensions. The left axis shows the type of problem the architecture solves: technical (speeds, feeds, data volumes) or business (democratizing data for analytics and customer use). The bottom axis shows how data is managed: all in one place, or with ownership and the managing systems distributed across multiple places (regions, hyperscalers, data centers, etc.). Organizations can, and often do, mix and match architectures to perform multiple functions. The DCA embraces Business Entity-specific implementations while allowing collaboration to occur within a Constellation.

Manage Code and Data as Assets

The Data Cloud Architecture defines an Asset as any digital representation, logical or physical, that has enough measurable value to a Business Entity to require management and enable collaboration. Teams often get stuck with too many low-value tasks and activities, which prevents them from focusing on core business functions. With the Data Cloud Architecture framework, the focus should be on creating and managing high-value assets (those that have value both inside and outside the organization) and removing the need to manage low-value tasks (infrastructure, compute, storage, etc.). The DCA lets you focus on managing what is needed to maximize the ROI of your data assets, so your organization can pursue new revenue streams and reduce costs by optimizing low-value tasks. For update, maintenance, and governance reasons, DCA Assets remain under the control of the owning Business Entity even when shared. Assets fall into two categories, physical and logical, as described below.

Physical Assets — Physical assets are digital representations and, in their simplest form, are files on a file system or object store. An asset can be constructed from one or more of these lowest-level representations (files). Examples include a core asset together with its metadata and associated assets, or code assets together with their configuration files. This could loosely be called a composite asset; however, for simplicity, the collection should just be referred to as a DCA Asset. Physical assets are created and maintained by an owning Business Entity on its own infrastructure. From there they can be copied to the consuming Business Entity or made directly accessible on infrastructure managed by the owning Business Entity.

Logical Assets — Links to physical assets on the owning Business Entity’s infrastructure can be shared by reference, with the physical asset still maintained by the owning Business Entity. A DCA Asset may also be composed of both logical and physical assets; again, for simplicity, this should just be referred to as a single DCA Asset. Even though the physical version of the asset may not be on the consuming Business Entity’s infrastructure (only the reference is), it should appear local, and the consuming entity should be able to act upon it as if it were on its own infrastructure.

[Figure: DCA Assets]

In the diagram above, Business Entity 2 is the owning Business Entity, and is sharing a logical asset with Business Entity 1, represented with a dotted line. Similarly, Business Entity 5 is acting as a Marketplace, and is physically sharing a copy of an asset with Business Entity 3.
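The two sharing modes described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the DCA does not prescribe an implementation, and the names `Asset`, `Share`, `share_logical`, and `share_physical`, as well as the asset and file names, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """A DCA Asset: a named bundle of files controlled by one owning entity."""
    name: str
    owner: str                  # owning Business Entity
    files: tuple[str, ...]      # data, code, config, and metadata files


@dataclass(frozen=True)
class Share:
    """What a consuming Business Entity holds once an asset is shared."""
    asset: str
    owner: str                  # control stays with the owning entity
    consumer: str
    kind: str                   # "logical" (by reference) or "physical" (copy)
    location: str               # whose infrastructure holds the files


def share_logical(asset: Asset, consumer: str) -> Share:
    """Share by reference: the files stay on the owner's infrastructure."""
    return Share(asset.name, asset.owner, consumer, "logical", asset.owner)


def share_physical(asset: Asset, consumer: str) -> Share:
    """Share a copy: the files are replicated to the consumer's infrastructure."""
    return Share(asset.name, asset.owner, consumer, "physical", consumer)


# Hypothetical assets standing in for the two shares described above.
forecast = Asset("demand-forecast", "Business Entity 2", ("model.bin", "metadata.json"))
listing = Asset("market-listing", "Business Entity 5", ("listing.parquet", "metadata.json"))

by_reference = share_logical(forecast, "Business Entity 1")
by_copy = share_physical(listing, "Business Entity 3")
print(by_reference)
print(by_copy)
```

In both cases the `owner` field is unchanged, which mirrors the point that control remains with the owning Business Entity regardless of where the files live.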

Business Entities Collaborate

Before we dive into the specifics of how the Data Cloud Architecture helps drive collaboration, let’s define what a “Business Entity” is. The definition will vary between organizations, just as every company is structured differently. In the context of the Data Cloud Architecture, we recommend defining the Business Entity at a level that allows for ownership of, and collaboration on, specific data assets while retaining flexibility for all types of collaboration. In other words, define it at a level where the assets being developed or managed would bring value to a separate entity. Business Entities create and manage (“own”) assets, and Business Entities consume (“are allowed to collaborate on”) assets. As such, both the owning and consuming entities are required to have infrastructure (storage and compute) that they control. This allows assets to be transferred to, and executed on, either the owning or the consuming Business Entity’s infrastructure, depending on how the DCA is implemented.

Organizations make technology decisions based on what they feel is required to be successful. While this lets each part of the business operate as efficiently as possible, it can make it harder for different Business Entities to collaborate on their data assets. Whether you define your Business Entities as lines of business inside your organization, subsidiaries that operate as standalone entities under a broader umbrella, your entire partner ecosystem, or your revenue-generating customers, it is critical that you can collaborate on all your assets in order to maximize efficiency. The DCA enables collaboration that is flexible across all types of Business Entities and supports all types of collaboration, not just sharing data in one direction (think FTP, for example). While we expect most collaboration across Business Entities to start internally, the DCA lays the foundation for new partner, and even new customer, interactions under the same framework, which can significantly improve business decision-making and time to market for new revenue-generating products. By deploying the DCA, you can bring new assets to every part of your business and maximize the value of those assets through stronger collaboration.

Governing Constellations

Business Entities can be grouped together to form a Constellation. A Constellation is a set of Business Entities that have agreed upon a set of standards, governance, and operating procedures and that have a mandate or goal of collaborating on assets. A Business Entity may participate in more than one Constellation, forming a one-to-many relationship, provided that the agreed-upon “trust” of each Constellation is met at all times. A Constellation must have more than one Business Entity to be considered a Constellation.

[Figure: DCA Constellations]

The example above shows two Constellations. Business Entity 2 is the producing entity and is part of both Constellation 1 and Constellation 2. In this case Business Entity 2 shares an asset physically with Business Entity 1 and logically with Business Entity 4. Because Business Entity 1 and Business Entity 4 are not part of the same Constellation, it is not assumed that they can share without establishing their own trust relationship.
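The membership rule in this example can be expressed as a small check. The sketch below assumes a Constellation is modeled as a set of entity names; the function names `validate`, `common_constellations`, and `trust_assumed` are hypothetical and not part of the DCA.

```python
def validate(constellations: dict[str, set[str]]) -> None:
    """A Constellation must contain more than one Business Entity."""
    for name, members in constellations.items():
        if len(members) < 2:
            raise ValueError(f"{name} needs more than one Business Entity")


def common_constellations(a: str, b: str, constellations: dict[str, set[str]]) -> list[str]:
    """Names of the Constellations that both entities belong to."""
    return sorted(
        name for name, members in constellations.items()
        if a in members and b in members
    )


def trust_assumed(a: str, b: str, constellations: dict[str, set[str]]) -> bool:
    """True only when the two entities share at least one Constellation."""
    return bool(common_constellations(a, b, constellations))


# The two Constellations from the example above.
constellations = {
    "Constellation 1": {"Business Entity 1", "Business Entity 2"},
    "Constellation 2": {"Business Entity 2", "Business Entity 4"},
}
validate(constellations)

print(trust_assumed("Business Entity 2", "Business Entity 1", constellations))  # True
print(trust_assumed("Business Entity 2", "Business Entity 4", constellations))  # True
print(trust_assumed("Business Entity 1", "Business Entity 4", constellations))  # False
```

A `False` result does not forbid sharing; it only signals that Business Entities 1 and 4 would first need to establish their own Trust Relationship.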

Trust Relationship

Collaboration relies on trust. Many organizations struggle to even give access to systems and data within their own four walls. Others are bound by regulatory mandates and are constrained not by technology but by process. The DCA makes no assumption about the level of trust between Business Entities. It does, however, put forward considerations for how to layer restrictions and use common security practices to meet the objective. A Trust Relationship is a formal agreement, between Business Entities inside a single Constellation or between two Constellations, that defines the scope within which assets are accessed and collaborated on. The scope could include open-standard transfer protocols, data security policies (contractual or compliance related), and overall asset discoverability.

Business Entities — Business Entities own assets, so they are responsible for granting and managing access to those resources unless they knowingly and specifically transfer that responsibility to another Business Entity. As mentioned above, access to assets under the control of a Business Entity can be wide open: any asset created and designated as a DCA Asset could be made instantly accessible to other entities within the Constellation. In most cases this is not recommended, and a minimal set of restrictions should be considered, at least to monitor usage and gain insight into which assets are being used and for what business value.
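As one way to picture the “minimal set of restrictions” mentioned above, the following sketch has the owning entity grant access per asset and record every access attempt so usage can be reviewed later. The `AccessRegistry` class and its methods are hypothetical; real deployments would rely on their platform’s own access control and audit facilities.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessRegistry:
    """Grants and usage log kept by the owning Business Entity (illustrative)."""
    owner: str
    grants: dict[str, set[str]] = field(default_factory=dict)   # asset -> consumers
    usage_log: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def grant(self, asset: str, consumer: str) -> None:
        self.grants.setdefault(asset, set()).add(consumer)

    def revoke(self, asset: str, consumer: str) -> None:
        self.grants.get(asset, set()).discard(consumer)

    def access(self, asset: str, consumer: str) -> bool:
        """Check a grant and log the attempt, whether allowed or denied."""
        allowed = consumer in self.grants.get(asset, set())
        timestamp = datetime.now(timezone.utc).isoformat()
        self.usage_log.append((timestamp, asset, consumer, allowed))
        return allowed


registry = AccessRegistry(owner="Business Entity 2")
registry.grant("demand-forecast", "Business Entity 1")

print(registry.access("demand-forecast", "Business Entity 1"))  # True
print(registry.access("demand-forecast", "Business Entity 4"))  # False
print(len(registry.usage_log))                                  # 2
```

Logging denied attempts as well as granted ones gives the owner some insight into demand for assets that have not yet been shared.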

Assets — There are multiple aspects of trust regarding assets, such as quality, timeliness, and of course trust in the source. Other concepts such as encryption, role-based access control, data masking, policy management, data clean rooms, and differential privacy are outside the scope of the DCA but should be considered based on well-defined Business Entity security standards. Note: It is a best practice to categorize assets in some way and apply policies for access by individuals or groups.

Discoverability — You don’t know what you don’t know. The more assets owning Business Entities generate, the harder it becomes for consuming Business Entities to find, and request access to, what they want or need. Managing metadata as part of the DCA is essential to good governance. The DCA establishes no requirements related to metadata, but it highly recommends making metadata discoverable for the widest possible use as your Constellation grows.
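To illustrate what discoverable metadata might look like in practice, here is a minimal catalog search scoped to the Constellations a requester belongs to. Since the DCA sets no metadata requirements, every field in `CatalogEntry` and the `discover` function itself are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata published by an owning Business Entity (hypothetical fields)."""
    asset: str
    owner: str
    constellation: str
    description: str
    tags: frozenset[str] = frozenset()


def discover(catalog: list[CatalogEntry], keyword: str, member_of: set[str]) -> list[CatalogEntry]:
    """Return entries visible to the requester whose text or tags match the keyword."""
    needle = keyword.lower()
    return [
        entry for entry in catalog
        if entry.constellation in member_of
        and (needle in entry.description.lower()
             or needle in {tag.lower() for tag in entry.tags})
    ]


catalog = [
    CatalogEntry("demand-forecast", "Business Entity 2", "Constellation 1",
                 "Weekly demand forecast model", frozenset({"ml", "forecast"})),
    CatalogEntry("supplier-prices", "Business Entity 2", "Constellation 2",
                 "Supplier price list", frozenset({"pricing"})),
]

# A requester that belongs only to Constellation 1 sees only that Constellation's entries.
hits = discover(catalog, "forecast", member_of={"Constellation 1"})
print([entry.asset for entry in hits])
```

Finding an entry is separate from being granted access to it; the search only tells a consuming Business Entity what it could request.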

Agnostic and Interoperable

Agnostic is a mindset that relies on architecture frameworks and technology-independent principles, with a focus on allowing teams to use the patterns and platforms of their choice for managing and collaborating on their assets. Many organizations confuse open standards with “open source”; in this context, being agnostic means that Business Entities are not constrained by the underlying IT-focused data architectures. Ultimately, the goal of a Constellation that leverages the DCA should be to allow completely independent platform and data architecture decisions at the Business Entity level while maintaining interoperability across the board. The DCA brings a business-first, top-down approach to collaborating on assets and a set of principles to consider when sharing them, including the following:

Data Architecture

The Data Cloud Architecture allows a Business Entity to leverage whatever data architecture framework or pattern it requires to build and deploy its assets. These choices can be completely independent from one Business Entity to the next while still allowing collaboration. Consideration should be given to why an architecture is selected and for what purpose. Organizations often forget the “why” and focus on the “how”, leading to poor alignment with the business’s objectives.

Asset Formats — The owning Business Entity is responsible for specifying an asset format that another Business Entity can consume. The owning Business Entity should also specify metadata describing the characteristics of the asset that are necessary for collaboration. Using the metadata, the DCA allows for either the transfer of, or direct access to, the asset, depending on the implementation (logical vs. physical). Note that the metadata associated with the asset is itself part of that asset.
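A manifest is one plausible way for an owning Business Entity to describe an asset’s format. The sketch below assumes a JSON manifest stored alongside the data files so that the metadata travels as part of the asset; the manifest keys and the functions `build_manifest`, `bundle`, and `can_consume` are hypothetical, not a DCA-defined schema.

```python
import json

SHARING_MODES = {"logical", "physical"}


def build_manifest(name: str, owner: str, data_format: str,
                   schema: dict[str, str], sharing_modes: list[str]) -> dict:
    """Describe an asset so another Business Entity can decide how to consume it."""
    unknown = set(sharing_modes) - SHARING_MODES
    if unknown:
        raise ValueError(f"unknown sharing modes: {sorted(unknown)}")
    return {
        "name": name,
        "owner": owner,
        "format": data_format,
        "schema": schema,
        "sharing_modes": sorted(sharing_modes),
    }


def bundle(manifest: dict, data_files: dict[str, str]) -> dict[str, str]:
    """Return the asset's files with the manifest included as one of them."""
    files = dict(data_files)
    files["manifest.json"] = json.dumps(manifest, sort_keys=True)
    return files


def can_consume(manifest: dict, supported_formats: set[str]) -> bool:
    """A consumer checks the declared format against what it can process."""
    return manifest["format"] in supported_formats


manifest = build_manifest(
    "supplier-prices", "Business Entity 2", "parquet",
    schema={"sku": "string", "price": "decimal"},
    sharing_modes=["physical", "logical"],
)
asset_files = bundle(manifest, {"prices.parquet": "<binary data>"})

print(sorted(asset_files))
print(can_consume(manifest, {"parquet", "csv"}))
```

Keeping the manifest inside the bundle means a physical copy and a logical reference both expose the same description to the consuming entity.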

Consumption Considerations — It is possible, and highly encouraged, for owning Business Entities to also be consuming Business Entities. Under this paradigm, Business Entities become interoperable and part of an interconnected architecture. This “network of networks” allows for interaction and collaboration between LOBs, partners, and customers alike. The DCA does not restrict how the consuming Business Entity uses the asset; implementers may need to work out methods for consuming Business Entities to find assets to consume.

Public Cloud/Internet (Networking, Storage, and Compute)

The Data Cloud Architecture is made possible by the rise of public cloud infrastructure, which provides effectively unlimited scalability for both storage and processing, and by the continued advancement of the internet, which allows global connectivity across Business Entities. Beyond being network connected, each Business Entity maintains and is responsible for the storage and compute infrastructure necessary for storing and executing assets.

Networking — Business Entities are required to be network accessible in order to facilitate interoperability and collaboration. The DCA places no restrictions on which Business Entities may interact with each other. While every Business Entity may conceptually be network connected (say, over the internet), security and IT teams may feel the need to restrict or govern access internally (company wide), externally (between partners and suppliers), and with their customers.

Storage — Assets are persisted by the owning Business Entity using network storage (such as public cloud object storage) that can be made accessible to a consuming Business Entity. Depending on the implementation (logical vs. physical), that asset or set of assets may be replicated to storage “closer to” the consuming entity. Under the DCA, storage can be owned by either Business Entity; however, the owning entity maintains control over the asset. As a reminder, this relationship can be “two way” at any time, with owning entities consuming assets from the entities they share with.

Compute — Business Entities exchange assets for business value. To use the asset being presented, the consuming entity leverages the processing power available to it, which is why public cloud infrastructure is appealing (and a core concept). A principle of the DCA is that, when interoperability standards are defined ahead of time, the asset (say, a piece of code) “just works”. Hyperscalers offer different capabilities at different price points, and consuming Business Entities may take advantage of those differences to act upon the assets being shared.

[Figure: DCA Example]

In previous examples, we showed that Business Entities 1 and 2 were in one Constellation and Business Entities 2 and 4 were in another. For clarity, we didn’t redraw those relationships, but they are still in place (Constellations 1 and 2). In this diagram, Business Entities 3, 4, and 5 are in Constellation 3. Business Entity 5 is acting as a marketplace, either creating its own assets or housing other assets whose ownership has been transferred to it.

Summary

Core to the DCA is the fluidity of where work happens and how trust is established between entities. Cloud vendors offer a dizzying array of tools and services, and it’s becoming more common for organizations to be multi-cloud, taking advantage of technological and economic differences. The Data Cloud Architecture is a way to pull all of these things together and operate “above” specific vendor implementations, moving the work to the data or moving the data to the work. As technology advances (GenAI, GPUs, etc.), where your data lives and moves, and where your custom code operates, needs to be architected, not treated as an afterthought. Individuals or teams that operate as silos can be extremely effective, but experienced leaders know the whole is greater than the sum of its parts, and creating mechanisms for collaboration is one of the best ways, if not the best way, to innovate. A good architecture supports the business’s ability to execute on its initiatives and allows IT to partner with the business, delivering new capability in days, not months or years. Well architected means an architecture you can build upon, change over time, and incrementally add capability to. The Data Cloud Architecture allows you to set your own boundaries and scope (internal, partners, customers) and expand as the needs of the business change. The world is not slowing down. IT needs a modern data architecture, built on public infrastructure, that enables the democratization of assets, opens up new revenue streams, and improves operational efficiency while maintaining trust.

Kevin Bair
Data Cloud Architecture

Data guy with 25+ years of experience working with data management platforms and architectures to solve business challenges