Balanced Data Architecture

Emanuel Kuce Radis

Published in

The Good CTO

Nov 18, 2023

Embracing a Versatile Governance Model

Our architecture favors versatility. The governance model combines capabilities from several technologies instead of being confined to a single stack such as the Databricks Lakehouse. For instance, components from Azure’s data ecosystem can complement data fabric solutions, giving teams a broad set of tools for different needs. This diversity keeps the technology stack robust and flexible enough to support a wide range of data products.
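Here is a minimal sketch of what "not confined to a single stack" can mean in practice. It assumes that each platform is wrapped by a thin adapter exposing one common interface, so a single governance policy can be applied everywhere. `TableInfo`, `StorageBackend`, `InMemoryBackend`, and both policy rules are hypothetical illustrations. They are not Databricks or Azure APIs.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class TableInfo:
    """Metadata a governance check needs, whichever platform holds the table."""
    name: str
    owner: Optional[str]
    tags: set = field(default_factory=set)


class StorageBackend(Protocol):
    """Hypothetical common interface; each platform gets a thin adapter."""
    platform: str

    def list_tables(self) -> list: ...


@dataclass
class InMemoryBackend:
    """Stand-in for a real adapter (e.g. one wrapping a lakehouse or an Azure service)."""
    platform: str
    tables: list

    def list_tables(self) -> list:
        return list(self.tables)


def find_violations(backends: list) -> list:
    """Apply one policy across every backend: each table needs an owner,
    and tables tagged 'pii' must also carry a 'classified' tag."""
    violations = []
    for backend in backends:
        for table in backend.list_tables():
            ref = f"{backend.platform}:{table.name}"
            if not table.owner:
                violations.append(f"{ref} has no owner")
            if "pii" in table.tags and "classified" not in table.tags:
                violations.append(f"{ref} holds PII but is not classified")
    return violations


lakehouse = InMemoryBackend("lakehouse", [TableInfo("sales.orders", "sales-squad", {"pii", "classified"})])
azure = InMemoryBackend("azure", [TableInfo("web.clicks", None), TableInfo("crm.contacts", "crm-squad", {"pii"})])
print(find_violations([lakehouse, azure]))
# ['azure:web.clicks has no owner', 'azure:crm.contacts holds PII but is not classified']
```

The policy code never needs to know which vendor stores a table. Adding another platform then means writing one more adapter, not another governance model.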

Integration of Lakehouse and Data Mesh Principles

Central to our framework is the fusion of the Lakehouse’s maturity in managing large data lakes with the agility of a data mesh approach. Unity Catalog is one of several tools that could be used for metadata management, enabling governance that is centralized yet flexible. Together with other tools, such a catalog supports data mesh principles by making data discoverable across teams and letting squads share their data assets securely and efficiently.
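The toy catalog below shows how one central catalog can combine open discovery with squad-controlled sharing. It is a sketch under simple assumptions. The names follow a three-level `catalog.schema.table` pattern similar to Unity Catalog's, but `MeshCatalog`, `CatalogEntry`, and their methods are hypothetical and do not reproduce Unity Catalog's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    full_name: str                                # e.g. "sales.gold.daily_revenue"
    owner_squad: str
    description: str
    readers: set = field(default_factory=set)     # squads granted read access


class MeshCatalog:
    """Central registry: squads register and share their own assets,
    while every squad can discover what exists."""

    def __init__(self) -> None:
        self._entries = {}

    def register(self, entry: CatalogEntry) -> None:
        if entry.full_name in self._entries:
            raise ValueError(f"{entry.full_name} already registered")
        self._entries[entry.full_name] = entry

    def grant_read(self, full_name: str, granted_by: str, squad: str) -> None:
        entry = self._entries[full_name]
        # Only the owning squad may share its data asset.
        if granted_by != entry.owner_squad:
            raise PermissionError(f"{granted_by} does not own {full_name}")
        entry.readers.add(squad)

    def search(self, keyword: str) -> list:
        # Discovery is open: names and descriptions are visible to everyone.
        kw = keyword.lower()
        return sorted(
            e.full_name for e in self._entries.values()
            if kw in e.full_name.lower() or kw in e.description.lower()
        )

    def can_read(self, full_name: str, squad: str) -> bool:
        entry = self._entries[full_name]
        return squad == entry.owner_squad or squad in entry.readers


catalog = MeshCatalog()
catalog.register(CatalogEntry("sales.gold.daily_revenue", "sales", "Revenue per day"))
catalog.grant_read("sales.gold.daily_revenue", granted_by="sales", squad="marketing")
print(catalog.search("revenue"))                                # ['sales.gold.daily_revenue']
print(catalog.can_read("sales.gold.daily_revenue", "marketing"))  # True
print(catalog.can_read("sales.gold.daily_revenue", "finance"))    # False
```

In this design, governance is centralized in one registry. Access decisions stay with the squad that owns each asset.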

Medallion Architecture and Distributed Data Product Development

Each product-driven squad runs its own Medallion Architecture (raw bronze, cleaned silver, and business-level gold layers) within this interconnected environment. Squads keep the independence to process and refine their own data while benefiting from the shared streams of silver and gold datasets that other squads publish. This ability to both consume from and contribute to a communal data ecosystem exemplifies the data mesh concept, in which decentralized data products form a cohesive network.
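The sketch below shows the bronze → silver → gold flow inside a single squad, using plain Python lists in place of lakehouse tables. The order schema, the cleaning rules, and the function names `to_silver` and `to_gold` are hypothetical. A real pipeline would more likely use Spark and Delta tables and would quarantine bad rows instead of dropping them.

```python
from collections import defaultdict


def to_silver(bronze_rows):
    """Bronze -> silver: deduplicate, normalise types and casing, drop unparseable rows."""
    seen = set()
    silver = []
    for row in bronze_rows:
        if row["order_id"] in seen:
            continue  # duplicate delivery of the same order
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a production pipeline would quarantine this row instead
        seen.add(row["order_id"])
        silver.append({"order_id": row["order_id"],
                       "region": row["region"].upper(),
                       "amount": amount})
    return silver


def to_gold(silver_rows):
    """Silver -> gold: business-level aggregate (revenue per region)."""
    revenue = defaultdict(float)
    for row in silver_rows:
        revenue[row["region"]] += row["amount"]
    return dict(revenue)


# Bronze keeps raw records exactly as ingested, including duplicates and bad values.
bronze = [
    {"order_id": "1", "region": "EU", "amount": "120.50"},
    {"order_id": "2", "region": "eu", "amount": "80"},
    {"order_id": "2", "region": "eu", "amount": "80"},
    {"order_id": "3", "region": "US", "amount": "n/a"},
    {"order_id": "4", "region": "us", "amount": "40"},
]
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'EU': 200.5, 'US': 40.0}
```

Once `silver` and `gold` are published, another squad could build on them directly instead of re-ingesting and re-cleaning the raw orders.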

Informed Decision-Making through Data Lineage

Data lineage tools support informed decision-making by making each dataset’s journey, from its sources through every transformation, transparent. This clarity is crucial in a mesh environment, because it lets teams make educated choices about which datasets and versions to use in their data products.
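At its core, lineage is a dependency graph. The sketch below assumes lineage is stored as a mapping from each dataset to its direct inputs, with hypothetical dataset names and a `@vN` suffix standing in for versions. A breadth-first traversal then answers two questions: what a dataset is built from (upstream), and what a change to it would affect (downstream).

```python
from collections import deque


def upstream(lineage, dataset):
    """Every dataset that `dataset` depends on, directly or transitively (BFS)."""
    seen = set()
    queue = deque(lineage.get(dataset, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(lineage.get(node, []))
    return seen


def downstream(lineage, dataset):
    """Every dataset affected if `dataset` changes: invert the edges, then reuse upstream()."""
    inverted = {}
    for child, parents in lineage.items():
        for parent in parents:
            inverted.setdefault(parent, []).append(child)
    return upstream(inverted, dataset)


# Hypothetical lineage: each key lists the datasets it was derived from.
lineage = {
    "sales.silver.orders@v2": ["sales.bronze.orders_raw"],
    "sales.gold.revenue@v2": ["sales.silver.orders@v2"],
    "marketing.silver.campaigns": ["marketing.bronze.campaigns_raw"],
    "marketing.gold.campaign_roi": ["sales.gold.revenue@v2", "marketing.silver.campaigns"],
}
print(sorted(downstream(lineage, "sales.silver.orders@v2")))
# ['marketing.gold.campaign_roi', 'sales.gold.revenue@v2']
```

Before adopting `sales.gold.revenue@v2`, the marketing squad could inspect its upstream set. Before changing `sales.silver.orders@v2`, the sales squad could check the downstream set to see which other squads it would affect.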

Model Serving, MLOps, and Continuous Refinement

Model-serving APIs are one facet of the architectural model, providing a way to deploy and interact with machine learning models efficiently. Combined with an MLOps strategy, which may draw on Databricks or other MLOps frameworks, they let squads continuously refine and improve their models using recent advances in reinforcement learning and AI.
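One common pattern behind "continuous refinement" combines a registry that keeps every model version with a promotion gate that decides which version the serving API exposes. The toy version below is only a sketch. It assumes a single held-out evaluation score where higher is better. `ModelRegistry`, `register`, and `serve` are hypothetical names and do not come from the Databricks or MLflow APIs, which provide production-grade equivalents.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ModelVersion:
    version: int
    predict: Callable
    eval_score: float  # higher is better, measured on a held-out set


class ModelRegistry:
    """Keeps every registered version but serves only the promoted one."""

    def __init__(self) -> None:
        self.versions = []
        self.serving: Optional[ModelVersion] = None

    def register(self, predict: Callable, eval_score: float) -> ModelVersion:
        mv = ModelVersion(len(self.versions) + 1, predict, eval_score)
        self.versions.append(mv)
        # Promotion gate: a new version replaces the served one only if it scores better.
        if self.serving is None or eval_score > self.serving.eval_score:
            self.serving = mv
        return mv

    def serve(self, features: list) -> dict:
        """Roughly what a model-serving endpoint would return for one request."""
        if self.serving is None:
            raise RuntimeError("no model promoted yet")
        return {"version": self.serving.version,
                "prediction": self.serving.predict(features)}


registry = ModelRegistry()
registry.register(lambda x: sum(x), eval_score=0.71)
registry.register(lambda x: 2 * sum(x), eval_score=0.65)  # worse: kept, but not served
registry.register(lambda x: sum(x) / len(x), eval_score=0.78)
print(registry.serve([1.0, 2.0, 3.0]))  # {'version': 3, 'prediction': 2.0}
```

Keeping rejected versions in the registry keeps experiments auditable, and callers of the serving API never see a regression.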

Data Products as the Driving Focus

Throughout this architecture, the language of data products stays at the forefront. The technology serves the product, not the other way around, so every architectural decision is made with the goal of delivering value through data products. Each squad is autonomous yet remains an integral part of a larger organizational data mesh, sharing data through APIs, structured streaming, and data shares so that the whole organization stays connected.
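Treating data as a product usually means publishing an explicit contract: who owns the product, what its schema is, and through which channels it is shared. The descriptor below is a hypothetical sketch of such a contract. The port labels `"api"`, `"stream"`, and `"share"` are invented here to stand for the three sharing modes named above, and the validation rules are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass, field

ALLOWED_PORTS = {"api", "stream", "share"}  # hypothetical labels for the sharing modes


@dataclass
class DataProduct:
    name: str
    owner_squad: str
    schema: dict                                   # column name -> expected Python type
    output_ports: set = field(default_factory=set)

    def validate_record(self, record: dict) -> list:
        """Check one outgoing record against the published schema (the product's contract)."""
        errors = []
        for column, expected in self.schema.items():
            if column not in record:
                errors.append(f"missing column {column!r}")
            elif not isinstance(record[column], expected):
                errors.append(f"{column!r} should be {expected.__name__}")
        return errors

    def publishable(self) -> bool:
        """A product needs an owner, a schema, and at least one recognised output port."""
        return bool(self.owner_squad and self.schema and self.output_ports
                    and self.output_ports <= ALLOWED_PORTS)


revenue = DataProduct("daily_revenue", "sales", {"day": str, "revenue": float}, {"api", "share"})
print(revenue.publishable())                                               # True
print(revenue.validate_record({"day": "2023-11-18", "revenue": 1200.0}))  # []
print(revenue.validate_record({"day": "2023-11-18", "revenue": "1200"}))  # ["'revenue' should be float"]
```

Because the contract lives with the product, not with a particular platform, the same product could be served through an API, a stream, or a share without consumers renegotiating its shape.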

A Cohesive Data Ecosystem

By adopting a hybrid model that combines the governance of a lakehouse with the operational flexibility of a data mesh, we foster an ecosystem where data self-service is the norm. This approach not only democratizes data across the organization but also keeps the creation and evolution of data products as the primary focus. Each nucleus team contributes to a dynamic, innovative environment where data products can thrive and provide value to all stakeholders involved.

Next Chapter Teaser: In “Implementing Data Product Strategies,” we’ll walk through the practical steps needed to put these architectural principles into action.