Alchemesh: Data mesh framework — The genesis

Alexandre Guitton
Alchemesh
8 min read · May 29, 2024


[Sketch: Alchemesh console, data product view]

As data becomes increasingly vital in decision-making processes, many companies are rethinking their organization to embrace data. Over a series of posts, I have discussed how my thinking shifted from the Modern Data Stack to the principles of Data Mesh, which ultimately led me here, at the start of a new journey: building a Data Mesh framework.

Data mesh is a decentralized sociotechnical approach to sharing, accessing, and managing analytical data in complex and large-scale environments, within or across organizations. It promotes decentralized data management while ensuring robust governance and a product-centric approach.

However, implementing a Data Mesh presents numerous challenges and requires platform support.

Data Mesh: Beyond Technology

Contrary to popular belief, Data Mesh is not just about reorganizing teams. It’s not simply about forming cross-functional teams working on a centralized and monolithic platform. Data Mesh represents a profound transformation in the interactions between people, the technical architecture, and the solutions in the organization, based on four principles:

  • Domain ownership: Decentralize analytical data ownership to business domains closest to the data source or main consumers, and manage the data lifecycle independently based on these domains. This approach aligns business, technology, and data, enabling scalability, agility, accuracy, and resiliency by reducing bottlenecks and ensuring localized change management.
  • Data as a product: Domain-oriented data is shared as a product directly with data users, adhering to characteristics such as discoverability, addressability, understandability, trustworthiness, native accessibility, interoperability, composability, intrinsic value, and security. Each autonomous data product provides explicit, easy-to-use data sharing contracts and is managed independently, introducing the concept of a “data quantum” that encapsulates all necessary components for data sharing, aiming to prevent data silos, foster a data-driven culture, and enhance resilience to change.
  • Self-serve data platform: Empower cross-functional teams to share data by managing the full lifecycle of data products and creating a reliable network of interconnected products, simplifying data discovery, access, and use. It aims to reduce the cost of decentralized data ownership, abstract data management complexity, engage a broader range of developers, and automate governance for security and compliance.
  • Federated computational governance: A federated governance model with domain representatives, data platform members, and experts that balances domain autonomy and global interoperability, relying on automated policy enforcement. It aims to derive value from interoperable data products, mitigate decentralization risks, integrate governance requirements, and reduce manual synchronization overhead.

Supporting the Transition to a Data Mesh

Implementing a Data Mesh is a complex and evolving process. Companies must not only initiate this transition but also ensure its sustainability. As new technologies emerge and organizations mature in their Data Mesh implementation, concepts and practices must evolve.

Data Mesh is far from a static solution. It must continually adapt to new thinking and technological advances. Companies adopting this approach must be prepared to regularly review and adjust their practices and tools.

Many challenges to address

When you start to dive deep into a Data Mesh implementation, you realize the ever-growing number of challenges you will have to deal with, such as:

  • Data Contracts: These become crucial for formalizing dependencies between teams and their products. Data contracts clarify expectations and responsibilities, ensuring effective communication and collaboration (see the sketch after this list).
  • Polysemes: These elements enable different data products to communicate using common entities, facilitating interoperability and data consistency across the organization.
  • Data Products: At the core of Data Mesh, these data products must be properly documented, maintained, and owned by teams. This includes defining metadata, quality standards, and mechanisms for updates and versioning.
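
To make the first of these challenges concrete, here is a minimal sketch of what a data contract could look like expressed in code. This is a sketch under assumptions, not an Alchemesh API; every class and field name is invented for illustration:

```python
# Hypothetical sketch of a data contract between two data products.
# Every class and field name is an illustrative assumption, not an
# Alchemesh API.
from dataclasses import dataclass, field


@dataclass
class ColumnSpec:
    name: str
    dtype: str           # e.g. "STRING", "TIMESTAMP"
    nullable: bool = True


@dataclass
class DataContract:
    producer: str                  # owning data product, e.g. "sales.orders"
    consumer: str                  # consuming data product
    output_port: str               # port the consumer reads from
    schema: list[ColumnSpec] = field(default_factory=list)
    freshness_sla_hours: int = 24  # maximum acceptable data age
    version: str = "1.0.0"         # bumped on breaking changes


contract = DataContract(
    producer="sales.orders",
    consumer="finance.revenue",
    output_port="orders_bigquery",
    schema=[
        ColumnSpec("order_id", "STRING", nullable=False),
        ColumnSpec("ordered_at", "TIMESTAMP", nullable=False),
    ],
)
```

Formalizing the contract as a versioned artifact like this makes dependencies between teams explicit and testable, rather than implicit in pipeline code.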

Challenges of Autonomy

While essential, team autonomy inevitably leads to divergences in the technologies used and the best practices adopted. Some might be tempted to re-centralize decisions via a single platform and technical stack (e.g. a dbt project with an Airflow instance), but this merely shifts the problem to the platform level. It’s crucial to accept and support this autonomy by defining clear interfaces for data products and providing a platform that fosters this dynamic.

This technological diversity can be seen as an asset if well managed. Allowing each team to choose tools that best meet their specific needs encourages innovation and adaptability. However, it is essential to establish standards and best practices to ensure consistency and interoperability of the implemented solutions.

Our Vision: A Framework for Data Mesh

Given these insights and building upon my previous discussions about transitioning from the Modern Data Stack to Data Mesh principles, I decided to develop a framework for Data Mesh governance. The goal is not to offer an all-in-one product but to provide a flexible and modular tool. The framework aims to:

  • Standardize Interfaces: Offer a common working framework for data domains, data products, output ports, data contracts, etc., thus facilitating acculturation and understanding (a sketch of these resources follows this list).
  • Support Platform Teams: Assist in creating self-serve data platforms through component standardization while remaining implementation-agnostic.
  • Provide Modular Components: Supply “Lego-like” platform components, giving teams ownership and autonomy over how they translate data mesh resources into their platform.
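
As a hedged illustration of what such standardized interfaces could look like, here is a sketch of core mesh resources as plain data structures; every name here is an assumption made for this post, not the framework’s actual model:

```python
# Illustrative sketch of standardized mesh resources; these are
# assumptions for this post, not the framework's actual model.
from dataclasses import dataclass, field


@dataclass
class OutputPort:
    name: str
    technology: str   # e.g. "bigquery", "s3"
    schema_ref: str   # pointer to the published schema


@dataclass
class DataProduct:
    name: str
    domain: str                    # owning data domain
    owner_team: str                # accountable cross-functional team
    output_ports: list[OutputPort] = field(default_factory=list)
    input_ports: list[str] = field(default_factory=list)  # upstream products


orders = DataProduct(
    name="orders",
    domain="sales",
    owner_team="sales-data",
    output_ports=[OutputPort("orders_bigquery", "bigquery", "schemas/orders.json")],
)
```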

This framework is designed to be modular and adaptable, enabling companies to use it according to their specific needs. Whether it’s standardizing processes, supporting teams, or offering modular solutions, the framework aims to provide a solid foundation for implementing and managing a Data Mesh.

Alchemesh: Layers

The Alchemesh framework will be composed of three layers:

Alchemesh console

Responsible for providing interfaces (UI, REST API, etc.) to manage the data mesh metadata. It:

  • enables users to navigate the data mesh,
  • allows platform teams to translate this metadata into platform provisioning.

This will be the portal for actors working with data products:

  • Act as a data product registry
  • Interface for data product developers
  • Interface for data platform teams to activate the self-serve data platform
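
As a thought experiment, a data product developer might interact with such a console through its REST API roughly like this; the URL, endpoint paths, and payload shapes are purely hypothetical:

```python
# Purely hypothetical interaction with a future console REST API;
# endpoint paths and payload shapes are invented for illustration.
import requests

CONSOLE_URL = "https://alchemesh.example.com/api/v1"  # placeholder URL

# Register a new data product in the registry.
resp = requests.post(
    f"{CONSOLE_URL}/data-products",
    json={"name": "orders", "domain": "sales", "owner_team": "sales-data"},
    timeout=10,
)
resp.raise_for_status()

# Navigate the mesh: list the data products owned by a domain.
products = requests.get(
    f"{CONSOLE_URL}/data-products",
    params={"domain": "sales"},
    timeout=10,
).json()
```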

Alchemesh controller

This will be the data mesh control plane, driving the data mesh platform. It creates the link between the data mesh metadata managed by the console and the platform components, in an automated and self-service manner.
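
In spirit this is close to a Kubernetes-style reconciliation loop: compare the declared metadata with the observed platform state and converge. A minimal sketch, assuming a console client and a component registry that do not exist yet:

```python
# Minimal reconciliation-loop sketch for the controller; the console
# client and the component registry are assumed, not real Alchemesh code.
import time


def reconcile_once(console, components):
    """Compare declared mesh metadata with actual platform state and
    provision whatever is missing."""
    for product in console.list_data_products():     # desired state
        for port in product.output_ports:
            component = components[port.technology]  # e.g. a BigQuery module
            if not component.exists(port):           # observed state
                component.provision(port)            # converge


def run(console, components, interval_seconds=60):
    """Drive the platform continuously, without manual intervention."""
    while True:
        reconcile_once(console, components)
        time.sleep(interval_seconds)
```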

Alchemesh platform components

A set of ‘LEGO’-like packaged platform components for self-service. They are divided into several categories (a common interface sketch follows this list):

  • Infrastructure Platform Component: Defines the platform foundation to support the data mesh (e.g. Cloud provider project/account, VPC, registries, Kubernetes cluster, etc.).
  • Output Port Platform Component: Instantiates storage components on the infrastructure to expose data produced by data products, ensuring interoperability and access management.
  • Input Port Platform Component: Instantiates components to consume data from operational systems and makes it available to the data product infrastructure, allowing the associated code to format it and produce the value of the data product.
  • Code Platform Component: Instantiates the business logic on the infrastructure, enabling the use of incoming data to produce the desired outcome.
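
One way this ‘LEGO’ contract could be expressed is a common interface that every component implements regardless of its category, so the controller can drive any of them interchangeably. The names below are assumptions, not the actual framework:

```python
# Assumed common contract for platform components; illustrative only.
from abc import ABC, abstractmethod


class PlatformComponent(ABC):
    """Shared interface for infrastructure, input port, output port,
    and code components."""

    @abstractmethod
    def provision(self, spec: dict) -> None:
        """Create or update the underlying resource from a resource spec."""

    @abstractmethod
    def exists(self, spec: dict) -> bool:
        """Report whether the resource is already provisioned."""

    @abstractmethod
    def destroy(self, spec: dict) -> None:
        """Tear the resource down when the data product retires it."""


class BigQueryOutputPort(PlatformComponent):
    """Illustrative output port component backed by BigQuery."""

    def provision(self, spec: dict) -> None:
        ...  # e.g. create the dataset and views, grant consumer access

    def exists(self, spec: dict) -> bool:
        return False  # placeholder: would query BigQuery for the dataset

    def destroy(self, spec: dict) -> None:
        ...  # revoke access and drop the exposed views
```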

Open source

It’s not yet clear exactly what strategy we’ll adopt for open source on this project, because it’s far from clear where this initiative will go; it’s still a side project that’s close to our hearts. But we owe so much to the open source community that made us grow, and we were happy to work with so many different people on NiFiKop, so some of our work will be open, for sure!

Modularity

Each of these three layers can be used independently and partially, with the possibility of replacing any of them with a custom solution, depending on how each is used:

  • The console part can be used as a metadata layer for the data mesh, then consumed and controlled via interfaces (REST, GraphQL, events) by the company’s platform teams to integrate it with their automation systems (CI/CD, GitOps controller, Kubernetes controller, etc.) and create the link between the mesh metadata and the platform.
  • The controller must be able to drive the platform components offered by Alchemesh as well as those produced by the organisation using the solution.
  • The platform components do not need to be specialised to meet the requirements of Alchemesh, or even of Data Mesh. They can be used outside this framework like any other module. For example, an infrastructure component that creates a GKE cluster via Terraform must be usable to create a GKE cluster in a traditional Terraform enterprise environment without the console or controller, and the same goes for an output port component that manages storage and access rights on BigQuery.
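
To illustrate that last point, here is a hedged sketch of an infrastructure component wrapping a Terraform module and being invoked directly, without console or controller; the module directory and variables are invented for the example:

```python
# Hypothetical standalone use of an infrastructure component that wraps
# a Terraform module; the module directory and variables are invented.
import subprocess


class GKEClusterComponent:
    """Thin wrapper around a Terraform module that creates a GKE cluster."""

    def __init__(self, module_dir: str):
        self.module_dir = module_dir

    def provision(self, variables: dict) -> None:
        var_args = [f"-var={key}={value}" for key, value in variables.items()]
        subprocess.run(["terraform", "init"], cwd=self.module_dir, check=True)
        subprocess.run(
            ["terraform", "apply", "-auto-approve", *var_args],
            cwd=self.module_dir,
            check=True,
        )


# Used directly in a classic Terraform workflow, no console or controller:
GKEClusterComponent("./modules/gke-cluster").provision(
    {"project": "acme-data", "region": "europe-west1", "cluster_name": "mesh"}
)
```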

Conclusion

Data Mesh represents a profound transformation in data management, requiring collective commitment and decentralized organization. With the framework we want to build, we aim to support companies in this transition by offering standardized tools and interfaces while preserving team autonomy. At our level, we want to contribute to the collective momentum and reflection around Data Mesh, helping advance the thinking needed to fully leverage its benefits and successfully navigate the challenges of its implementation.

We are still in the early development of this framework, based on our interpretation of Data Mesh. Implementing a product also gives us a framework to grow our own reflection, starting from core concepts (e.g. data domains, data products, data contracts, etc.) and enriching them with the features Data Mesh promotes to enable it at scale (e.g. computational policies, feedback loops, etc.). This series of articles will let us share the reflections and decisions we make in parallel with the development!

In the next article, we will focus on the north star architecture we currently have for developing this framework, and then present the resource modeling (data products, technical teams, etc.) we have for our MVP!

To tease our product a little, here are some sketches of the AlchemeshIo console. 😇

  • Data domain view
  • Data products overview
  • Data product view
  • Data product’s output port view
