Observability in the context of Data Mesh — Part 1

P Platter
Agile Lab Engineering
7 min read · Jan 23, 2024

Within the decentralized paradigm of Data Mesh, trust among the various actors is a fundamental element of the paradigm itself. This is why the concept of observability was introduced for each data product: precisely to convey transparency and trust to all consumers. In this article, however, we will dig deeper into the concept and see why it is also of fundamental importance for many other aspects of a Data Mesh. In Zhamak Dehghani's original storytelling, observability remains very abstract and high level. Here, we aim to go deeper with two articles: this first one covers the conceptual/theoretical side, while the second will delve into the architectural/implementation aspects.

At a high level, observability allows an external observer to understand the internal state of the Data Product (DP), which by design should not be exposed (except through the external interfaces set up for that purpose, such as output ports). Observability is, in fact, one of those exposed and standardized interfaces that allow interacting with a DP without having to confront its internal complexity.

Observability

Areas of Use:

First, let’s explore in which contexts observability plays a role and brings value within a Data Mesh ecosystem:

1) Trustworthiness
DPs expose data through output ports, which are effectively data contracts defined by the data producer: a promise about how data will be provided to consumers. These promises typically cover four areas. Technical information (schema, data types) and semantic information (business meaning) are static: they do not change every time the data changes (barring breaking changes, but let's simplify). SLAs and data quality, by contrast, are dynamic: they must be continuously verified (paradoxically, even when the data does not change). If we want a data consumer to trust what we offer, we must give them visibility and transparency into this information; in a word, we must make the DP observable. It is not enough to expose the current state of the system: we must also demonstrate that our DP has a good track record of reliability (few data issues, low mean time to recover, etc.).
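The static/dynamic split above can be sketched in code. This is a minimal illustration, not a standard: all class and field names are assumptions, and a real contract would carry far richer schema and SLA information.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StaticContract:
    """Changes only on (breaking) contract changes."""
    schema: dict      # e.g. {"customer_id": "string", "amount": "decimal"}
    semantics: str    # business meaning of the dataset

@dataclass
class DynamicStatus:
    """Must be re-verified continuously, even when the data does not change."""
    sla_met: bool           # e.g. freshness within the promised window
    quality_score: float    # aggregate data-quality score in [0, 1]

@dataclass
class ObservabilitySnapshot:
    static: StaticContract
    dynamic: DynamicStatus

def is_trustworthy(snap: ObservabilitySnapshot, min_quality: float = 0.95) -> bool:
    """Consumer-side check: trust the output port only while the
    dynamic promises are currently being kept."""
    return snap.dynamic.sla_met and snap.dynamic.quality_score >= min_quality
```

A consumer would combine a check like this with the track-record evidence discussed above before deciding to depend on the DP.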

2) Scheduling and Orchestration
Data Mesh advocates an ecosystem in which all DPs are independent in terms of change management, but this independence is far less obvious at the level of business processes, which are often strongly interconnected. Imagine DP-A and DP-B, where, at the process level, DP-B needs updated data from DP-A to perform its function. In this case, DP-B must observe what happens in DP-A: understand when the data is refreshed, understand when DP-A's scheduling chain has actually finished (so as not to consume partial refreshes), and check that the data is in good shape before consuming it. This information is significantly different from the previous kind, and it must necessarily be machine-readable so that DP-B can make autonomous, real-time decisions about how to proceed with its own orchestration. In any case, this too is about making the internal state of the DP observable; where before it concerned the state of the data, here it concerns the state of internal processes.
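The decision DP-B has to make can be sketched as a small, machine-readable check. The status field names here are illustrative assumptions about what DP-A might expose on its observability interface, not a defined standard.

```python
import datetime as dt

def should_consume(status: dict, last_seen_refresh: dt.datetime) -> bool:
    """DP-B side decision: consume DP-A's data only if the whole
    scheduling chain finished after our last read and the data is healthy.

    `status` is a hypothetical payload from DP-A's observability port, e.g.:
      {"last_refresh_completed_at": "2024-01-23T06:00:00",
       "scheduling_chain_state": "COMPLETED",
       "data_quality_state": "OK"}
    """
    refreshed_at = dt.datetime.fromisoformat(status["last_refresh_completed_at"])
    chain_done = status["scheduling_chain_state"] == "COMPLETED"  # no partial refreshes
    healthy = status["data_quality_state"] == "OK"
    return chain_done and healthy and refreshed_at > last_seen_refresh
```

Because the check is purely mechanical, DP-B's orchestrator can poll (or subscribe to) DP-A's observability endpoint and trigger its own pipeline autonomously.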

3) Audit and Compliance
Audit and compliance functions perform a cross-cutting and strongly centralized activity. They are usually interested in knowing when, how, and by whom certain operations were performed. Concretely, they want to know who executed a deployment and when, who deleted data, who restarted the scheduling of a data product, and so on. In short, every operation (automatic or manual) carried out on a DP is subject to audit. One way to democratize this information is to standardize the set of required information, make it mandatory in the DP implementation cycle, and finally make it accessible and observable. This information, too, is part of observability: it allows third parties to know something that happened (or is happening) inside the DP, through a standardized communication interface.
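A standardized audit record could look like the sketch below. The field set is an assumption chosen to match the who/when/what questions above; an append-only store hints at the tamper-resistance discussed later under "Characteristics".

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class AuditEvent:
    """One auditable operation on a DP; frozen so records cannot be edited."""
    timestamp: str      # when the operation happened (ISO 8601)
    actor: str          # who performed it (user or service principal)
    operation: str      # what: "DEPLOY", "DELETE_DATA", "RESTART_SCHEDULE", ...
    data_product: str   # which DP the operation targeted

class AuditLog:
    """Append-only: events can be recorded and queried, never modified."""
    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def record(self, event: AuditEvent) -> None:
        self._events.append(event)

    def query(self, operation: str) -> list[dict]:
        return [asdict(e) for e in self._events if e.operation == operation]
```

With a schema like this made mandatory in the DP implementation cycle, a central compliance function can query every DP's audit trail through the same interface.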

4) Process Mining
Similarly, the function that analyzes a company's business processes (process mining) needs to understand the timing of the various phases of each business process. When a business process is implemented directly within data pipelines or data transformations, it becomes crucial to be able to observe the various processing steps, the timestamps at which they were executed, and possibly also the processing time of each individual informational unit (the processing of a payment, of a customer profile, etc.). Observability can standardize this kind of information and make it usable, so that process-mining calculations can run across processing chains that span several data products.
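Given observable step timestamps, a process-mining consumer can derive phase timings mechanically. The event shape below is an illustrative assumption about what a DP might expose per processing step.

```python
from datetime import datetime

def step_durations(events: list[dict]) -> dict[str, float]:
    """Compute seconds spent in each processing step.

    `events` is a hypothetical list of observable step records, e.g.:
      [{"step": "ingest",
        "started_at": "2024-01-23T06:00:00",
        "finished_at": "2024-01-23T06:05:00"}, ...]
    """
    durations: dict[str, float] = {}
    for e in events:
        start = datetime.fromisoformat(e["started_at"])
        end = datetime.fromisoformat(e["finished_at"])
        durations[e["step"]] = (end - start).total_seconds()
    return durations
```

If every DP in a processing chain exposes steps in this shape, the same calculation can be stitched across data products to reconstruct end-to-end process timings.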

5) DP Operations
Gaining visibility into what happens inside a DP is also very useful for the DP team itself. Because DPs are often created and delivered through a self-service layer, it can become difficult for the DP team to understand where jobs are actually executed, what the URLs for monitoring them are, and where we stand in the scheduling chain. This information is vital for observing the behavior of a DP, troubleshooting it, and then intervening with operations (through control ports) to fix problems of various kinds.

6) FinOps
Finally, if a company wants to embark on adopting the FinOps practice (remember, it is a practice), one of the first steps is to establish a showback or chargeback mechanism for each team's costs. The goal is twofold: to increase the DP team's awareness of resource use, and to give visibility of this information to the FinOps team (the central team enabling the practice) so it can carry out impact analyses, make suggestions, and evolve the FinOps platform. Here we are making "internal" information of the DP (in this case, billing) available to the outside world, i.e., making it observable. Another important case arises when a Data Mesh implementation decides that part of a DP's costs are charged to its consumers: it then becomes essential to demonstrate that the amounts charged derive from real costs.
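The chargeback case can be sketched as a proportional split of observed costs. This is a deliberately simple model under stated assumptions: the usage metric (say, bytes read from output ports) and the allocation rule are illustrative choices, not prescriptions.

```python
def chargeback(total_cost: float, usage_by_consumer: dict[str, float]) -> dict[str, float]:
    """Split a DP's observed (real) cost across consumers in
    proportion to their measured usage of its output ports."""
    total_usage = sum(usage_by_consumer.values())
    if total_usage == 0:
        # No consumption this period: nothing to charge back.
        return {c: 0.0 for c in usage_by_consumer}
    return {
        consumer: round(total_cost * usage / total_usage, 2)
        for consumer, usage in usage_by_consumer.items()
    }
```

Because both the total cost and the usage figures come from observable data rather than manual estimates, the DP team can demonstrate to each consumer that the amounts charged derive from real costs.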

Characteristics:

The observability structure within a Data Mesh architecture must be designed keeping in mind the following characteristics. Below, we delve into the conceptual aspects, while in part 2 of this blog, we will see concrete examples of implementation.

1) Extensible and Customizable Protocol
Not all the use cases described above are day-one priorities; they will be implemented along the way as the Data Mesh takes hold within the organization. Even a single capability can start as a simple MVP and then evolve and grow more sophisticated over time. This does not mean, however, that the architectural mechanism you use for observability can ignore this evolvability from the outset; on the contrary, it absolutely must anticipate it. My advice is to implement observability with a mechanism based on REST APIs, so that API versioning can be used and paths can be structured in a way that allows new features to be added along the way. The API contract itself, i.e., the content that will be exposed, must also be designed to be extensible over time. I recommend designing your own standard for the data exposed through observability rather than adopting the APIs of any particular platform or tool: in the long run that becomes lock-in and will not guarantee the necessary evolvability.
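One possible path layout, with the version first and the feature area after it, is sketched below. The paths, payloads, and routing table are illustrative assumptions; the point is only that new areas can be added without breaking existing clients, and a `/v2/` prefix can coexist with `/v1/` during migrations.

```python
# Hypothetical versioned observability surface for a DP.
# New feature areas (audit, finops, ...) are new paths under the same
# version; breaking changes would go under a new /v2/ prefix instead.
ROUTES = {
    "/v1/observability/status": lambda dp: {"dp": dp, "state": "HEALTHY"},
    "/v1/observability/quality": lambda dp: {"dp": dp, "score": 0.98},
    # Added later, without touching consumers of the existing v1 paths:
    "/v1/observability/finops": lambda dp: {"dp": dp, "monthly_cost_eur": 120.0},
}

def handle(path: str, data_product: str) -> dict:
    """Dispatch an observability request to its handler, 404 if unknown."""
    handler = ROUTES.get(path)
    if handler is None:
        return {"error": 404}
    return handler(data_product)
```

In a real deployment this dispatch would live behind an actual HTTP server; the table form just makes the extension points visible.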

2) Provides Current State, but Also Statistics and Historical Depth
The observability metrics exposed by a DP must allow consumers to understand the current state of the system, but must also provide historical evidence that the DP is reliable over time. Both aspects must be part of the standardized contract.
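A track-record metric such as mean time to recover can be derived from a history of incidents the DP exposes. The incident record shape is assumed for illustration.

```python
from datetime import datetime

def mean_time_to_recover_hours(incidents: list[dict]) -> float:
    """Average hours from incident open to resolution, over the DP's history.

    `incidents` is a hypothetical list of observable records, e.g.:
      [{"opened_at": "2024-01-10T08:00:00",
        "resolved_at": "2024-01-10T10:00:00"}, ...]
    """
    if not incidents:
        return 0.0
    total_hours = 0.0
    for inc in incidents:
        opened = datetime.fromisoformat(inc["opened_at"])
        resolved = datetime.fromisoformat(inc["resolved_at"])
        total_hours += (resolved - opened).total_seconds() / 3600
    return total_hours / len(incidents)
```

Exposing the raw incident history (rather than only the computed figure) lets consumers verify the statistic themselves, which ties into the tamper-resistance requirement below.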

3) Non-Tamperable, Certified
It is important that observability information be the result of a structured and standardized process: tamper-proof, and not modifiable after the fact by the owner of the DP or by other entities. On this aspect, the platform plays a fundamental role.

4) Reliable for the Ecosystem
If the observability mechanism becomes an integral part of the value chain established between the various DPs, it must be highly reliable: poorly designed, it could become a single point of failure with mission-critical impact on the entire ecosystem. It is therefore crucial to design an observability architecture that strikes the right trade-off between centralization and decentralization (we will delve deeper into this in part 2) while preserving the operational independence of individual DPs.

This detailed exploration underscores the critical importance of observability within the Data Mesh paradigm. Observability not only fosters trust and transparency among the various actors involved but also enhances the overall functionality and efficiency of the data ecosystem. By integrating observability into the Data Mesh structure, organizations can effectively manage and monitor their data products, ensuring data reliability, compliance, and operational excellence. The concept goes beyond mere visibility, embedding a systematic approach that is essential for modern data architectures and strategies.
