The hidden side of Data Mesh

Perez Alejandro Daniel
GlobalLogic LatAm
Jul 25, 2024
Header image. A decorative representation of a graph with a magnifying glass.

Introduction

Before starting with the article: if you don’t know what Data Mesh is, please read this document [https://martinfowler.com/articles/data-mesh-principles.html] for a quick overview of the Data Mesh philosophy and how it works.

One of the most interesting trends in Big Data has been Data Mesh, a holistic approach that moves data management from a centralized model to democratized access and ownership of data.

Many former enthusiasts have given up and relabeled the idea as “Data Mess”, Data Mesh is losing momentum, and some analysts think it could be replaced or disappear in the short to medium term. What is happening?

Last year Gartner published its annual Hype Cycle for Data Management, where it identifies Data Mesh as “obsolete before plateau”. This means the technology will likely become obsolete before it is adopted by the mainstream.

Gartner graph. The curve represents how technologies evolve from their creation until they are adopted by the industry. Data Mesh is marked by Gartner as becoming obsolete before reaching mainstream adoption.

Data Mesh’s goals and promises sound like a silver bullet for medium and large organizations that want to improve their data management and solve the natural bottlenecks of centralized approaches based on data lakes or data warehouses. But just like any other technology or methodology, Data Mesh has drawbacks that are not always discussed in the documentation we can find on the web.

In this article we will examine what these points are and why they matter.

Pain points.

The nature of Data Mesh presents several challenges that are not always easy for organizations to address. Many organizations rediscovered these challenges on the fly, were forced to halt their migration to Data Mesh, and in turn discouraged others from adopting this meta-architecture.

We need to consider these challenges and the state of the organization before analyzing the possibility of adopting Data Mesh.

Costs.

The cost of starting a Data Mesh implementation is high; that is the first barrier we will find, and it is a big one. On top of that come the operational costs, which grow until they stabilize but remain high.

The decentralized nature of Data Mesh involves a different cost-management model compared with other kinds of solutions. It is difficult to keep costs under control; several aspects increase migration and maintenance costs:

  • Infrastructure costs: Distributing the data and creating data products requires evolving the infrastructure to support the new data (meta-)architecture. This can include replicating cloud stacks or acquiring new hardware in the case of on-premises infrastructures.
  • Software licenses: Each data domain could require a different software stack, and replicating accesses and maintaining data products can drive up licensing costs.
  • Personnel salaries: We need to move data-management experience and know-how from a centralized team to cross-functional teams, each responsible for different data domains. New data product developers, testers, owners, etc. are required to form those teams.
  • Training costs: Each new data team will need time to acquire the skills and specific knowledge of its data domain. This includes paid training, certifications, internal knowledge transfers, and adaptation time.
  • Data privacy and compliance: Handling data in a distributed environment like Data Mesh is complex and costly; each domain must comply with privacy laws and follow a set of common guidelines that aligns the compliance efforts of all teams.
  • Failure to invest: A successful Data Mesh implementation is a long journey that needs investment in time and money. A common failure is poor (or no) investment in automated testing, data quality, or documentation. Something that looks easy to address later can turn into an unstoppable snowball that derails the implementation.

These are only a sample of the costs we can encounter in a Data Mesh implementation. Managing costs in a Data Mesh implementation is complex; finding the right balance in cost allocation is crucial but difficult to achieve.

Organization politics and cultural challenges.

Scaling Data Mesh requires both architectural and organizational considerations. From an organizational perspective, Data Mesh (and the shift toward becoming a data-driven organization) requires a cultural change. You’ll be cultivating both sides of the supply of and demand for data: data consumers establish the product requirements that the domain teams must satisfy. That’s a tall order.

An organization that wants to implement Data Mesh needs to take time for introspection. Data Mesh depends heavily on the organization’s culture and how open it is.

Under this heading we find challenges like these:

  • Risk of isolation. Domain ownership of data presumes a fondness for data on the part of most IT professionals that is simply not there. While the data product idea sounds immediately compelling to managers and consultants, the field experts who obtain and create data have specific tasks that usually fill each workday all by themselves. In that kind of situation, a researcher has little incentive to help other departments that might find her data useful.
  • Conflict of Interest. Ensuring that each domain takes full responsibility for its data can be difficult, especially in organizations where data ownership has not been well-defined. Domains might prioritize their own goals over the broader organizational goals, leading to conflicts.
  • Past Failures. Previous failed data initiatives can create skepticism and political resistance, as stakeholders may be wary of another disruptive change.

Data quality.

One of the keys to a successful Data Mesh implementation is ensuring consistent data quality across data products owned by different teams, each with its own domain and context. On the one hand, not all data is valuable or relevant; some data may even be toxic: biased, copyright-infringing, or containing leaked personal information. On the other hand, the valid and valuable data needs to be cleaned, evaluated, and tested in an automated way.

The coordination effort involved in implementing automated testing may be greater than expected: in addition to each team’s QA roles, we need to include business-domain subject-matter experts and data consumers.

  • Diffuse Data lineage. Keeping track of data lineage and provenance in a distributed system can be complex, making it harder to ensure and verify data quality. Providing transparent data lineage across domains is essential for trust and quality assurance but can be challenging to implement.
  • Data Consistency. Ensuring consistent data standards across different domains can be challenging, leading to discrepancies in data formats and definitions. Achieving clear interoperability between data products from different domains requires meticulous standardization and coordination.
  • Quality Assurance. Ensuring a high Data Quality level requires robust quality assurance processes across domains. Maintaining that level as the volume and variety of data scale across multiple domains can be challenging.
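To make the automated-testing idea above more concrete, here is a minimal sketch of a quality gate a domain team might run before publishing a data product. This is an illustration, not a production framework (in practice teams often rely on dedicated tools such as Great Expectations or dbt tests), and all column names and thresholds here are hypothetical:

```python
# Minimal quality gate for a hypothetical "customer" data product.
# Each check returns a list of human-readable violations; an empty
# result means the data product passes the gate.

REQUIRED_COLUMNS = {"customer_id", "country", "signup_date"}
MAX_NULL_RATE = 0.05  # allow at most 5% missing values per column

def check_schema(records):
    """Verify that every record exposes the agreed-upon columns."""
    violations = []
    for i, rec in enumerate(records):
        missing = REQUIRED_COLUMNS - rec.keys()
        if missing:
            violations.append(f"record {i}: missing columns {sorted(missing)}")
    return violations

def check_null_rate(records, column):
    """Verify that the share of nulls in a column stays below the threshold."""
    if not records:
        return []
    nulls = sum(1 for rec in records if rec.get(column) is None)
    rate = nulls / len(records)
    if rate > MAX_NULL_RATE:
        return [f"column '{column}': null rate {rate:.0%} exceeds {MAX_NULL_RATE:.0%}"]
    return []

def run_quality_gate(records):
    """Aggregate all checks for the data product."""
    violations = check_schema(records)
    for column in sorted(REQUIRED_COLUMNS):
        violations += check_null_rate(records, column)
    return violations
```

The point of a shared gate like this is that every domain can plug its own checks into a common contract, which is exactly the cross-team standardization the bullet points above call for.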

Lack of Data Product evolution.

The different teams should commit to keeping their Data Products healthy and up to date. But this commitment isn’t enough: Data Products should evolve with the business, and this point isn’t always covered from a data perspective.

Human beings have a natural tendency to repeat patterns that seem to be working fine, and this leads us to avoid necessary risks when we need to test new approaches, divide a problem to conquer it, or simply discard something that was once helpful but is now a burden for the organization. This is a cultural problem that impacts the evolution of data products and the investment needed to work on them.

In a scenario of concentrated information and lack of evolution, Data Products can become information silos that are difficult to decompose and lead to duplicated effort. We may rest assured that we are working with high-quality data, yet that data could be concentrated at a single point, forming a silo.

Conclusions.

We have covered some of the most important challenges related to Data Mesh implementations, and why the industry thinks Data Mesh is in decline. But then, why is it still recommended?

Data Mesh remains recommended in specific contexts due to its innovative approach to data management, its potential to solve key organizational pain points, and its evolving ecosystem of support. Organizations with clear goals, sufficient budget, and an established data practice are good candidates for adopting Data Mesh. It can also help data-driven organizations that are small enough to adopt Data Mesh without much effort.

Organizations that cannot afford the effort look for alternatives to Data Mesh that deliver the same benefits. The constant evolution of Big Data solutions and architectures has produced newer approaches that aim to solve the same problems as Data Mesh without suffering the same challenges.

Some emerging approaches cover (almost) the same concepts as Data Mesh:

  • Data Product Canvas. It is a framework designed to help teams design, develop, and deliver data products. It provides a structured approach to defining and understanding all aspects of a data product, including its purpose, stakeholders, data sources, quality metrics, and delivery mechanisms. It focuses on individual data products, providing a framework for their design and development instead of the Data Mesh architectural perspective.
  • Data Fabric. This approach creates a unified and integrated view of the data across the organization. Its main goal is to make the data easily accessible, discoverable, and organized in a way that makes it easy to combine and analyze. In this way, Data Fabric aims to integrate disparate sources and provide a centralized, holistic view of an organization’s data assets. This contrasts with Data Mesh’s decentralized data ownership and architecture, but both aim to solve the same range of challenges.
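To make the Data Product Canvas idea more concrete, the sketch below models a canvas as a small Python structure with a completeness check. The fields follow the aspects mentioned above (purpose, stakeholders, data sources, quality metrics, delivery mechanism), but the specific schema is an assumption for illustration, not part of any standard:

```python
from dataclasses import dataclass, field, fields

@dataclass
class DataProductCanvas:
    """One canvas per data product; every aspect should be filled in before delivery."""
    name: str
    purpose: str                                         # why the product exists
    stakeholders: list = field(default_factory=list)     # owners and consumers
    data_sources: list = field(default_factory=list)     # upstream systems
    quality_metrics: dict = field(default_factory=dict)  # e.g. {"freshness_hours": 24}
    delivery: str = ""                                   # e.g. "REST API", "parquet on S3"

    def missing_aspects(self):
        """Return the names of aspects that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

# A hypothetical, fully filled-in canvas:
canvas = DataProductCanvas(
    name="customer-360",
    purpose="Unified customer view for marketing analytics",
    stakeholders=["marketing", "data-platform"],
    data_sources=["crm", "web-events"],
    quality_metrics={"freshness_hours": 24, "max_null_rate": 0.05},
    delivery="parquet on object storage",
)
```

A simple check like `missing_aspects()` lets a team treat the canvas as a delivery gate: a product ships only when every aspect has been thought through.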

The innovation driven by new AI-based technologies (such as LLMs and generative AI) will bring us more data approaches. Data Mesh isn’t the solution; it’s a strategy, a framework for making decisions. It has set expectations for what an organic, distributed architecture can offer, opening a path to new discoveries that would not be possible without the experience gained over these years of this data approach.

References.

Hype Cycle for Data Management, 2023. https://www.gartner.com/doc/reprints?id=1-2FGX03HR&ct=231030&st=sb

Data Mesh Principles and Logical Architecture. https://martinfowler.com/articles/data-mesh-principles.html

5 operational challenges in adopting data mesh architecture. https://www.amdocs.com/insights/blog/5-operational-challenges-adopting-data-mesh-architecture

The Brutal Cost of Data Mesh. https://medium.com/@hannes.rollin/the-brutal-cost-of-data-mesh-df8cec245506

The Economics of a Data Mesh. https://medium.com/@hubert.dulay/the-economics-of-a-data-mesh-b4cf4b3e3429

Towards Avoiding the Data Mess: Industry Insights from Data Mesh Implementations. https://arxiv.org/html/2302.01713v4



Curious since I was very young, I developed my career around the IT industry, now working with Big Data technologies and exploring new trends in Generative AI.