What is Data Mesh?

Per Axel Aamot
Sep 10 · 8 min read

Introduction

The term ‘Data Mesh’ has become increasingly common over the past few years, though it is still somewhat niche compared to terms like Data Lake or Data Platform, and what it actually means might not be well understood. In this article I will look at what data mesh actually is, the motivations behind it, and how you can start implementing it in your organization.

Figure 1 — Google Trends interest past 5 years

Background

During the past decade, particularly with the advent of cloud computing, many IT projects, especially in the BI/Analytics space, have struggled to deliver the benefits or solutions they set out to achieve. The projects may have taken longer or been more difficult than initially envisioned, the solutions may not have attained the required quality (particularly data quality), and existing problems around data trust and governance may persist (and in some cases have been exacerbated by a move to the cloud). At the same time, becoming data-driven is among the top strategic goals for organizations today: making better decisions, providing better and more customized customer service, reducing operational costs and giving employees powerful insights into their business are key to success, and CEOs know this is what their competitors are trying to achieve.

Rationale

Imagine going to your favorite web shop, for example to order food, and having to contact the individual food producers for access, then spending a lot of time figuring out what the products are or how they were produced. This wouldn’t work. Yet when it comes to getting access to and consuming data, this is, to varying degrees, what goes on every day in many small and large organizations. Problems of this sort are among the main things data mesh seeks to resolve.

So what is data mesh?

Data mesh is a set of practices and paradigms with the overall goal of allowing data consumers to focus on consuming data, rather than on interpreting it or working out its characteristics. It is a new way of looking at data in an organization, from governance and quality to lifecycle. Data mesh is NOT a technology or a product, and it is still evolving. There is no real reference architecture at the moment, and implementing it is nontrivial and time-consuming.

Figure 2 — Changes in sources or domains over time vs coupled data warehouse pipelines supporting analytical needs

Vision

One of the core tenets in the data mesh paradigm is a division between producers and consumers of data: the sellers and buyers of data products. The idea is that sellers of data products will be more engaged, and more invested, in the quality and consumption of their products.

From theory to practice

So what does this mean in practice?

Thinking data products

Product-thinking has developed significantly in the software engineering discipline. So what does product-thinking mean in the data warehouse/analytics space, and how do you approach it?

First Steps

The following first steps are worth keeping in mind if you want to start moving towards data mesh, or if your organization just wants to get the most benefit from the practices:

  • Identify one business area to start with. This should be a business area or function with high importance for the business and low technical complexity. Finding the right balance here can be challenging.
  • Reduce the number of technologies involved
  • Infrastructure should/must be “as-a-Service”
  • Start small
  • Begin developing templates for how to produce data-products
  • Implement versioning of data sets (particularly important around source-oriented domain master-data like codes and vocabularies)
  • Start measuring data quality immediately; you can’t act to resolve issues if you are not aware of them.
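
To make the last three steps a little more concrete, here is a minimal sketch in Python of what a data-product template with a version and a simple quality measurement might look like. Everything in it is an assumption made for illustration: the DataProduct class, its field names, the country_codes example and the completeness metric are hypothetical and are not taken from any data mesh standard or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical template for a data product; all field names are illustrative."""
    name: str
    owner: str                        # the producing domain team
    version: str                      # version of the published data set
    required_fields: tuple[str, ...]  # fields consumers should be able to rely on


def completeness(rows: list[dict], required_fields: tuple[str, ...]) -> float:
    """Share of rows in which every required field is present and non-empty."""
    if not rows:
        return 0.0
    complete = sum(
        1 for row in rows
        if all(row.get(f) not in (None, "") for f in required_fields)
    )
    return complete / len(rows)


# Illustrative master-data product: a small vocabulary of country codes.
product = DataProduct(
    name="country_codes",
    owner="master-data-team",
    version="1.1.0",
    required_fields=("code", "label"),
)

rows = [
    {"code": "NO", "label": "Norway"},
    {"code": "SE", "label": "Sweden"},
    {"code": "DK", "label": ""},         # label is empty
    {"code": None, "label": "Finland"},  # code is missing
]

score = completeness(rows, product.required_fields)
print(f"{product.name} v{product.version}: completeness {score:.0%}")
```

With the sample rows above, two of the four rows are complete, so the script reports a completeness of 50%. Publishing a number like this next to each version of a data set is one simple way to make quality issues visible to both the producer and the consumers; a real implementation would likely track more dimensions than completeness alone.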

Conclusion

Data Mesh is still somewhat in its infancy. There is no reference architecture one can look to, but the underlying ideas, motivations and goals are well understood. Technology is available to start moving towards data mesh solutions, but the challenges will typically relate to cultural, organizational and cost (time) issues.

Grensesnittet

Knowledge sharing in focus!