Data Mesh — Alternatives

Data Lakes & Warehouses are not your foes.

Dr. Marian Siwiak
Between Data & Risk
4 min read · Nov 28, 2022

This article discusses some alternatives to Data Mesh decentralization and shows in which cases Data Mesh is not the best option. It also revisits the relationship between Data Mesh, Data Lake, and Data Warehouse. The article is an extract from “Data Mesh in Action” by Manning Publications, the first book on the implementation of the Data Mesh paradigm, which I co-authored with Jacek Majchrzak, Sven Balnojan, and Mariusz Sieraczkiewicz.

Cover of “Data Mesh in Action” by Majchrzak, Balnojan, Siwiak and Sieraczkiewicz (Manning).

Data Mesh definition and need

The Data Mesh is a decentralization paradigm. It decentralizes the ownership of data, the transformation of data into information, and data serving. A more thorough definition of Data Mesh can be found in this article.

We believe that the data world needs decentralization in the form of the Data Mesh. Some of the reasons were mentioned previously. However, Data Mesh is not always the best solution. In many cases, centralization is a sensible default option.

If not Data Mesh, then what?

Centralized data work, both organizational and technical, can make sense as a default option. Decentralization carries costs, and centralization can mitigate them. That assumes, though, that the value derived from centralized and decentralized data is roughly equal.

There are two main alternative models to Data Mesh’s decentralization of responsibility for data.

The first option is the centralization of both people and technology. This is the default setup for any start-up, and it is a very decent default option, just like the monolith is a decent default option for any software component. In the beginning, the costs of decentralization outweigh its benefits. Working closely together inside one data team, with just one technology to use, makes things a lot easier.

The second option is to split up the work not by business domain, as the Data Mesh suggests, but by technology. This usually results in one core data engineering team, responsible mostly for ingesting data and provisioning the data storage infrastructure, and multiple other teams: analytics teams, data science teams, analysts, you name it. These teams pick up the raw data and turn it into something meaningful further down the road. You might first centralize your data system and then layer this option on top to increase the flow.

When to switch to Data Mesh?

There is nothing wrong with the centralization options mentioned above. They might be reasonable defaults, but both fail to align with value creation, which is deeply tied to business domains. Neither is able to address sudden changes in just one business domain.

As with microservices, whose strength is the ability to quickly extract value from one specific service by scaling it up all by itself, the Data Mesh is able to scale up value extraction in just one domain. The alternatives need to scale up everything to scale up value extraction in just one domain.

So, one way or another, both of these alternatives will hit a wall at some point, when adding the next data source or the next data science project feels increasingly complex and costly. That is the point at which you want to switch to a Data Mesh.

Data Lakes & Warehouses are not Data Mesh alternatives

There is a misconception about the Data Mesh. It is sometimes perceived as an exclusive alternative to the central Data Lake or the central Data Warehouse.

This misconception, however, does not take into account the fact that Data Mesh is a combination of two things:

  • technology
  • organization.

The Data Mesh is an alternative to having one centralized data unit taking care of the data inside a central data storage.

Data Lakes & Warehouses inside the Data Mesh

If Data Mesh is understood as described above, its definition becomes very inclusive. In particular, there is still the option of having central data storage with decentralized units working on and owning the data. Indeed, that is a common implementation in companies that do not need complete flexibility on the data producers’ side.
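The combination of central storage and decentralized ownership can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not an implementation from the book: the `CentralStorage` class, its methods, and the domain and dataset names are all hypothetical. The idea it tries to capture is that one shared platform holds the data, while each dataset is owned and changed only by its domain team.

```python
from dataclasses import dataclass, field


@dataclass
class CentralStorage:
    """One shared storage platform (hypothetical), e.g. a single lake or warehouse."""

    name: str
    # Maps dataset name -> owning domain team.
    owners: dict[str, str] = field(default_factory=dict)
    # Maps dataset name -> stored rows.
    tables: dict[str, list[dict]] = field(default_factory=dict)

    def register(self, dataset: str, domain: str) -> None:
        """A domain team claims ownership of a dataset on the shared platform."""
        if dataset in self.owners:
            raise ValueError(f"{dataset!r} is already owned by {self.owners[dataset]!r}")
        self.owners[dataset] = domain
        self.tables[dataset] = []

    def write(self, dataset: str, domain: str, rows: list[dict]) -> None:
        """Only the owning domain may change its dataset."""
        if self.owners.get(dataset) != domain:
            raise PermissionError(f"{domain!r} does not own {dataset!r}")
        self.tables[dataset].extend(rows)

    def read(self, dataset: str) -> list[dict]:
        """Any domain may consume a dataset; a copy is returned."""
        return list(self.tables[dataset])


# Hypothetical domains sharing one storage platform.
storage = CentralStorage("company_lake")
storage.register("orders", domain="sales")
storage.register("shipments", domain="logistics")
storage.write("orders", "sales", [{"order_id": 1, "amount": 40.0}])
```

In this sketch the storage technology is centralized, but responsibility is not: the logistics team can read `orders`, yet an attempt to write to it raises `PermissionError`.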

It is also a common approach to keep Data Lakes and Data Warehouses inside a business intelligence or data science team. The Data Lakes and Data Warehouses then become nodes inside the Data Mesh.

Data Mesh can still use Data Lakes, e.g. a data science team building data products may use Data Lakes as nodes within the Data Mesh.

To sum up, Data Meshes can make heavy use of both Data Lakes and Data Warehouses. And since Data Meshes in general do not focus on any specific technology, these Data Lakes and Warehouses may come in various formats.
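One way to picture Data Lakes and Data Warehouses as nodes of a mesh is a small Python sketch in which differently backed nodes expose the same interface. Everything here is an assumption made for illustration: the `DataProductNode` protocol, the `LakeNode` and `WarehouseNode` classes, and the sample data are hypothetical and not taken from the book.

```python
from typing import Protocol


class DataProductNode(Protocol):
    """What the rest of the mesh sees of a node (hypothetical interface)."""

    domain: str

    def serve(self) -> list[dict]: ...


class LakeNode:
    """A node backed by a Data Lake: raw records grouped by file."""

    def __init__(self, domain: str, raw_files: dict[str, list[dict]]):
        self.domain = domain
        self.raw_files = raw_files  # hypothetical: file path -> records

    def serve(self) -> list[dict]:
        # Flatten all files into one list of records, in path order.
        return [rec for path in sorted(self.raw_files) for rec in self.raw_files[path]]


class WarehouseNode:
    """A node backed by a Data Warehouse: rows of a curated table."""

    def __init__(self, domain: str, table: list[tuple], columns: list[str]):
        self.domain = domain
        self.table = table
        self.columns = columns

    def serve(self) -> list[dict]:
        # Pair each row with the column names.
        return [dict(zip(self.columns, row)) for row in self.table]


def catalog(nodes: list[DataProductNode]) -> dict[str, int]:
    """Consumers treat every node alike, regardless of storage technology."""
    return {node.domain: len(node.serve()) for node in nodes}


nodes = [
    LakeNode("data_science", {"events/part-0.json": [{"user": "a"}, {"user": "b"}]}),
    WarehouseNode("business_intelligence", [(1, 40.0)], ["order_id", "amount"]),
]
summary = catalog(nodes)
```

The consumer-facing `catalog` function never needs to know which storage technology sits behind a node, which mirrors the point that the mesh itself does not prescribe one.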

This was an extract from “Data Mesh in Action” by Manning Publications.

To learn more about Data Mesh and the book, check out this episode of the “Between Data & Risk” podcast, which I host together with Artur Guja.

Dr. Marian Siwiak
Between Data & Risk

Your friendly neighborhood Data Guy. Co-author of "Data Mesh in Action" by Manning. Co-host of "Between Data & Risk" podcast.