Data Mesh: To Clear the Mess?

Subash Prabanantham
4 min read · Sep 25, 2022


What is Data Mesh?

A definition from Thoughtworks (from Zhamak Dehghani, who coined the term):

Data Mesh is an analytical data architecture and operating model where data is treated as a product and owned by teams that most intimately know and consume the data.

Four concepts form a Data Mesh. In the simplest terms:

Domain Ownership

Decentralised, domain-specific data producers are the experts in their domain and understand its data better than anyone else. They are responsible for the ETL data pipelines, the data quality, and the choice of design patterns used to implement the transformations.

Data as a Product

Every decentralised team / domain should treat its data as a product. The team should be well positioned to determine what data is useful for analytical purposes, as well as how, and to make it available as a product to consume. The final data could be tabular, graphical or a real-time feed, depending on the nature of the data.
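
To make this a little more concrete, here is a minimal sketch of the kind of descriptor a domain team might publish alongside its data. Everything in it (the `DataProduct` class, its fields, the `checkout` domain and the example address) is a hypothetical illustration, not part of any Data Mesh specification or tool.

```python
from dataclasses import dataclass
from enum import Enum


class OutputFormat(Enum):
    # The three shapes of output mentioned above.
    TABULAR = "tabular"
    GRAPHICAL = "graphical"
    REAL_TIME_FEED = "real_time_feed"


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor a domain team publishes with its data."""
    name: str
    domain: str                  # owning domain team
    owner_email: str             # point of contact for consumers
    output_format: OutputFormat
    description: str = ""
    tags: tuple = ()

    def qualified_name(self) -> str:
        # Prefixing with the domain keeps names unique across the organisation.
        return f"{self.domain}.{self.name}"


# Assumed example: a checkout team exposing its orders as a product.
orders = DataProduct(
    name="daily_orders",
    domain="checkout",
    owner_email="checkout-data@example.com",
    output_format=OutputFormat.TABULAR,
    description="One row per completed order, refreshed daily.",
    tags=("orders", "revenue"),
)
print(orders.qualified_name())
```

The point of the sketch is only that ownership, contact details and output shape travel with the data, so a consumer does not have to chase a central team to find out who is responsible for it.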

Self Serve Infrastructure / Data Platform

Self-serve infrastructure is a horizontal function that provides the tools and services domain owners need to run their ETL or transformations. It acts as a one-stop platform through which standardisation, lineage and data discoverability are enforced across domains.
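
As a rough illustration of what discoverability and lineage could look like, the toy in-memory catalog below stands in for the platform. The `DataCatalog` class, its methods and the domain names are all assumptions made for this sketch; a real platform would be backed by a proper metadata service.

```python
from collections import defaultdict


class DataCatalog:
    """Hypothetical in-memory catalog standing in for a self-serve platform."""

    def __init__(self):
        self._products = {}                # qualified name -> metadata
        self._upstream = defaultdict(set)  # qualified name -> direct inputs

    def register(self, domain, name, tags=(), inputs=()):
        qualified = f"{domain}.{name}"
        # Standardisation: every declared input must already be registered.
        missing = [i for i in inputs if i not in self._products]
        if missing:
            raise ValueError(f"unknown upstream products: {missing}")
        self._products[qualified] = {"domain": domain, "tags": set(tags)}
        self._upstream[qualified] = set(inputs)
        return qualified

    def search(self, tag):
        """Discoverability: find products by tag, across all domains."""
        return sorted(q for q, meta in self._products.items() if tag in meta["tags"])

    def lineage(self, qualified):
        """Lineage: all direct and transitive upstream products."""
        seen, stack = set(), list(self._upstream[qualified])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._upstream[current])
        return sorted(seen)


catalog = DataCatalog()
catalog.register("checkout", "orders", tags=["revenue"])
catalog.register("payments", "settlements", tags=["revenue"],
                 inputs=["checkout.orders"])
catalog.register("finance", "daily_revenue", tags=["revenue", "reporting"],
                 inputs=["payments.settlements"])

print(catalog.search("reporting"))               # ['finance.daily_revenue']
print(catalog.lineage("finance.daily_revenue"))  # ['checkout.orders', 'payments.settlements']
```

Because every domain registers through the same interface, the platform can answer "what exists?" and "where did this come from?" without any single team owning all the pipelines.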

Federated computational governance

Although the teams are separated by domain, representatives from each team are needed to provide governance across the organisation: schema structure, SLA definitions, data contracts, privacy, encryption, data-at-rest principles and so on. The resulting principles are enforced through the self-serve data platform.
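
The "computational" part can be pictured as policies written as code and run automatically by the platform. The sketch below is one way that might look; the `DataContract` fields, the allowed types and the 24-hour limit are invented for illustration and would in practice be whatever the governance group agrees on.

```python
from dataclasses import dataclass


@dataclass
class DataContract:
    """Hypothetical contract a domain team submits for a data product."""
    product: str
    schema: dict              # column name -> type name
    pii_columns: set          # columns holding personal data
    encrypted_columns: set    # columns encrypted at rest
    freshness_sla_hours: int  # maximum promised staleness


# Assumed organisation-wide rules agreed by the governance group.
ALLOWED_TYPES = {"string", "int", "float", "timestamp", "bool"}
MAX_FRESHNESS_SLA_HOURS = 24


def check_contract(contract):
    """Return a list of policy violations; an empty list means compliant."""
    violations = []
    for column, type_name in contract.schema.items():
        if type_name not in ALLOWED_TYPES:
            violations.append(f"{column}: type '{type_name}' is not allowed")
    for column in sorted(contract.pii_columns - contract.encrypted_columns):
        violations.append(f"{column}: PII column must be encrypted at rest")
    if contract.freshness_sla_hours > MAX_FRESHNESS_SLA_HOURS:
        violations.append("freshness SLA exceeds the organisation-wide maximum")
    return violations


draft = DataContract(
    product="checkout.orders",
    schema={"order_id": "string", "email": "string", "total": "decimal"},
    pii_columns={"email"},
    encrypted_columns=set(),
    freshness_sla_hours=48,
)
for violation in check_contract(draft):
    print(violation)
```

The rules are decided federally, by people, but checked uniformly, by the platform, so no domain has to be policed by hand.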

Data Mesh: Is it a technology or an ideology?

Data mesh is NOT a technology. It’s an architectural pattern that not only brings a paradigm shift in how systems are designed, but also fundamentally changes the way teams are structured in an organisation.

Zhamak explains it with two distinct functions of data:

  1. Operational data: OLTP use case
  2. Analytical data: OLAP use case

Currently, many organisations assign these two functions to separate teams or organisations. With data mesh, domain-specific teams are responsible for both.

Zhamak’s example on data mesh (Spotify or similar company)

Zhamak’s explanation on domains and data as a product

With data mesh, a team would comprise software engineers, data engineers/analysts and possibly data scientists, all working on building both operational and analytical data products. Whether it’s a flat structure or a hierarchy based on functional streams is up to the teams to decide.

Another example of a Data Mesh structure:

E-commerce example of Data Mesh

In recent times, many data-related architectures and jargon terms have emerged, such as Data Warehouse (not recent :D), Data Lake, Lakehouse etc. Now, it’s Data Mesh.

When you look at the idea, it’s not something new! It is derived from Domain-Driven Design (DDD), which has been in circulation for a while in microservices architecture. Its use for analytical workloads, however, is what is novel about this approach.

Typically, an organisation has a centralised data engineering team responsible for data ingestion and for making data available to different stakeholders. A few bottlenecks of this approach are: multiple layers of dependencies on source data, a lack of trust in data quality, and failure to meet data availability criteria.

Following are my observations on adopting Data Mesh:

  • A distributed Data Mesh will definitely reduce data duplication, making more efficient use of storage and compute
  • Defining domains is an integral part of a distributed data mesh; adding or changing domains in an organisation should be fluid too!
  • Making operational and analytical data teams work as a single unit is not an easy shift for an organisation of any size
  • It’s not a mature model yet! The idea is at a nascent stage; a few companies have adopted it (with a few alterations to the principles), but we have to wait to learn more
  • I like the concept of decentralised domain teams, which will help engineers with different skill sets learn from each other
  • Recently, I have seen many commercial products start claiming that they support Data Mesh. Could it be a sales pitch, or an endorsement of this approach? Yet to be seen!

New data architectures prove that this space is still evolving and that gaps still exist in organisations. One size doesn’t fit all! That said, I see potential for this concept to be implemented in early to mid-sized companies, to gain more knowledge of the structure and tune it further.

Subash Prabanantham

Engineering @ { PayPal ›› Visa ››  ›› Metica [Startup] } ; Yet Another Indian Software Engineer ;