Data mesh and its challenges
Data mesh is one of the sensations in data analytics, and it found its way into the data world in 2021.
I have been using data mesh for the last 18 months and would like to share my understanding of it, its promises and premises, and the challenges I can sense at a granular level.
Several organizations have started experimenting with self-service platforms to enable teams close to the business (the original development teams of the OLTP systems) to support the business better by having them own their own data.
Today’s landscape is divided into operational data and analytical data.
The current state of technology, architecture, and organization design reflects the divergence of these two data planes: two levels of existence, integrated yet separate. Data mesh is all about bringing these two planes as close together as we can.
Data mesh has four core principles:
1) Domain-oriented decentralized data ownership and architecture
2) Data as a product
3) Self-serve data infrastructure as a platform
4) Federated computational governance
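To make the principles a little more concrete, here is a minimal Python sketch of how a domain might describe its data product and how a shared governance rule could be checked automatically. Everything in it is an assumption for illustration: the `DataProduct` fields, the `governance_violations` function, and the rules themselves are hypothetical and do not come from any data mesh standard or tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes for its data product."""
    name: str
    owner_domain: str                                   # principle 1: a domain owns the data
    description: str = ""                               # principle 2: documented like a product
    schema: dict = field(default_factory=dict)          # column name -> type name
    pii_columns: set = field(default_factory=set)       # columns holding personal data
    access_roles: set = field(default_factory=set)      # roles allowed to read the product


def governance_violations(product: DataProduct) -> list:
    """Check one data product against a few illustrative global rules (principle 4)."""
    problems = []
    if not product.owner_domain:
        problems.append("missing owner domain")
    if not product.description:
        problems.append("missing description")
    if not product.schema:
        problems.append("missing schema")
    # PII columns must refer to columns that actually exist in the schema.
    unknown_pii = product.pii_columns - set(product.schema)
    if unknown_pii:
        problems.append(f"PII columns not in schema: {sorted(unknown_pii)}")
    # Assumed rule: products with PII must declare who may read them.
    if product.pii_columns and not product.access_roles:
        problems.append("PII present but no access roles defined")
    return problems


orders = DataProduct(
    name="orders",
    owner_domain="sales",
    description="One row per customer order",
    schema={"order_id": "int", "email": "string"},
    pii_columns={"email"},
)
print(governance_violations(orders))
```

The idea is that the rules are agreed centrally, while each domain stays responsible for publishing a descriptor that passes them. A self-serve platform could run such a check in every domain's pipeline.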
Promises of data mesh:
• Better data quality as the business owns the data as a product and is driven to ensure it remains high quality.
• The analytics data is back with the domain-oriented IT teams who understand the data better and are close to the business.
• The access control models in the OLTP systems can more easily be mirrored into the storage for the analytics systems.
Challenges I can think of:
I have been designing data warehouse solutions for more than 15 years and have been designing data mesh solutions for the last 2 years. Based on that experience, I can see some challenges.
To be part of the Data mesh, each domain must follow a set of guidelines and standards that describe how their domain data will be managed, secured, discovered, and accessed.
1) Most of the time, domains think only of their own data product and not of how it works with other products, which can make it difficult to combine data from multiple domains.
2) Not every domain has the data engineering capability to implement the data system, and forcing people to do this work may affect their work experience. Some domains may not want to deal with data at all. For example, you can’t expect the sales team or HR team to manage their own data if they don't have a data engineer; you then have to create a central team to look after them.
3) Each domain can use a different technology or coding standard. Ensuring that every team knows and follows the same CI/CD, coding, and cloud practices may be a hassle.
4) Either conformed dimensions have to be duplicated in each domain, or a central team has to handle master data management.
5) One domain’s priorities may not match another's, so there can be a lag in reporting. For example, how do you make sure every team has the same OKRs in the same time frame?
6) You need to coordinate whenever a domain changes its data model. Can every domain keep track of who is using its data and how a change will impact the end users?
7) We need a team to manage DataOps and data governance anyway.
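The coordination problem around data model changes can be partly automated. Below is a minimal Python sketch, assuming each domain keeps its schema as a simple column-to-type mapping and records which columns each consumer reads. The function names, the schema format, and the consumer registry are all hypothetical, and real schema registries handle many more cases (renames, nullability, nested types).

```python
def breaking_changes(old_schema: dict, new_schema: dict) -> list:
    """List changes that can break existing readers: removed or retyped columns.

    Newly added columns are treated as non-breaking (an assumption of this sketch).
    """
    changes = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            changes.append(f"removed column: {column}")
        elif new_schema[column] != old_type:
            changes.append(f"type changed: {column} {old_type} -> {new_schema[column]}")
    return changes


def impacted_consumers(old_schema: dict, new_schema: dict, consumers: dict) -> dict:
    """Map each affected consumer to the columns it reads that were removed or retyped.

    `consumers` is a hypothetical registry: consumer name -> set of columns it reads.
    """
    broken = {
        column
        for column, old_type in old_schema.items()
        if new_schema.get(column) != old_type
    }
    return {
        name: sorted(columns & broken)
        for name, columns in consumers.items()
        if columns & broken
    }


old = {"order_id": "int", "amount": "float", "region": "string"}
new = {"order_id": "int", "amount": "decimal", "channel": "string"}
registry = {
    "finance_report": {"order_id", "amount"},
    "hr_dashboard": {"order_id"},
    "regional_sales": {"region", "amount"},
}

print(breaking_changes(old, new))
print(impacted_consumers(old, new, registry))
```

A check like this only works if consumers actually register what they read, which is itself a governance task. That brings us back to needing some central coordination.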
I can see a good future for data mesh, especially for large organizations. These observations come from my personal experience, so I am interested to hear other people’s thoughts and approaches to overcoming these challenges.