Data Mesh: Implications for Data Product Teams and Business Outcomes
Are you evaluating whether Data Mesh is a good fit for your organization? Then it’s worth checking out this brief write-up on how business, not IT, will accelerate data monetization. We’ll dive into a data strategy that unifies IT and business teams around business outcomes.
Central to this discussion is the Data Mesh framework, along with the idea that no code/low code visualization tools play a key role in enabling business users without programming experience to apply their domain expertise to generating business outcomes.
Can you put forth a strategy that makes your business team the data stewards of your organization? To do so, your objective becomes enabling collaboration between your business and IT teams.
IT isn’t the department that will accelerate data monetization at your company; it’s your business team. Those working in a business function likely do not have deep programming expertise, so you’ll have to lower the barrier to entry for data analytics and data science at your organization.
This is typically done with low/no code tools, like KgBase, and cloud platforms that facilitate data democratization across the organization. Once in place, you will then need to establish how you’ll protect your data, which means establishing a data governance strategy and assigning proper data ownership (I cover this briefly at the end).
Delegate as Much to the Cloud Provider
Data preparation, ETL, and integration between multiple systems can take up 70 percent of a project’s cost. Every hour spent on integration of different solutions or infrastructures is time lost on data monetization and business value creation. To quickly cut down on deployment time and costs, leverage cloud native solutions, like:
- AWS: Redshift
- Google Cloud: BigQuery
- Microsoft: Azure Synapse or Azure Databricks
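The cost saving comes from pushing transformation work into the warehouse rather than maintaining separate integration infrastructure (ELT instead of ETL). A minimal sketch of that idea in Python, using a hypothetical `orders` table; the generated SQL is plain enough to run on Redshift, BigQuery, or Synapse:

```python
# Sketch: delegate transformation to the cloud warehouse (ELT, not ETL).
# Instead of pulling rows into a separate pipeline, generate SQL that the
# warehouse executes where the data already lives. The `orders` table and
# its columns are hypothetical, for illustration only.

def daily_revenue_query(table: str, date_col: str, amount_col: str) -> str:
    """Build an aggregation query to run inside the warehouse."""
    return (
        f"SELECT {date_col}, SUM({amount_col}) AS revenue "
        f"FROM {table} "
        f"GROUP BY {date_col} "
        f"ORDER BY {date_col}"
    )

sql = daily_revenue_query("orders", "order_date", "amount")
print(sql)
```

The point of the sketch is the division of labor: your code only describes *what* to compute; the cloud provider’s engine handles the heavy lifting of *how*.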
Understanding Two Different Worlds: Applications and Data/AI
In the application world, there’s DevOps, an approach to software development that accelerates the build lifecycle by automating integration, testing, and deployment of code. Software developers building applications understand this quite well.
In the less mature Data/AI world, there’s DataOps, which applies agile development, DevOps and lean manufacturing to data analytics development and operations.
There’s much from DevOps that DataOps should leverage, such as microservices. In the application world, microservices mean going from a monolithic centralized application to distributed services: you break one very complex application into smaller, independently deployable ones.
Yeah, but what does all of this gibberish have to do with Data Mesh? Good question!
Data Mesh takes the microservices concept and applies it to the Data/AI world. You’re going from a monolithic centralized data architecture to a distributed decentralized architecture that’s scalable.
Let’s take a quick step back and explore the evolution of data architecture over the last 40 years to see how the Data Mesh paradigm has evolved.
In the late 1980s, everything was in silos, and the data warehouse concept was invented to break them down. In the 2000s, data lakes emerged followed by cloud data platforms where organizations brought their data warehouses and data lakes into the cloud for easier, faster scalability.
In 2021, the topic on everyone’s mind is Data Mesh.
Over the last 40 years, three groups have emerged to manage the analytic and operational sides of the data landscape — the operational team (data producer), analytic team (data consumer), and the ETL or data pipeline team (central team).
- The data producer has the business and domain knowledge, while
- The central team has the data engineering knowledge, and
- The data consumer has the analytical and ML knowledge.
The data producer only cares about producing the data and not how it’s consumed. When the consumer needs new data, they submit a request to the central team. The problem is that the central team is completely overloaded with requests, so new ones go into a backlog. This siloed approach is why centralization (data warehouses and data lakes) failed. Lack of data ownership, data quality, and business/IT teams working in silos led to the disconnects we see today between data producers and the data consumers.
New paradigms that take advantage of data virtualization and data federation are emerging. They support a managerial mindset where a company seeks to operationalize its data by improving performance across its data teams. Doing so allows an organization to:
- Remove bottlenecks in the central team,
- Reduce data duplication,
- Reduce ETL data pipelines,
- Improve speed to market and rapid prototyping,
- Reduce stale data, and
- Centralize data security.
The Data Mesh is a paradigm shift going from a centralized monolithic data architecture to a decentralized distributed data architecture. It’s a change in mindset as much as it is a change in technological implementation.
Four Principles to Consider
This paradigm establishes data teams by domain, where data producers, consumers, and the central team form one collaborative team across the entire business and IT department. The creator of Data Mesh, Zhamak Dehghani, takes a principled approach to closing the divide between the operational and analytical worlds:
- Principle 1: Domain oriented decentralization of data ownership and architecture.
- Principle 2: Data as a product.
- Principle 3: Create a self-serve data platform to enable autonomous domain oriented data teams.
- Principle 4: Create a federated governance to enable ecosystem and interoperability.
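To make the four principles concrete, here is a minimal sketch that expresses each one in code. Everything here (the `sales` domain, the `orders` product, the specific governance rule) is an invented illustration, not part of any Data Mesh specification:

```python
from dataclasses import dataclass, field

# Sketch: the four Data Mesh principles as a tiny in-memory registry.
# All names (domain, product, policy) are illustrative assumptions.

@dataclass
class DataProduct:
    name: str
    domain: str          # Principle 1: domain-oriented ownership
    owner: str           # Principle 2: a product has an accountable owner
    schema: dict         # Principle 2: a documented, consumable output
    tags: list = field(default_factory=list)

class MeshPlatform:
    """Principle 3: a self-serve platform where domains publish and discover."""

    def __init__(self):
        self.catalog = {}

    def publish(self, product: DataProduct):
        # Principle 4: federated governance -- one central policy that every
        # domain-published product must satisfy (here: an owner is required).
        if not product.owner:
            raise ValueError("governance: every data product needs an owner")
        self.catalog[product.name] = product

    def discover(self, domain: str):
        return [p.name for p in self.catalog.values() if p.domain == domain]

platform = MeshPlatform()
platform.publish(DataProduct(
    name="orders", domain="sales", owner="sales-team",
    schema={"order_id": "string", "amount": "float"},
))
print(platform.discover("sales"))
```

The design choice worth noticing: ownership and publishing are decentralized to the domains, while the platform enforces a single shared policy, which is exactly the hybrid Dehghani’s principles describe.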
These principles address data management in a number of ways: data monetization, data democratization, data driven culture, distributed data ownership, data protection, data quality, data governance, and treating data as a business asset.
Implementing these principles brings business and IT folk together as a data product team. In a collaborative effort, they work in parallel to create business value and accelerate time to value for their company and customer base.
Principles #3 and #4 pose a bit of a challenge because many enterprises fear decentralization as it can lead to data duplication and siloed data. However, if you examine the logical architecture, it’s more of a hybrid where the data products are decentralized but data governance of the platform is still centralized.
There’s an input. There’s an output. There’s a discovery zone. There’s an audit zone. And then, if you look inside, a data producer sits on the business side while a data product engineer sits in IT. The data consumer can be the data product owner serving as the data steward while also being part of the business team. But remember, you implement low code/no code solutions because business people won’t be programming.
Data Mesh requires a platform, not any single technology. The platform involves different aspects of data governance, DevOps, automation, data share technology, and a data visualization solution.
A Note on No/Low Code Data Management Tools
Emerging data management tools have native integration with data governance, DataOps, and AI/ML. The no code data visualization interface allows anyone to access and manage data across the organization, thereby satisfying requirements of a Data Mesh platform.
Data product teams should leverage real time analytics on operational data, reducing the need to stream data through ETL pipelines every time. This, in turn, unifies the operational and analytical worlds and helps the organization create data domains even faster.
From a management perspective, the Data Mesh framework promotes data sharing, data publishing, data discovery, and data interoperability. Unifying the data producer and consumer into a single data product team accelerates the data value chain and data monetization.
To create business value, you must treat data as a business asset, that is, as a data product.
This drives the IT team to prioritize business outcomes rather than obsess over technology stacks. Rather than be a replacement for a data warehouse or lake, Data Mesh gives your organization an alternative.
If your organization has a simple data model, with low data volume and only a few data domains, perhaps you can stick with a data lake or warehouse. If your organization tried these solutions but your data models are too complex with very high data volume across many different data domains, then a Data Mesh may be a good fit for your organization.
Unique Knowledge Graph Applications and Use Cases
Graph analytics will grow in the next few years due to the need to ask complex questions across complex data, which is…
A Note on Enterprise Knowledge Graphs
In the modern data warehouse approach, you pull data from external and on-premises sources into your cloud provider, for example via a data lake, and then you clean that data. Then you serve the different use cases or analytics that need it.
There are different departments within your company that want to move very quickly in adopting data and analytics and implementing the associated projects and use cases.
How are you going to actually be agile?
So, if we follow the centralized approach to adopting data and analytics, we have this central environment, with different data services, data factories, data warehouses, data lakes, and so on.
How do you enable these other lines of business without worrying about the centralized environment? With a decentralized approach, each line of business can implement unique projects without having to flow through the central team.
So, here is where you need federated ecosystem governance, with two layers: data platform governance and data product governance. So, what is the difference?
Data platform governance involves understanding where you’re going to store your data: put your data where it should be instead of forcing everyone to use the data lake. Data product governance, by contrast, is about discovery and trust. For example, as a business analyst, I need to look for specific data, analyze it, and discover new data. I should be able to do that by searching the data catalogue, and I should understand where that data comes from.
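The analyst’s workflow just described, search the catalogue, then trace where the data comes from, can be sketched in a few lines. The catalogue entries and lineage edges below are invented for illustration:

```python
# Sketch: data product governance as catalogue search plus lineage.
# A business analyst searches by keyword, then asks where a dataset
# comes from. All entries below are hypothetical.

catalog = {
    "daily_revenue": {
        "description": "Revenue aggregated per day",
        "tags": ["finance", "revenue"],
        "upstream": ["orders"],   # lineage: built from the orders product
    },
    "orders": {
        "description": "Raw orders from the operational system",
        "tags": ["sales"],
        "upstream": [],
    },
}

def search(keyword: str):
    """Find datasets whose description or tags mention the keyword."""
    kw = keyword.lower()
    return sorted(
        name for name, entry in catalog.items()
        if kw in entry["description"].lower() or kw in entry["tags"]
    )

def lineage(name: str):
    """Walk upstream edges to see where a dataset's data comes from."""
    sources = []
    for up in catalog[name]["upstream"]:
        sources.append(up)
        sources.extend(lineage(up))
    return sources

print(search("revenue"))
print(lineage("daily_revenue"))
```

A real catalogue adds access control, schemas, and owners on top, but the governance questions it answers are the same two shown here: what data exists, and where did it come from.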
In Data Mesh, a data product has an input and an output. There’s also a part where you can discover your data, and a part where you can audit what’s happening with it.
The Complexity of Constructing Knowledge Graphs, and How Low Code Tools Can Help or Hurt
In last week’s webinar hosted by KgBase, Francois Scharffe gave an overview of the construction process for knowledge graphs…
Excited about the possibilities of the Data Mesh? Let’s connect on LinkedIn.
Back to basics brought to you by LinkDap.