Managing the evolving landscape of data products: Part 1

Ayush Sharma
Jan 17, 2024

This is the first of a two-part series exploring the scope of data products in a dynamic landscape. This article delves into managing the evolving scope of individual data products in a Data Mesh architecture.

Photo by Taylor Vick on Unsplash

The building blocks of the Data Mesh architecture are data products: discrete units that make data usable to others. A good Data Mesh will contain many data products, but as their number grows, so do complexity and governance challenges. It is therefore essential to have a robust framework to manage dependencies, coordinate updates, and ensure data consistency across the ecosystem. This involves defining clear ownership and accountability for each data product and assigning dedicated teams or individuals to handle its development, maintenance, and evolution.

I. Challenges of multiple data products (DPs) & the role of data governance

  • Challenges caused by having multiple data products for the same data source
  • Impact of multiple DPs reading from the same operational system
  • Duplication due to different data products addressing the same business needs
  • The need for strong governance and ownership structures

Organizations attempting to shift into a federated data environment via Data Mesh require strong data governance to avoid the accidental explosion of data products that can lead to increased complexity, duplication of effort, and ownership issues.

When multiple data products are created for the same data source, they put a strain on the source systems and can cause performance issues for the transactional applications that run on them. While it's common for different data products to read different categories of data from the source systems, it is crucial to design the relationships between data products and source data carefully, considering the specific needs and requirements of each data product.

Image: Depicting reusability in data products, source: Data Mesh in practice: Product thinking and development (Part III)

Leveraging source-oriented data products (SoDPs) and promoting data product reusability is a strategic way to relieve undue stress on source systems. This approach avoids unnecessary replication of data in the Data Mesh or data catalogue, leading to a more streamlined and efficient data ecosystem. It enables consumers to access the information they need from the appropriate data product without encountering redundant or conflicting data.

This helps:

  • Improve data quality
  • Simplify data discovery
  • Ensure consistency across the organization’s data landscape
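To make the source-load argument concrete, here is a minimal Python sketch that counts how many data products read an operational system directly, before and after introducing a SoDP. The `DataProduct` structure and all product and system names (`orders_db`, `orders_sodp`, and so on) are hypothetical illustrations, not part of any particular catalog or platform.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    name: str
    reads_from: tuple  # names of upstream operational systems or data products


def direct_source_reads(products, operational_systems):
    """Count how many data products read each operational system directly."""
    counts = Counter()
    for product in products:
        for upstream in product.reads_from:
            if upstream in operational_systems:
                counts[upstream] += 1
    return counts


SYSTEMS = {"orders_db"}  # hypothetical operational system

# Without a SoDP: every consumer-facing product queries the source itself.
without_sodp = [
    DataProduct("sales_dashboard", ("orders_db",)),
    DataProduct("churn_features", ("orders_db",)),
    DataProduct("finance_report", ("orders_db",)),
]

# With a SoDP: one product reads the source and the others reuse it.
with_sodp = [
    DataProduct("orders_sodp", ("orders_db",)),
    DataProduct("sales_dashboard", ("orders_sodp",)),
    DataProduct("churn_features", ("orders_sodp",)),
    DataProduct("finance_report", ("orders_sodp",)),
]

print(direct_source_reads(without_sodp, SYSTEMS)["orders_db"])  # 3
print(direct_source_reads(with_sodp, SYSTEMS)["orders_db"])     # 1
```

The consumers still get the same data, but only one product now places load on the operational system.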

By addressing these complexities and governance issues, organizations can streamline their data product ecosystem, ensuring efficient collaboration, adherence to standards, and effective management of data assets. This helps teams work cohesively and fosters trust in the data products and the Data Mesh architecture.

To strike a balance between granularity and simplicity, it is crucial to create data products that are modular, agile, and aligned with the overall objectives of the organization.

Duplication can also arise when different data products solve the same data use cases. This is particularly visible in consumer-oriented data products (CODPs) generated through the collaboration of multiple data products. Another cause is that different teams need the same use case addressed, or that a team builds a new data product for a specific use case that an existing product already serves.
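Duplication of this kind is easier to act on once it is visible. Below is a minimal sketch, assuming each catalog entry declares the business use case it serves; the `use_case` field and the entries themselves are hypothetical. It groups products by use case so that governance can review any group with more than one member.

```python
from collections import defaultdict

# Hypothetical catalog entries; "use_case" is an assumed metadata field.
catalog = [
    {"name": "customer_360_marketing", "owner": "marketing", "use_case": "customer_360"},
    {"name": "customer_360_sales", "owner": "sales", "use_case": "customer_360"},
    {"name": "daily_revenue", "owner": "finance", "use_case": "revenue_reporting"},
]


def duplicate_candidates(entries):
    """Group data products by declared use case; keep groups with more than one product."""
    by_use_case = defaultdict(list)
    for entry in entries:
        by_use_case[entry["use_case"]].append(entry["name"])
    return {uc: sorted(names) for uc, names in by_use_case.items() if len(names) > 1}


print(duplicate_candidates(catalog))
# {'customer_360': ['customer_360_marketing', 'customer_360_sales']}
```

A shared use case only marks candidates for review. Owners still decide whether the products are true duplicates.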

Managing duplicates using a feedback-based ranking system

One way to manage duplication in the Data Mesh is to implement a feedback-based ranking system. This system considers how well each data product maintains its Service Level Objectives (SLOs), as measured by its Service Level Indicators (SLIs). A well-maintained data product that meets its SLOs receives more quality points and therefore ranks higher during discovery. A higher rank gives the data product more visibility, which leads to more searches and use. Owners therefore have an incentive to keep improving their products.

This approach is referred to as a “feedback-based ranking system” or a “quality-based ranking system.” It aligns with the principles of feedback-driven development, quality management, and data product lifecycle management. The implementation of such a system involves the following:

  • Data product discovery mechanisms
  • Feedback collection and analysis
  • Ranking algorithms
  • Retirement and archival processes
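Here is a minimal sketch of the SLO-based scoring described above. It assumes each data product publishes its latest SLI measurements alongside its SLO targets. The quality score is simply the fraction of SLOs currently met, which is an illustrative choice rather than a prescribed formula. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SLO:
    indicator: str          # name of the SLI this objective applies to
    target: float
    higher_is_better: bool  # True for e.g. availability, False for e.g. staleness

    def is_met(self, measured: float) -> bool:
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


@dataclass
class DataProductStats:
    name: str
    slis: dict                               # latest SLI measurements by indicator name
    slos: list = field(default_factory=list)

    def quality_points(self) -> float:
        """Fraction of SLOs currently met; a missing SLI counts as not met."""
        if not self.slos:
            return 0.0
        met = sum(
            1 for slo in self.slos
            if slo.indicator in self.slis and slo.is_met(self.slis[slo.indicator])
        )
        return met / len(self.slos)


def rank_for_discovery(products):
    """Highest quality points first, so well-maintained products surface first in search."""
    return sorted(products, key=lambda p: p.quality_points(), reverse=True)


slos = [SLO("availability", 0.99, True), SLO("staleness_hours", 24, False)]
team_a = DataProductStats("orders_team_a", {"availability": 0.995, "staleness_hours": 6}, slos)
team_b = DataProductStats("orders_team_b", {"availability": 0.97, "staleness_hours": 30}, slos)
team_c = DataProductStats("orders_team_c", {"availability": 0.999, "staleness_hours": 48}, slos)

print([p.name for p in rank_for_discovery([team_b, team_c, team_a])])
# ['orders_team_a', 'orders_team_c', 'orders_team_b']
```

When two products overlap, the one that keeps its SLOs surfaces first in discovery. A product that stays at the bottom becomes a natural candidate for the retirement and archival processes listed above.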

As part of this ranking system, a fitness function can be used to evaluate the performance and quality of data products. A fitness function is a mathematical function that assesses the data product based on predefined criteria, such as data quality, reliability, timeliness, availability, user satisfaction, and adherence to service level agreements (SLAs). By incorporating relevant metrics and feedback from users, such as usage patterns, user ratings, response times, and data accuracy, the fitness function assigns scores or rankings to data products.

By leveraging a fitness function, the system can objectively evaluate and prioritize data products based on their performance against the defined criteria.
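One possible fitness function is a weighted sum of metrics, each normalized to the range 0 to 1. The sketch below rests on several hypothetical assumptions: the metric names, the weights, a 1 to 5 rating scale, and a linear penalty for stale data. A real implementation would choose its own criteria and calibrate the weights from user feedback.

```python
# Hypothetical criteria and weights (they sum to 1.0); each organization would choose its own.
WEIGHTS = {
    "accuracy": 0.3,
    "availability": 0.2,
    "timeliness": 0.2,
    "user_rating": 0.2,
    "sla_adherence": 0.1,
}


def rating_score(avg_rating: float, low: float = 1.0, high: float = 5.0) -> float:
    """Map an average user rating on a low..high scale to 0..1."""
    return (avg_rating - low) / (high - low)


def timeliness_score(age_hours: float, sla_hours: float) -> float:
    """1.0 while data is within its freshness SLA, falling linearly to 0.0 at twice the SLA."""
    if age_hours <= sla_hours:
        return 1.0
    return max(0.0, 1.0 - (age_hours - sla_hours) / sla_hours)


def fitness(metrics: dict) -> float:
    """Weighted sum of normalized metrics; higher is better (0..1 when inputs are in range)."""
    normalized = {
        "accuracy": metrics["accuracy"],            # share of records passing quality checks
        "availability": metrics["availability"],    # uptime fraction over the window
        "timeliness": timeliness_score(metrics["age_hours"], metrics["freshness_sla_hours"]),
        "user_rating": rating_score(metrics["avg_user_rating"]),
        "sla_adherence": metrics["sla_adherence"],  # fraction of SLA checks passed
    }
    return sum(WEIGHTS[k] * v for k, v in normalized.items())


well_kept = {"accuracy": 0.98, "availability": 0.999, "age_hours": 12,
             "freshness_sla_hours": 24, "avg_user_rating": 4.2, "sla_adherence": 0.95}
neglected = {"accuracy": 0.90, "availability": 0.95, "age_hours": 36,
             "freshness_sla_hours": 24, "avg_user_rating": 3.0, "sla_adherence": 0.70}

print(round(fitness(well_kept), 4))  # 0.9488
print(round(fitness(neglected), 4))  # 0.73
```

A weighted sum keeps each criterion's contribution explicit. Teams can then see why one product outranks another.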

II. Versioning data products

Data product versioning is the systematic tracking and control of changes to a data product as it evolves. Like software versioning, it assigns a distinct identifier or label to each iteration or release of the data product. This identifier distinguishes one version from another, making updates and changes easier to manage.
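The article does not prescribe an identifier scheme, but one common choice is semantic versioning (MAJOR.MINOR.PATCH). The sketch below assumes that scheme and represents a schema as a simple column-to-type mapping. It shows how a schema change might determine the next version. The function names are hypothetical.

```python
def classify_change(old_schema: dict, new_schema: dict) -> str:
    """Classify a schema change, assuming a schema is a column-name -> type mapping."""
    removed = old_schema.keys() - new_schema.keys()
    retyped = {c for c in old_schema.keys() & new_schema.keys() if old_schema[c] != new_schema[c]}
    added = new_schema.keys() - old_schema.keys()
    if removed or retyped:
        return "major"   # breaking: existing consumers may fail
    if added:
        return "minor"   # additive: existing consumers typically keep working
    return "patch"       # no schema change, e.g. a processing-logic fix


def bump(version: str, change: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for the given change type."""
    major, minor, patch = (int(p) for p in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


v1 = {"order_id": "string", "amount": "decimal"}
v2 = {"order_id": "string", "amount": "decimal", "currency": "string"}
v3 = {"order_id": "string", "currency": "string"}

print(bump("1.2.0", classify_change(v1, v2)))  # 1.3.0
print(bump("1.3.0", classify_change(v2, v3)))  # 2.0.0
print(bump("2.0.0", classify_change(v3, v3)))  # 2.0.1
```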

Image: Depicting version change on DP

Versioning a data product:

  • Provides a way to keep track of its evolution by documenting the changes made at each stage. This record is valuable for auditing purposes, troubleshooting, and understanding the lineage of the data.
  • Enables controlled and coordinated updates to the data product. When changes are made to the data schema, data processing logic, or underlying infrastructure, having different versions of the data product allows for a phased rollout or deployment strategy.
  • Facilitates compatibility management between data products and their dependencies. By clearly specifying the version dependencies between data products, teams can ensure that compatible versions are used together, avoiding compatibility issues and data inconsistencies.
  • Enables effective collaboration and communication among teams working on the data product. Teams can discuss and make decisions about specific versions, working from a shared understanding of the data product's state.

Together, these benefits help maintain data integrity and ensure that the data product evolves smoothly within the Data Mesh architecture.
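Under the same semantic-versioning assumption, each consumer could declare the minimum version of every upstream data product it depends on. A check could then flag the dependencies that a newly published version would break. The dependency triples and product names below are hypothetical.

```python
def parse(version: str) -> tuple:
    return tuple(int(p) for p in version.split("."))


def is_compatible(required_min: str, available: str) -> bool:
    """Semver rule: same MAJOR version and not older than the required minimum."""
    req, avail = parse(required_min), parse(available)
    return avail[0] == req[0] and avail >= req


def broken_dependencies(dependencies, published):
    """Return (consumer, producer, required, published) for each incompatible dependency."""
    return [
        (consumer, producer, required, published[producer])
        for consumer, producer, required in dependencies
        if not is_compatible(required, published[producer])
    ]


# Hypothetical published versions and declared (consumer, producer, minimum) dependencies.
published = {"orders_sodp": "2.0.0", "customers_sodp": "1.4.1"}
dependencies = [
    ("sales_dashboard", "orders_sodp", "1.2.0"),
    ("churn_features", "customers_sodp", "1.3.0"),
]

print(broken_dependencies(dependencies, published))
# [('sales_dashboard', 'orders_sodp', '1.2.0', '2.0.0')]
```

Here the check flags the major release of `orders_sodp` before rollout. Keeping the 1.x version available for a while supports the kind of phased migration described above.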

Conclusion

Navigating the complexities of evolving individual data products within a Data Mesh architecture demands a strategic approach. Robust governance, ownership structures, and source-oriented data products (SoDPs) are essential to ensure streamlined operations and data consistency.

Key takeaways:

  • Leveraging a feedback-based ranking system enhances data product quality, aligning with principles of continuous improvement and lifecycle management.
  • Effective data product versioning empowers controlled updates, compatibility, collaboration, and data integrity.
  • SoDPs alleviate stress on source systems, promote reusability, and facilitate collaboration while ensuring data consistency and quality.


Ayush Sharma

Software Engineer, Data mesh Practitioner @thoughtworks