A/B comparison of target states for data product standards

--

Data product metadata models and data contracts are vital in providing structure, security, compliance, and efficiency in data governance, making them indispensable tools for companies in the modern data-driven landscape.

My exploration of the evolving standards in the data economy, particularly focusing on data products and associated contracts, is progressing rapidly. I embarked on this path in 2019 with the development of the Open Data Product Specification. Since then, this topic has not only advanced but has also become a central theme in my second PhD research. This research effectively commenced a few years back, marked by my 2022 prototype paper on the Open Data Product Specification, presented at the 17th International Conference on Design Science Research in Information Systems and Technology. Now, the question arises: what broader developments are occurring in the field of data product standardization?

Isolated emerging standards

Currently, I have identified four somewhat isolated efforts in data product-related standardization: the Open Data Contract Standard, the Data Contract Descriptor, the Data Product Descriptor Specification, and the Open Data Product Specification (more details in a separate post). My goal is to speed up development in this small but significant area of data management within the data economy. My dream is that in the near future we will have more widely accepted metadata standards and models for data products, and that I will have been able to help bring them about. This is my motivation.

Towards more unified standard(s)

I intend to convene a small group of experienced professionals from top organizations developing emerging standards. Together, at a round table discussion with me serving as their facilitator and recorder, we aim to reach a consensus and collaborate towards the creation of more unified standards. For this endeavor to be effective, it’s crucial to establish a clear target and a collectively agreed-upon goal. This post serves as a brainstorming exercise to explore various options for discussion.

To focus efforts and resources on something meaningful and agreed, we need to agree on the target state. A target state is a clearly defined vision or blueprint of a desired future condition or outcome for an organization, project, or system. It represents the end goal or aspiration that an organization seeks to achieve. The concept is often used in strategic planning, enterprise architecture, and project management.

Here, the target state is the agreed form that data product-related standards should take in order to increase productivity, interoperability, and business value (the drivers of the change). Let us assume that these drivers are accepted among stakeholders.

In this exercise, we have 4 emerging standards:

  • 2 with contract approach and
  • 2 with product approach.

The contract standards are conceptually closer to each other, and at least one member of one group participates in the steering group of the other. With the product-related standards the situation is different, and I am not even sure whether the groups know about each other. More about the four standards in the previous post.

The target state options A and B

In this mental exercise, the hunt for the “unified” data product standard ends in one of two target states: Monolith or Bipolar. In reality we might have other options as well, but let’s take it as it is: a mental exercise and a step towards more mature options.

Monolith — all in one

In this target state, the rival emerging standards of both the contract and the product approach are combined into one bigger standard. Overlaps are removed, and contractual as well as product attributes live in one model.

Here, all four emerging standards decide to aim for one bigger standard and agree to merge the two approaches into one. The result is most likely more complex than the separate standards would be. Yet if the monolith serves the business purposes and is still easy enough to adopt and develop further, it would hold greater interoperability potential.

As previously mentioned, there is currently an overlap between the standards. However, there are still significant differences between them. Combining these standards would necessitate considerable compromises and widespread consensus among a broad group of professionals, which could be a time-consuming process.

Bipolar result — divided but compatible

In this option, the contract and product approaches remain independent elements of the standardization. Within each “camp” a single standard is agreed upon, but the two are not combined into one.

In this approach, the Data Contract Descriptor and Open Data Contract Standard are collaborating to establish a unified model and direction. They maintain their focus on contract aspects while combining their efforts.

Similarly, on the data product front, the Data Product Descriptor Specification and Open Data Product Specification are aligning their frameworks. Their goal is to eventually integrate and operate collectively as a single unified standard.

Consequently, this leads to the existence of two distinct standards, each progressing independently. However, the industry’s need is for these standards to function concurrently and harmoniously. This necessitates a focus on standardizing the aspects that facilitate connectivity and interoperability. Essentially, the emphasis of standardization should be less on the individual cores of the standards and more on the elements that enable them to interconnect and interact effectively. The ultimate goal is to optimize their simultaneous usage, creating a synergistic relationship that benefits both standards.
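To make the idea of standardizing the connecting elements more concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each side carries an identifier and a reference to the other side. The class names and the fields `contract_id`, `product_ref`, `product_id`, and `contract_refs` are hypothetical and are not taken from any of the four emerging standards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractDoc:
    """Stand-in for a document written in the unified contract standard."""
    contract_id: str   # hypothetical identifier field
    product_ref: str   # hypothetical pointer to the product it governs


@dataclass(frozen=True)
class ProductDoc:
    """Stand-in for a document written in the unified product standard."""
    product_id: str                      # hypothetical identifier field
    contract_refs: tuple[str, ...] = ()  # hypothetical pointers to contracts


def is_linked(product: ProductDoc, contract: ContractDoc) -> bool:
    """Return True only when each document points at the other."""
    return (
        contract.contract_id in product.contract_refs
        and contract.product_ref == product.product_id
    )


product = ProductDoc(product_id="product-1", contract_refs=("contract-1",))
contract = ContractDoc(contract_id="contract-1", product_ref="product-1")
orphan = ContractDoc(contract_id="contract-2", product_ref="product-9")

print(is_linked(product, contract))  # True
print(is_linked(product, orphan))    # False
```

In this sketch the only thing the two standards must agree on is the shape of the identifiers and references. Everything else in each document could evolve independently, which is the point of the Bipolar option.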

Comparison

Let us first do an easy, high-level exercise and codify the “goodness” of the options according to agility, complexity, interoperability, and … Each box takes one of the following values: low, medium, or high. We will discuss these aspects below the comparison matrix.

Business Agility

The Bipolar approach is likely more adaptable to quickly changing business requirements than a single, rigid standard. This flexibility stems from the reduced need for extensive agreement between the parties involved in contract and product standardization. However, for maximum effectiveness, both parties should agree on common characteristics or other strategies to ensure smooth integration and use of these standards in real-world applications. A further benefit of the Bipolar approach is that a system that does not need one side of the package (for example, the product side) has only a smaller standard to adopt and maintain.

Complexity

Complexity in this context presents a dual challenge. On one side, a more comprehensive and extensive standard (monolithic approach) tends to be more complex. Conversely, the bipolar approach involves managing two distinct standards that must function together harmoniously. This means adopting and aligning two separate standards, which can be more demanding in terms of interoperability compared to a monolithic standard. The advantage of the bipolar method lies in its division into two distinct parts (contract focus and product focus), each maintained as smaller, more manageable segments.

Interoperability

Interoperability, in the context of software and technology, refers to the ability of different systems, applications, or products to exchange and use information effectively. In general, one standard instead of two is likely to increase interoperability between tools and systems with less effort. Where integration is key, a single standard might be preferable for seamless operation, because it avoids the ongoing work of keeping two standards compatible.

Two highly compatible standards, on the other hand, can offer more flexibility and choice. They allow for diversity in solutions and can foster innovation by not limiting developers or industries to a single approach. Multiple standards can also be more adaptable to changes and to the specific needs of different sectors or technologies, and they might provide a fallback if one standard becomes obsolete or less efficient.

In conclusion, the choice depends on the specific needs of the industry or application: a single standard offers simplicity and uniformity, while multiple compatible standards provide flexibility and adaptability.

Next steps

The previous analysis provides only a preliminary overview. To make a well-informed decision on the direction to pursue, a comprehensive analysis of the standards is necessary. This will involve identifying areas of overlap as well as distinct, value-adding elements within each standard. This task is currently a priority. The findings from this in-depth analysis will be further enriched by insights gathered from interviews with developers involved in these emerging standards.
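As a rough illustration of what identifying overlap and distinct elements could look like in practice, the sketch below compares attribute names across standards with plain set operations. The function `compare_attributes`, the names `standard_a` and `standard_b`, and every attribute listed are placeholders invented for this example. They are not drawn from the actual specifications, and a real analysis would also need to compare semantics, not just names.

```python
def compare_attributes(
    standards: dict[str, set[str]],
) -> tuple[set[str], dict[str, set[str]]]:
    """Return (attributes shared by all standards, attributes unique to each)."""
    if not standards:
        return set(), {}

    # Attributes that appear in every standard: candidates for a common core.
    shared = set.intersection(*standards.values())

    # Attributes that appear in exactly one standard: potential value-adding elements.
    unique: dict[str, set[str]] = {}
    for name, attrs in standards.items():
        others = set().union(*(a for n, a in standards.items() if n != name))
        unique[name] = attrs - others

    return shared, unique


# Placeholder data, not taken from any real specification.
standards = {
    "standard_a": {"name", "owner", "pricing"},
    "standard_b": {"name", "owner", "schema"},
}

shared, unique = compare_attributes(standards)
print(sorted(shared))
print({name: sorted(attrs) for name, attrs in unique.items()})
```

The shared set hints at where a merged or connective model could start, while the unique sets show what each standard would bring to the table or would have to give up.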

If you wish to cooperate, write together, or have insights to offer, please leave a comment or contact me via LinkedIn.

--

Jarkko Moilanen (PhD)
Exploring the Frontier of Data Products

Open Data Product Specification igniter and maintainer (Linux Foundation project). Author of business-oriented data economy books. AI/Product Lead professional.