Data Economy Interoperability Framework — shared standardized components and extensions (Part 1)

--

Abstract. The article outlines challenges faced by companies in utilizing data standards, highlighting the complexity arising from diverse standards and extension integration. It proposes the Data Economy Interoperability Framework (DEIF) as a solution, aiming to unify standards, provide shared components, and support standardized and localized extensions. Stakeholders include DEIF maintainers, data economy standards developers, and solution vendors, each incentivized to participate by business opportunities and growth. The DEIF model is illustrated through an example, demonstrating how it facilitates interoperability and customization within organizations. This article describes the foundational thinking behind DEIF. A forthcoming second article will describe a practical approach to the standardization: how to document it, how to extend and maintain the library of standardized extensions, and how to get started.

Description of reality in companies utilizing standards

Let us begin this article with a hypothetical description of the data economy. A company wants to share data internally between functions. They apply Data Contracts (which are being standardized as we speak).

Oh, but then the Chief Growth Officer comes and says that our partner heard about our data product, would be able to create value with it, and we want to enable that. Is that internal data contract now enough? Well, no, it is not, since our legal department wants us to have an agreement containing more information than what is in the data contract.

Luckily we can reuse some of the data contract content, but what about the rest? Where do we get that? Then we luckily discover that, hey, we have defined the data products in our catalog according to a data product specification, which is also standardized. That standard has slightly different content from the data contract. The two overlap, yes, but since there is no easier standardized approach yet, we have survived with it.

The situations described above vary a lot, but they should suffice to show that one standard alone is hardly able to cover the variety of business operation needs, let alone ecosystem-level needs. One standard covering all those cases would become a huge monolith and lead to stagnation, as it would become very slow to change. Thus the more likely and better option is to have multiple standards.

The above example can be covered with two standards (data contract and data product), from which the legal contract is mostly derived anyway. In any case, we will have at least two standards, which even overlap in some parts.

Now that we have multiple standards, it would be beneficial to have some interoperability between them to avoid constant conversions (errors included) as part of the value chain. Tooling also becomes more complex, and having something shared between the standards would make it cheaper and easier to develop and maintain.

We know that the standards are limited, and we must be able to extend them if needed. But then we also discover that tools created by others never take advantage of our nice extensions. Also, some of our tool vendors have great extensions, but fitting them into our practices always involves more or less craftsmanship, which is expensive and slow.

The problems in short

Diverse Standards

We face the challenge of utilizing multiple standards to address company-wide data product requirements, exchange, and value creation. However, these standards are often incompatible, leading to redundant definitions and cumbersome maintenance during conversions.

Extension Complexity

Although standards offer extension capabilities to accommodate unique cases not directly covered, integrating these extensions with our tools and processes incurs significant costs. Existing extensions created by vendors could potentially resolve this issue, but the lack of visibility into previous extension developments hampers our ability to leverage them effectively.

Optimizing Standards

Ideally, we aim to maintain concise standards while embracing extensions to tailor them to specific needs, thereby expanding their applicability. However, the absence of information regarding existing extensions complicates this process and undermines our efforts to streamline standards usage.

Requirements for the future solution

Based on the above, we can draw some requirements for the future solution:

  1. We need to agree on some elements used in different standards to support core business process use cases without conversions.
  2. We still want to keep the standards small, compact, and easy to adopt.
  3. We want to have local extensions which are used only by our own solutions.
  4. We want to discover and utilize stable and managed existing vendor extensions to cut costs, development effort, and delivery time delays.
  5. As vendors, we do have extensions, but we would need a good place to document them so that it would drive more customers to us (act as a sales channel). In exchange, we would maintain the documentation for the extension. An example is Everything as Code support for data quality and SLA.

Based on the above, I have sketched a rough version of a possible North Star towards which we could drive the data economy standards to achieve better interoperability. Let’s call it the Data Economy Interoperability Framework. It provides a solution to the issues listed above and matches the requirements.

Solution — Data Economy Interoperability Framework (DEIF)

DEIF enables Data Economy standards to utilize shared and standardized components instead of defining them multiple times inside various standards. DEIF supports the concepts of standardized extensions as well as local extensions. DEIF encourages solution vendors to contribute extensions to the core components. The vendor extensions can be discovered from a one-stop shop — the DEIF documentation.

DEIF contains the following components:

  1. Core Data Economy standards
  2. Shared Standardized Components
  3. Standardized Extensions
  4. Localized Extensions

Core Data Economy standards refers to emerging lightweight, computation-oriented, and compact standards such as the Open Data Contract Standard, the Data Contract Specification, the Open Data Product Specification, and the Data Mesh oriented Data Product Descriptor Specification. These standards overlap with each other and contain the same components, but with small differences. DEIF aims to unify agreed components across the standards.

The Shared Standardized Components are reused in various Data Economy standards and thus drive interoperability. This is the core of all interoperability. Committed standards from the field of the Data Economy use these components in their specifications. The Data Economy standards will of course have other components as well, since the DEIF core components focus on supporting a limited set of fundamental business processes related to data products and data exchange.
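The idea of shared components can be sketched in a few lines. This is illustrative only: the field names below (`componentType`, `access`, `deif:Access`) are hypothetical, not taken from any of the mentioned standards.

```python
# A shared, standardized Access component defined once under DEIF.
# All field names here are illustrative assumptions, not real spec fields.
shared_access = {
    "componentType": "deif:Access",
    "interface": {"type": "REST", "authentication": "OAuth2"},
}

# Documents of two different standards embed the very same structure.
data_contract = {"kind": "DataContract", "access": shared_access}
data_product = {"kind": "DataProduct", "access": shared_access}

# Because the component is shared, tooling can read it from either
# document without any conversion step.
assert data_contract["access"] == data_product["access"]
```

The point is that a tool written once against the shared component works against both documents, which is exactly the conversion-free interoperability described above.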

The Standardized Extensions provide additional features to enable wider use case support. This helps the standards stay focused and compact. The concept also includes support for organizations’ internal localized extensions.

The difference between localized and standardized extensions is that only the latter are added to the DEIF one-stop-shop documentation. Local extensions are utilized only in an organization’s internal solutions.

Attempts to build libraries of extensions have been made in the API economy, where the OpenAPI Specification is the default. OAS offers a simple way to add extensions to an API description (which some consider an API contract, similar to a data contract). Yet even the API economy lacks a clear one-stop shop for finding available extensions, even though some of them are widely used. Still, some attempts exist.

Openapi-specification-extensions is a “resource for common and standardised OpenAPI specification (vendor) extensions”. It is based on an analysis of APIs found in various sources. Merely listing the extensions is hardly enough to enable scaling and machine-readability, and the need to standardize extensions is evident.
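For readers unfamiliar with OAS extensions: the OpenAPI Specification allows extra properties anywhere in a document as long as their names are prefixed with `x-`. The sketch below shows the mechanism; `x-data-quality` itself is a hypothetical vendor extension, not a registered one.

```python
# Minimal OpenAPI 3.0 document carrying a specification extension.
# Per the OAS rules, extension property names must begin with "x-".
openapi_doc = {
    "openapi": "3.0.3",
    "info": {"title": "Orders API", "version": "1.0.0"},
    "paths": {},
    "x-data-quality": {"tool": "example-tool", "minCompleteness": 0.95},
}

# Collect the extension properties the way a generic OAS tool would:
# anything prefixed with "x-" is an extension.
extensions = {k: v for k, v in openapi_doc.items() if k.startswith("x-")}
assert list(extensions) == ["x-data-quality"]
```

This prefix convention is what makes extensions easy to add but hard to discover: nothing in the document says what `x-data-quality` means or who maintains it.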

An attempt to standardize API extensions has been made in Semoasa — a machine-readable format for extensions to Swagger/OpenAPI 2.0 and 3.0 specifications. Semoasa states the problem in the API economy context: “While specification extensions are widely used, there is no standardized way to define their syntax, expected values, and allowed usage context. This makes it much more difficult for OpenAPI tools to support specification extensions with content assist, validation, and other features that those tools typically provide for standard OpenAPI language constructs.” The work done in Semoasa should be reviewed and the lessons learned applied in the development of standardized extensions in DEIF.
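To make the idea concrete, here is what a machine-readable extension descriptor could look like. This is loosely inspired by Semoasa’s goals but does NOT use Semoasa’s actual syntax; every field name below is an illustrative assumption.

```python
# Hypothetical descriptor: declares an extension's name, value schema,
# allowed usage contexts, and maintainer, so tools can validate usage
# and offer content assist.
extension_descriptor = {
    "extension": "x-data-quality",
    "summary": "Attaches data quality expectations to an artifact",
    "schema": {"type": "object", "required": ["minCompleteness"]},
    "usedIn": ["DataContract", "DataProduct"],
    "maintainer": "example-vendor",
}

def allowed_in(descriptor: dict, context: str) -> bool:
    """Check whether the extension may be used in the given document type."""
    return context in descriptor["usedIn"]

assert allowed_in(extension_descriptor, "DataContract")
assert not allowed_in(extension_descriptor, "OpenAPI")
```

A DEIF catalog of such descriptors would be exactly the one-stop shop that the API economy is still missing: tools could look an extension up, validate its value against the declared schema, and reject it in contexts where it is not allowed.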

Stakeholders and motivation to participate

In this framework, we have multiple stakeholders: DEIF maintainers and host, Data Economy standards maintainers (such as the Open Data Contract Standard), and solution vendors (companies building the platforms and tools to be used).

Solution vendors currently more or less take the standards and apply them to the tools and platforms they develop. In this model, they are given a more proactive role: they contribute standardized extensions that are compatible with the DEIF core components. Examples of such vendors could be Soda (with SodaCL) and Monte Carlo, which provide monitoring platforms for data quality. Not everyone wants to utilize their tools in solutions, and allowing vendor-specific solutions to become part of the core standard is not a desired result. Hence the concept of standardized extensions.

Vendors are accountable for their part of DEIF. When they submit a candidate extension to be included in DEIF, they also commit to maintaining that part in the future. Their incentive to participate is business: the more widely their extension is discoverable as part of DEIF among all data economy entities, the more their solution will be used (more sales).

Maintainers of core standards such as the Open Data Contract Standard and the Open Data Product Specification develop and maintain their specific standards and try to keep them compact. They also want the data ecosystem to blossom and grow. That is why they are (hopefully) willing to support and commit to using some shared components as part of their standards. That drives interoperability and also enables efficiency, business growth, and increased value creation from data.

DEIF maintainers and host. Here I would love to see a host like the Linux Foundation emerge and provide support. The Linux Foundation is credible and has a history of supporting successful standardization efforts. The most prominent example is the OpenAPI Specification.

Let’s take an example

Let us elaborate on the DEIF model described above with an example. In the example, we have an organization utilizing both a data contract and a data product specification.

In this example, we utilize two standardized core components of DEIF: Data Quality and Access. Those could just as well be any of the core components. Both the data product description in the catalog and the data contract define these two aspects of the artifact with the same structure (schema). Now at least two parts of the metadata needed in the business processes are defined the same way, and there is no need for conversions.

Both use the Everything as Code extension (from the standardized extensions catalog) and apply SodaCL rules for data quality. Likewise, access to the data is defined with one model.
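The example can be sketched as follows. The SodaCL snippet uses real SodaCL check syntax, but the surrounding structure (the `x-everything-as-code` wrapper and the other field names) is a hypothetical illustration of the idea, not the syntax of any of the mentioned standards.

```python
# SodaCL checks embedded as-is, "Everything as Code" style.
sodacl_checks = """\
checks for orders:
  - row_count > 0
  - missing_count(customer_id) = 0
"""

# A shared Data Quality component carrying the checks via a hypothetical
# standardized extension. Field names are illustrative assumptions.
quality_component = {
    "componentType": "deif:DataQuality",
    "x-everything-as-code": {"language": "sodacl", "rules": sodacl_checks},
}

# The same component appears in both artifacts of the example.
data_contract = {"kind": "DataContract", "dataQuality": quality_component}
data_product = {"kind": "DataProduct", "dataQuality": quality_component}

# One set of rules, two artifacts, zero conversions.
assert (data_contract["dataQuality"]["x-everything-as-code"]["rules"]
        == data_product["dataQuality"]["x-everything-as-code"]["rules"])
```

A quality monitoring tool could then pull the rules from either artifact and run them unchanged, which is the point of sharing the component.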

Not all the objects of the data contract and data product description have to be shared; both will also include components from the mentioned standards themselves (white background boxes). Over time, as standardization matures, some of these might also become shared models.

On top of that, organizations can have local extensions (grey objects). This enables organizations to fine-tune the fit of the standards to their own business processes and to tools not supported by standardized extensions.

Summary

The Data Economy Interoperability Framework (DEIF) is a solution aiming to unify standards, provide shared components, and support standardized and localized extensions. Stakeholders include DEIF maintainers, data economy standards developers, and solution vendors, each incentivized to participate by business opportunities and growth. The DEIF model was illustrated above through an example, demonstrating how it facilitates interoperability and customization within organizations. This article described the foundational thinking behind DEIF.

As discussed in the beginning, a second article is coming, and it will extend the description of DEIF to a more practical level. In the second article, we will drill down into implementation and the first steps to get started with building interoperability between the emerging data economy standards.

For DEIF to get enough traction among practitioners and other stakeholders, a credible host like the Linux Foundation is needed for the project. Why the Linux Foundation? It already hosts the Open Data Contract Standard (Bitol) project, and there are discussions about hosting the Open Data Product Specification too. In my opinion, the Linux Foundation would be an ideal host.

--

Jarkko Moilanen (PhD)
Exploring the Frontier of Data Products

Open Data Product Specification igniter and maintainer (Linux Foundation project). Author of business-oriented data economy books. AI / Product Lead professional.