Characteristics of emerging modern data economy standards

--

Abstract: In my exploration of the evolving standards in the data economy, a key research question has been to pinpoint the defining characteristics of these standards. My findings focus primarily on the nascent standards for data contracts, with a brief examination of related standards for data products. In this article I describe the initial findings, which will be extended and clarified in the future based on interviews with the developers of the standards.

Here is a brief summary. Emerging data economy standards, particularly in data contracts, prioritize readability and tool orientation, with YAML becoming the preferred format over XML and JSON for its ease of use and human-readability. These standards emphasize interoperability and automation, allowing direct integration with CI/CD processes and other tools and showcasing a move towards “Everything as Code”, which brings computational elements into contracts. At the same time, the standards maintain a compact and focused approach to ensure simplicity and fast adoption, potentially incorporating extensions to accommodate diverse use cases while striving to preserve interoperability and a clear understanding of each extension’s purpose.

Readability and tools-oriented

YAML, with its sparse use of special characters, is easy to read and has become the preferred format in data contract standards. It is the base of both the Open Data Contract Standard (ODCS) and the Data Contract Specification. XML and JSON have previously been used to provide similar capabilities in the configurations and blueprints of processes and architectures. The emerging data product specifications also support YAML: the Open Data Product Specification, for example, has so far been described in JSON by default, but the next development version uses YAML by default. XML, on the other hand, seems to be left behind.

JSON stands for JavaScript Object Notation. It is a lightweight format for storing and transferring data. Originally designed for JavaScript, JSON gained popularity due to its simple structure and has since been adopted far beyond its original ecosystem, making it one of the most common ways of transferring data across the internet.

YAML (YAML Ain’t Markup Language) is a data serialization language widely used in configuration files for DevOps tools, programs, and applications. It is known for its simplicity compared to XML and JSON: it uses indentation and newlines to separate data rather than symbols and brackets. It provides a standardized format for representing structured data in a way that is both easily understandable to humans and interpretable by machines.

The emphasis on human-readability makes YAML especially well-suited for various applications, including configuration (config) files and data exchange between different systems. Its straightforward and intuitive structure enhances its usability across different domains, enabling users to define and organize data in a clear and understandable manner.
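To make the readability difference concrete, the snippet below shows the same contact record twice: first in JSON-style flow syntax (which YAML also accepts) and then in YAML block style. The field names are illustrative and not taken from any specific standard.

```yaml
# The same record in two notations. Both lines parse as valid YAML,
# since JSON is essentially a subset of YAML.
contact_as_json: {"name": "Data Team", "email": "data@example.com"}

# YAML block style drops the brackets, quotes, and commas:
contact_as_yaml:
  name: Data Team
  email: data@example.com
```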

Tools-oriented refers to the ability to connect a data contract directly with automation processes and related tools. This is visible in the ODCS community discussion, and one of their guidelines states: “We favor interoperability over readability. Tool interoperability (import/export) is crucial for the success of the standard”. This is close to the fundamentals of CI/CD, since it enables easy interoperability and management of processes and tools through a YAML file, a format also commonly used in various CI/CD implementations.
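As a concrete illustration, the sketch below shows how a data contract file could be validated in a CI pipeline on every push. It assumes a GitHub Actions setup and a contract-testing CLI such as the datacontract CLI; the command names are illustrative and should be checked against the tooling you actually use.

```yaml
# Hypothetical CI workflow: lint and test the data contract on every push.
name: data-contract-checks
on: [push]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumed CLI commands; adjust to the contract tooling in use.
      - name: Lint and test the contract
        run: |
          datacontract lint datacontract.yaml
          datacontract test datacontract.yaml
```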

The Data Contract Specification website states: “Later in development and production, they also serve as the basis for code generation, testing, schema validations, quality checks, monitoring, access control, and computational governance policies”. The intention here is very close to what we have already seen in the OpenAPI Specification (OAS). That specification also supports YAML, and an API described in an OAS file can be used to generate client and server code, test scripts, and content (schema) validation, among other things. In the API world, the OAS file is often also referred to as an API contract, which is very much aligned with the data contract concept and the interests of the data contract standard developers.
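For reference, a minimal data contract in the style of the Data Contract Specification could look roughly like the following; the exact field names and values should be verified against the current version of the specification.

```yaml
# A minimal, illustrative data contract in Data Contract Specification style.
dataContractSpecification: 1.1.0
id: urn:datacontract:checkout:orders
info:
  title: Orders
  version: 1.0.0
  owner: checkout-team
  contact:
    email: checkout-team@example.com
```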

Computational data contracts

The Everything as Code approach is not yet strong in data contracts. It is related to the tools-oriented characteristic discussed above. Everything as Code applied to, for example, data quality is not yet visible in ODCS, but it is in the parallel Data Contract Specification. Code snippets included as part of the contract description can be used directly in tools, automation, and monitoring, as well as a foundation for actions such as warnings and interventions in a process flow.
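For example, the Data Contract Specification allows embedding executable quality checks in the contract. The sketch below follows that idea using SodaCL-style checks; the exact structure of the quality block may differ between specification versions.

```yaml
# Illustrative quality section: the embedded checks can be executed
# directly by monitoring tooling instead of being mere prose.
quality:
  type: SodaCL
  specification:
    checks for orders:
      - row_count > 0                  # the table must not be empty
      - missing_count(order_id) = 0    # no missing order identifiers
```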

This trend indicates a transition towards dynamic descriptions. The data contract is no longer a static document with fixed values only; it also contains “living” computational parts. This is not far from the smart contract concept: a smart contract is a digital agreement that is signed and stored on a blockchain network and executes automatically when the contract’s terms and conditions (T&C) are met.

The difference between the Everything as Code applications now visible in data contracts and smart contracts is that the latter contain logical functionality (if this is true, then do this), while data contracts are still limited to monitoring rules.

Applying smart contract features to data contracts might make sense especially for high-speed transaction data and IoT data. The pricing part of the data contract also seems an intriguing area to explore. A data contract could include an “actions” part, defined in a programming-language-like structure for billing. As an example, if the pricing plan is based on the amount of data transferred, then at a given point of data consumption a bill would be sent automatically. These conditional actions should not be limited to blockchains; they could be rules applied as direct input to billing systems.
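A purely hypothetical sketch of such an “actions” part might look like the following; nothing like this exists in the current standards, and all field names are invented for illustration.

```yaml
# Hypothetical pricing and actions blocks, not part of any current standard.
pricing:
  plan: volume-based
  unit: GiB
  pricePerUnit: 0.05
  currency: EUR

actions:
  - trigger: transferred_volume >= 100   # threshold reached (in GiB)
    action: send_invoice                 # rule fed to a billing system
    target: billing-system
```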

Adding actions to data contracts would increase their computational nature. Instead of including only threshold values and related monitoring rules, a contract would also include actions, and data contracts would thereby become actionable packages.

At the same time, the data contract developers want to keep the standard compact, and that has to be taken into account if and when they consider including actions in the standard.

Compact and tight focus

The data contract standardization community is not looking for wide coverage of all possible cases. Instead, they “favor a small standard over a large one. Keep the standard small, so it remains simple”.

The guideline drives faster development, which is fundamental to supporting constantly progressing technical development and a changing business environment. A smaller standard with a tight scope also has a lower learning curve and thus drives faster adoption.

Fast adoption is related to the chasm defined by Moore. In his book “Crossing the Chasm”, Moore builds upon Everett Rogers’ diffusion of innovations theory, highlighting a significant gap between early adopters (technology enthusiasts and visionaries) and the early majority (pragmatists). Moore emphasizes the distinct expectations of these groups and offers strategies to bridge the gap. These strategies encompass identifying a target market, grasping the whole product concept, positioning the product, crafting a marketing strategy, selecting an optimal distribution channel, and setting the price.

Early adopters want the fastest, most innovative products and are willing to pay a premium to receive those benefits. They are willing to invest time and even take a trial-and-error approach to applying the product, which in this case is the standard. For the mainstream consumer, on the other hand, the product should solve a relevant problem and should not be too complicated to use.

Given that the standard keeps its tight focus and stays very compact, it also enables applying the 3–30–3 rule, better known from marketing and sales. The rule has many variations, but the basic form is: 3 seconds to capture your attention, 30 seconds to keep you engaged, 3 minutes to deliver your complete message. In marketing a data contract standard, the first headlines and value propositions get the attention, and the supporting material offers more details. Finally, in 30 minutes you are able to apply the standard, for example in a hello-world context. Here “deliver your complete message” becomes “get the first value” in 30 minutes, which can be achieved with ready-made guides and examples.

The limitation of a tight focus is limited applicability, as the standard can be applied only to a narrow set of use cases. To tackle this limitation, the data contract community is leaning towards extensions.

Extension oriented

Keeping the standard compact, rather than including every field that every use case might require, is one of the principles discussed above. The Open Data Contract Standard guiding principles state: “Instead of adding every field to the standard, go more in favor of allowing extensions”. The practices for allowing extensions differ between the emerging data contract standards.

In the Open Data Contract Standard, extensions or additions to the standard are allowed only in a separate object named customProperties, where each extension is a pair of property and value attributes. The Data Contract Specification allows additional data to extend the specification at certain points, at the object level: the Data Contract Object, Info Object, Contact Object, and Server Object. These extensions can be named freely, and the value can be null, a primitive, an array, or an object. Both approaches are sketched below.
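In YAML, the two approaches look roughly like this; the customProperties shape follows the property/value pairs described above, while the free-form field on the Info Object is an invented example.

```yaml
# ODCS style: extensions live in a dedicated customProperties object.
customProperties:
  - property: retentionPolicy
    value: 3-years

# Data Contract Specification style: freely named fields added at
# allowed objects, here on the Info Object (costCenter is illustrative).
info:
  title: Orders
  version: 1.0.0
  costCenter: CC-1234
```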

Both approaches clearly enable extensions that expand the applicability of the standard. Extensions sound like a nice way to say “yes, you can fit the standard to your needs with extensions and fill in the gaps”, but they come with some drawbacks as well:

  1. Lack of Standardization — while the standards define where extensions may be placed, the extensions themselves lack standardized implementation and documentation.
  2. Limited Support — data contract extensions are not uniformly supported across all tools and platforms, leading to interoperability issues. At the same time, one of the guiding principles of ODCS is “We favor interoperability over readability”. This is a challenge they must overcome.
  3. Absence of Metadata — neither approach described above provides mechanisms to capture detailed information about the extensions themselves, making it challenging to understand their purpose and functionality.
  4. Potential for Misuse — without clear guidelines and validation mechanisms, there is a risk of misuse or misinterpretation of extensions, undermining the integrity and consistency of data contract specifications.

Both described extension methods allow uncontrolled, “wild” extensions, which leads to the reduced interoperability discussed above. Perhaps there should be a standardized extension object and a registry of validated extensions with metadata and documentation.
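Such a standardized extension object could, purely as a thought experiment, carry its own metadata and point to a registry entry; everything below is hypothetical and not part of any existing standard.

```yaml
# Hypothetical standardized extension object with metadata and a
# registry reference.
extensions:
  - name: retentionPolicy
    version: 1.0.0
    description: Defines how long the data must be retained.
    registry: https://registry.example.com/extensions/retentionPolicy
    value: 3-years
```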

--

Jarkko Moilanen (PhD)
Exploring the Frontier of Data Products

Open Data Product Specification igniter and maintainer (Linux Foundation project). Author of business-oriented data economy books. AI/Product Lead professional