Emerging “Everything as code” in the data contract standards

--

This article is the first of three in my research journey focusing on data contracts in the data economy.

Abstract and insights

This study provides a comparative analysis of two emerging data contract standards: Data Contract Specification and Open Data Contract Standard. The comparison evaluates technical aspects and examines six key dimensions: consistency, extensions, SLA, data quality, pricing, and data content. This comparative study sheds light on the complexities and considerations involved in the adoption and evolution of data contract standards. Several observations emerge from the analysis.

The philosophy of Everything as Code (EaC), emphasizing the use of code to manage infrastructure and processes, is evident in both standards. However, the integration of “as code” principles with legacy attribute definitions presents challenges, notably observed in the data quality object of Data Contract Specification. EaC could become one of the fundamental philosophies of data contracts, eventually covering most if not all aspects of them and leading us to “Data contracts as code”. This intriguing wider application of EaC is explored in more detail at the end of this article, with a focus on SLA and pricing.

Data Contract Specification emphasizes object grouping, while Open Data Contract Standard favors arrays and attributes. Both standards are in early development stages, with maturity levels still evolving since their emergence in 2023. Notably, contracts that make use of the voluntary elements are considerably more usable than those generated solely from the mandatory elements.

Open Data Contract Standard features more business-related elements, yet their depth remains shallow, particularly concerning pricing structures. Both standards prioritize data access and content, but face challenges in standard conversion due to differences in included content, such as SLA specifications.

While both standards offer extensions, their lack of precision introduces potential errors and interpretation ambiguities.

Background and structure

As I mentioned in the previous article, which was an introduction to the emerging standards around data products in the data economy, I am now starting the analysis of the standards, but limiting it for now to the contract oriented ones. Later I will do the same for the data product oriented standards (Data Product Descriptor Specification and Open Data Product Specification).

In this study phase, the two selected standards are Data Contract Specification and Open Data Contract Standard. Both are emerging standards. Open Data Contract Standard is developed under Bitol, which is an initiative (a project according to its website) under the Linux Foundation. Data Contract Specification development is supported by the technology consulting firm INNOQ. The standards are not developed in total isolation from each other, since at least one person from the Data Contract Specification group is also involved in the other group.

This contract standards oriented research project contains 3 phases.

  1. In the first article (this one) I do a technical analysis of the standards. The focus is on the technical structure of the standards, with a comparison of approaches regarding consistency, extensions, SLA, data quality, pricing, and data content.
  2. In the second phase the focus will be on the business problems the standards are aiming to solve, and an analysis will be done to discover similarities and differences. From the findings of that phase I will write the part 2 article.
  3. In the last phase I will focus on discussion and on wrapping all findings together. All the findings will be collected into the part 3 article.

The structure of the document is as follows. First we introduce the research objects, Data Contract Specification and Open Data Contract Standard. Then we briefly discuss what a contract is in this context. Likewise, some characteristics of a good standard are discussed briefly. After that we take an overview of the selected standards by visualizing them with two different tools in order to gain a high level understanding of the differences and similarities. In this step we use artefacts generated from the schemas of the selected standards. Here the analysis is still at document level and the details of the contract structures are shallow. In order to gain a more in-depth understanding of the differences, the next step compares the structural characteristics of the standards at schema level (the complete model down to the last leaf). After this the selected standards are compared from the following aspects: consistency, extensions, SLA, data quality, pricing, and data content. In the end I discuss the significance of the findings and why all this matters and contributes to the standardization of data contracts.

Contract in this context

A topic to clarify before taking a deep dive into the comparison is what a contract is in this context. One should not confuse a data contract with a legal contract. Data related agreement models are already in use and are needed for multiple reasons. Data sharing agreements are formal contracts executed by organizations exchanging data. These agreements delineate the shared data, its intended purposes, permissible usage, timing, location, duration of sharing, and the respective roles and responsibilities of the involved parties. Such agreements prove especially valuable when sensitive data is involved, ensuring clarity and accountability among parties. By establishing detailed, legally binding provisions, these contracts facilitate a clear understanding of each party’s obligations, minimizing ambiguity and potential disputes.

The Data Processing Agreement (DPA) is probably the most well known. A DPA is a legal contract that outlines the terms and conditions governing the processing of personal data by a data processor on behalf of a data controller. If you take a look inside a DPA you will discover a lot of legal conditions and references. Furthermore, a DPA is not, at least not always, machine-readable in a way that would provide processing rules and other information for automation in implementation. A DPA commonly takes the form of a traditional legal agreement with some technical details included.

In the case of both mentioned data contract standards, the concept of a contract is focused on clearly stating the conditions under which the data is accessed and what is included in the data content. Data Contract Specification defines a data contract as “a document that defines the structure, format, semantics, quality, and terms of use for exchanging data between a data provider and their consumers”.

The above definition is still vague, as data quality is part of legal contracts as well. Also, terms of use is something you would easily expect to see in legal agreements. Let us explore the data quality aspect first to differentiate the concept in legal agreements from that in data contracts. Data quality in legal agreements is often defined as target states: the level of quality the data product provider must adhere to. This is not the case, for example, with Data Contract Specification, which contains a structure for defining rules for data quality monitoring and measurement. Data Contract Specification is more like a configuration for data quality checking rather than a set of threshold values for data quality.

The difference from the customer’s point of view is evident. In the legal contract case the customer is given a commitment to meet a defined quality level, but the customer has no easy means to verify it. If the data contract is created and the rules for data quality are defined clearly enough, and in a way that allows checks to be implemented with reasonable effort, then both the customer and the provider can verify the quality, which in turn generates more trust in the data and the data provider. The practice also offers the data consumer a control mechanism to assure quality with mutually agreed methods and rules. In short, the agreement is not about the end result, but about the means to verify the data quality level, whatever it is. This data quality aspect is discussed in more detail later in this document.

Perhaps a better understanding of the nature of the data contract can be gained by exploring how it is created and what its purpose is. In traditional offerings of products and services, the agreement is often defined for the most part in advance, and in some cases given by the provider with the options of accepting it or not. In the data contract approach, the contract is a tool used before the data product is implemented, and it is constructed in cooperation, in workshops. The data contract is a tool to bring a design oriented method into the process. Data Contract Specification states that “data contracts are a communication tool to express a common understanding of how data should be structured and interpreted”. As a result of the workshops a data contract is created and defined according to the specification. The resulting contract serves as the basis “for code generation, testing, schema validations, quality checks, monitoring, access control, and computational governance policies.” The resulting agreement is machine-readable, instead of traditional agreement text with the legal nuances normally witnessed in agreements.

Yet the purpose described above, which differs from that of a traditional agreement, is not applied consistently in both emerging contract standards. As an example, Open Data Contract Standard contains a Service-Level Agreement object which contains target states for availability, not the methods to verify the SLA. This approach is closer to what we have seen in traditional agreements regarding SLAs. It is also worth mentioning that SLA is not part of Data Contract Specification. Based on this it can be stated that some level of mixed practice now exists among the selected standards.

What are the characteristics of a good standard?

A good standard is characterized by several attributes that ensure its effectiveness and reliability. Firstly, a good standard should be clear and unambiguous, ensuring that its requirements are easily understandable by all stakeholders. Secondly, it should be specific and precise in its requirements, leaving no room for interpretation or ambiguity. Thirdly, the standard should be relevant to its intended purpose and context, addressing specific needs and objectives. Fourthly, it should be applicable across different situations and environments, providing guidance that can be effectively implemented. Lastly, a good standard should be based on sound principles and evidence, ensuring its validity and reliability.

In this document we touch on the aspects of clarity and precision. Relevance is more related to business use cases, which are the focus of the steps following the technical analysis. Applicability is likewise use case related and mostly out of scope of this document.

The aim of this study is not to put the two standards in order and claim that one is better than the other. The aim is to discover the differences and similarities, which contributes to the development of the standards and helps us take the next steps in contract standardization.

Should we aim for one unified standard for data contracts? In an ideal world yes, and that would most likely maximise interoperability, reduce errors caused by conversions, reduce implementation costs by enabling code reuse, and much more. I discussed this topic of future goals in a previous post. However, it is not that simple: standardization is a very complex process often requiring consensus building, which might not succeed, partly because the standardization process is considered too slow or cumbersome to match the needs of the implementation. Besides, it is often easier to start something from scratch, only to notice later that the work needed to make the artefact stable and accepted by others takes too much effort, and as a result it fades away. And then another initiative emerges. That is probably why the xkcd meme about standardization exists. Below is the meme adapted to the data contract standards scene. This is what I want to avoid, but that is the risk.

Visual overview comparison

Visualizing a schema enhances understanding by providing a clear, graphical representation of its structure and organization. Visualizing a schema enables users to identify relationships between different entities, such as objects and their corresponding attributes, helping in understanding how everything is connected. Visual representations serve as effective communication tools, enabling stakeholders to discuss and collaborate on schema designs, requirements, and modifications more efficiently. Thus standards often offer a visual presentation as well. Since I was not able to find visualizations of the standards in their documentation, I decided to create them with available tools.

Out of curiosity I decided to do the comparison from two perspectives: examples generated from the schemas (“in use”) with mandatory elements, and the full schemas. In the examples comparison I did not use the examples provided in the documentation, since it looked like one of them was a full example using all possibilities and the other one was not. Thus I chose to generate the examples (just the mandatory parts) with a tool and then compare the results. In the full schema comparison I used the given schemas as is.

Visualizations of the generated examples

In order to see how the standards behave in use, I generated JSON examples from them. Then I used the JSON viewer tool to get tree views of the JSON files.

Below are the tree view results from the generated examples. At this stage I was interested in just the document level, and thus the trees are not opened down to the last leaves.

Data Contract Specification

Open Data Contract Standard

From the above JSON examples, generated from the schemas of both standards and containing just the mandatory structures, we can say that the amount of required information is very low. The result with just mandatory structures might be useful in some cases, but without going into debates I would argue that both standards might want to revisit what is mandatory and what is not. After all, the mandatory structures are the “bare minimum” that always has to be defined and provided. Without the mandatory structures the intended use cases cannot be fulfilled. Whether the standards currently match the required use cases is impossible to say, since use case information is not available.
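To make this concrete, below is a minimal sketch of what a Data Contract Specification document with only the mandatory parts filled in might look like. The values are placeholders, and the exact set of required fields should be verified against the schema version in use.

```yaml
# Minimal sketch, assuming the mandatory fields of Data Contract Specification
# are the specification version, the contract id, and the info block.
dataContractSpecification: 0.9.3          # assumed specification version
id: urn:datacontract:example:orders       # placeholder identifier
info:
  title: Orders Data Contract
  version: 1.0.0
```

If those are indeed the only required fields, a document this small would already pass validation, which illustrates how little the mandatory structures actually demand.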

Schema level structure comparison

The above comparison was done only at the level of mandatory structures and was based on examples generated from the schemas. Next, I analyzed the standards at all levels to see their breadth and level of detail more holistically. For this I used the schemas provided, in order to include all possibilities that might not appear in the generated examples.

I generated tree views with a few tools for the schemas as well, but the resulting maps were huge in size and detail, so I am not adding them here. What I did notice in the trees was a relatively high number of attributes at the document level of Open Data Contract Standard. This observation inspired me to take a closer look at that part. To do that I took a step back and returned to the YAML representation of the standard. Although Open Data Contract Standard might look a bit less structured, as the document level contains a lot of attributes (instead of objects), it uses single-line YAML comments as grouping elements. Comment-based grouping is obviously not visible in the JSON Schema representation.

After the above exploration with comments I moved on to a comparison of statistical characteristics. For the comparison I selected the following structures: string, array, boolean, and object. No number data types were found at all. The method used was string search in the schemas. For example, to find objects I searched for “type”: “object” and checked the result set for possible unintended hits, such as the phrase being used in comments or descriptions.

This comparison approach gives us somewhat different results but also confirms that Data Contract Specification is more object-datatype oriented. In the total number of data types the difference is not significant (Data Contract Specification 95 data types, Open Data Contract Standard 110 data types). Data Contract Specification contains 24% objects, which is more than double the share of objects found in Open Data Contract Standard (11%). The difference in string datatypes is not big: 73% in Open Data Contract Standard versus 66% in Data Contract Specification. The amount of boolean datatypes is also almost the same. Open Data Contract Standard uses more array datatypes (12%) compared to Data Contract Specification (5%). After getting a thorough enough overview of the standards it is time to look at the selected aspects in comparison.

Comparison regarding selected 6 aspects

The selected aspects used in the comparison are: consistency (naming policy only), extensions, SLA, data quality, pricing, and data content.

Consistency

The concept of consistency in heuristics originated from studies in cognitive psychology and human-computer interaction, aiming to optimize user experience and system performance. Over time, it has become a fundamental principle in design and decision-making processes across various domains. Heuristics are mental shortcuts or rules of thumb used to make quick decisions or solve problems efficiently. In user experience design, heuristics are guidelines or principles applied to evaluate the usability of interfaces. Consistency is one of Jakob Nielsen’s ten usability heuristics, emphasizing the importance of uniformity and predictability in design elements and interactions. Consistency helps users navigate systems more effectively by making patterns and behaviors recognizable and coherent. Although the background and commonly used context of Nielsen’s work is UX, it can be applied to standards as well. A thorough analysis of the standards against Nielsen’s ten principles could provide more insights, but would require an extensive amount of work and is worthy of a separate project.

A good standard is consistent, and thus I examined the naming patterns used in both standards. Both of them are consistent in naming the data types included in the standard. Both use only camelCase for multiword datatype names. A small difference was discovered between the standards regarding naming policy. The Data Contract Specification pattern is to start all datatype names with a lowercase letter, while in Open Data Contract Standard object names start with a capital letter (all except one). The odd object starting with a lowercase letter is at the document level (“description”); the rest of the objects are in the definitions part of the schema and all start with a capital letter. As explained above, consistency is a wider concept than just naming patterns, but at this point I did not have time to go deeper; perhaps I will do that in the future if needed.

Extensions practices

Extension methods in data economy standards play a crucial role in promoting adaptability, interoperability, innovation, and future-proofing within the data ecosystem. Extension methods allow for flexibility in adapting the standard to specific use cases or evolving requirements without altering the core structure of the standard. Organizations may have unique data requirements that are not fully addressed by existing standards. Extension methods enable them to tailor the standard to their specific needs while still adhering to the overarching framework.

While adhering to the core standard ensures interoperability across different systems and platforms, extension methods can facilitate interoperability enhancements by providing additional functionalities or metadata that enhance data exchange and compatibility.

Extension methods encourage innovation by allowing organizations to experiment with new features or enhancements within the framework of established standards, fostering continuous improvement and adaptation to changing technological landscapes. Eventually the innovations might become part of the standard. This is visible for example in the AsyncAPI specification, which encourages extensions to be reported to the organization maintaining the standard.

Extensions can be seen as “leads” regarding what “sells” and what the desires of the industry are. Analysing the extensions can be a good way to identify possible additions to the standard. As technologies and data requirements evolve, extension methods offer a way to future-proof standards by accommodating emerging trends and advancements without necessitating a complete overhaul of the standard.

A good, widely adopted practice for enabling extensions can be found in the OpenAPI Specification (OAS) and also in the above mentioned AsyncAPI specification. In OAS, extension field names begin with the “x-” prefix and the extension value can be a primitive, an array, an object or null. If the value is an object or an array of objects, the nested property names do not need to start with “x-”.
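As a reference point, an OAS style extension could look like the following YAML sketch; the x-internal-rating and x-audit fields are invented for illustration.

```yaml
# OAS-style specification extension: the field name carries the "x-" prefix,
# and the value can be a primitive, an array, an object or null.
info:
  title: Orders API
  version: 1.0.0
  x-internal-rating: high        # primitive extension value
  x-audit:                       # object extension value; nested property
    reviewedBy: data-office      # names do not need the "x-" prefix
    lastReview: 2024-01-15
```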

There are multiple reasons why extensions should be clearly separated from the structure defined by the standard. A clear separation ensures that extensions do not conflict with the standard’s existing attributes and objects, making it easier to maintain and evolve the standard over time. Clear delineation helps prevent ambiguity and ensures interoperability among systems that implement the standard and its extensions. It allows different implementations to understand and process extensions consistently.

By using a distinct prefix like “x-”, extensions are immediately recognizable as non-standard additions. This practice encourages standardization efforts and facilitates future integration with the standard should the extensions become widely adopted. Prefixing with “x-” signals to developers and users that these features are experimental or proprietary. It sets clear expectations about their stability and compatibility, encouraging community collaboration and feedback. The prefix serves as a visual cue in the documentation, making it evident that the attribute or object is an extension. This transparency aids developers in understanding which elements are part of the standard and which are extensions.

Extensions in Open Data Contract Standard

A notable difference is the extension pattern selected. In Open Data Contract Standard, extensions or additions to the standard are allowed only in a separate object named customProperties. The extensions there are attribute pairs of property and value.

On top of offering the attribute pair method to define additional properties, the standard includes two document level attributes (systemInstance and contractCreatedTs) in the same section. The prefix driven pattern discussed above is not adopted in Open Data Contract Standard, and the reason for offering a separate object for custom properties is unknown, but is expected to be found in later phases of this research.
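A rough sketch of how the customProperties mechanism is used is shown below. The property names are invented for illustration; only the property/value pair structure follows the standard.

```yaml
# Open Data Contract Standard: extensions live in a dedicated customProperties
# object as property/value pairs. The property names below are illustrative.
customProperties:
  - property: businessUnit
    value: consumer-banking
  - property: refRulesetVersion
    value: 1.4.0
systemInstance: warehouse-prod             # document level attributes placed
contractCreatedTs: 2024-03-01T12:00:00Z    # in the same section by the standard
```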

Extensions in Data Contract Specification

Data Contract Specification allows additional data to be added to extend the specification at certain points, at the object level: Data Contract Object, Info Object, Contact Object, and Server Object. The extensions can be named freely and the value can be null, a primitive, an array or an object.

Data Contract Specification has not adopted the prefixing pattern discussed above for extensions, or at least it is not clearly defined in the standard. This is the case despite the fact that Data Contract Specification declares in its Design Principles that it will “Follow OpenAPI and AsyncAPI conventions so that it feels immediately familiar”.
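In practice this means an extension in Data Contract Specification is just another freely named field inside one of the extensible objects, as in the sketch below, where costCenter is an invented example. Without a prefix, a reader cannot tell from the name alone that the field is not part of the standard.

```yaml
# Data Contract Specification: an extension is a freely named field in an
# extensible object such as the Info Object. The costCenter field is invented.
info:
  title: Orders Data Contract
  version: 1.0.0
  costCenter: 4711     # extension, indistinguishable by name from standard fields
```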

SLA

As mentioned in the beginning, SLA is not part of Data Contract Specification but is defined as part of Open Data Contract Standard. The SLA object is a properties array plus a few separate attributes. There is no limit on the type of properties, which leaves a lot of room for wild implementations. Each property can have additional optional attributes: extended value, unit, column, and driver (which describes the importance of the SLA, from the list: regulatory, analytics, or operational).
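A sketch of the resulting structure is shown below. The attribute names reflect my reading of the standard and should be checked against the schema; the property names themselves are free-form.

```yaml
# Sketch of an Open Data Contract Standard SLA definition: a properties array
# with fixed target values rather than "as code" verification rules.
slaDefaultColumn: orders.created_ts       # assumed attribute name
slaProperties:
  - property: availability                # free-form property name
    value: 99.9
    unit: percent
    driver: operational                   # regulatory | analytics | operational
  - property: latency
    value: 4
    unit: d
    column: orders.created_ts
```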

Data Quality

Data quality was discussed briefly in the beginning as an example of how a traditional (legal) agreement and a data contract differ in purpose, even though both (can) address the data quality aspect. Instead of defining threshold values for data quality, an alternative approach is offered: a programmable method to measure data quality as part of the contract definition. Tools for which built-in support is offered include Elevate, Monte Carlo and Soda Checks Language (SodaCL), of which the latter two are pure “as code” solutions.
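To illustrate the “as code” flavor, a data quality block in the style of Data Contract Specification might look roughly like the sketch below. The type/specification structure follows my reading of the standard; the embedded checks are generic SodaCL examples for an assumed orders dataset.

```yaml
# Sketch of data quality "as code": the contract embeds the checks themselves
# (here SodaCL) instead of stating target threshold values as attributes.
quality:
  type: SodaCL                      # other options: montecarlo, custom
  specification:
    checks for orders:
      - row_count > 0
      - missing_count(order_id) = 0
      - duplicate_count(order_id) = 0
```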

Monte Carlo offers a robust and forward-thinking method to enhance data quality, fostering dependable data for vital decision-making within organizations. Its data quality measurement system employs sophisticated algorithms and observability platforms to evaluate and uphold data accuracy, reliability, and consistency across diverse systems. Emphasizing pertinent data quality metrics, Monte Carlo ensures effective monitoring and management practices. Additionally, its approach features rules-based monitoring, enabling users to establish criteria ensuring that data quality metrics derived from specific columns adhere to predefined standards.

SodaCL is an essential tool designed to work alongside Soda tools, enabling users to craft data quality checks and perform scans on datasets. With SodaCL, users can define and implement checks tailored to evaluate data quality effectively. Once configured, SodaCL conducts scans on specified data sources to execute these checks comprehensively. By leveraging SodaCL, organizations can uphold data quality standards and ensure data completeness by identifying any notable deviations from preset criteria. Moreover, SodaCL empowers users to assume control over data quality assessments and facilitates the automation of quality checks, guaranteeing data freshness, completeness, and accuracy throughout the data lifecycle.

SodaCL and Monte Carlo are widely used. Monte Carlo is not open source. It is a proprietary data observability platform known for handling common checks like monitoring data freshness and verifying schema conformity. While there are open-source projects related to Monte Carlo simulations, the Monte Carlo data observability platform itself is not open source. Monte Carlo is considered more expensive than SodaCL due to its fully managed services. SodaCL is open source. It is a YAML-based, domain-specific language for data reliability, used alongside the Soda Core framework. Soda Core, which includes SodaCL, is a free and open-source Python library and CLI tool. SodaCL is praised for its robust features and integration capabilities.

The standards differ significantly in how data quality is described. Data quality in Data Contract Specification is defined “as code”. The standard offers three options: SodaCL, montecarlo, and custom. Data quality in Open Data Contract Standard is provided as an example implementation for the Elevate data quality tool, implemented as objects and attributes instead of “as code” like the above discussed SodaCL and Monte Carlo.

Pricing

Data products are already sold in data marketplaces. Data product monetization is still a relatively small phenomenon compared to data exchange, which is often conducted without direct compensation as payments. Data monetization is witnessing significant growth and innovation in marketplaces, driven by various factors. Some market reports highlight the increasing importance of data monetization, projecting substantial growth in the coming years. Tech leaders are exploring data monetization by collaborating with ecosystem partners, enabling the sale of data and insights. Data marketplaces play a pivotal role in data monetization, providing access to a network of data buyers and simplifying data transactions. Surveys sponsored by industry players emphasize the commitment to advancing data monetization practices. The global data monetization market is expanding rapidly, with revenue projections showing substantial increases: the market was estimated at $2.9 billion in 2022 and is projected to grow to $3.7 billion by 2027.

The monetization of data is a fact, and pricing related details of the data product should be defined as part of the process between provider and buyer. Another question is whether that belongs to the contract we discuss here, or perhaps to some other standard, such as the data product focused standards. The reason to raise pricing here is that it is part of Open Data Contract Standard, while in Data Contract Specification it only appears as part of the Terms object. This is yet another clear difference between the mentioned emerging data contract standards.

Pricing in Open Data Contract Standard

Open Data Contract Standard defines an experimental price object in its 2.1.1 version. The object consists of three attributes.
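A sketch of the price object is given below. The attribute names reflect my reading of the 2.x schema and should be verified against the standard; the values are placeholders.

```yaml
# Sketch of the experimental price object in Open Data Contract Standard.
price:
  priceAmount: 9.95
  priceCurrency: USD
  priceUnit: megabyte
```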

The object is rather lightweight compared, for example, with the 11 standardized pricing plans defined in Open Data Product Specification. Whether pricing should be part of a contract specification or a data product specification, or whether we could have a combined standard (contract and data product aspects in one) in the future, is one of the research questions in my PhD research. An in-depth analysis of this is not part of this phase, but will be done later in the research. However, the topic has been discussed in a previous post in which I paint two alternative objectives for the future of standardization of data contracts and data products.

Pricing in Data Contract Specification

In Data Contract Specification the pricing related information is inside the Terms object, in a billing attribute, which according to the standard “describes the pricing model for using the data, such as whether it’s free, having a monthly fee, or metered pay-per-use”. In the example given in the standard this field is expressed as a string consisting of the price amount, the currency used, and the period: “5000 USD per month”.
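Put into context, the pricing information in Data Contract Specification might appear roughly as in the sketch below; the surrounding Terms attribute values are illustrative.

```yaml
# Sketch of pricing in Data Contract Specification: a single free-text billing
# attribute inside the Terms object. Other attribute values are illustrative.
terms:
  usage: Analytics and reporting within the buying organization
  limitations: Not suitable for real-time use cases
  billing: 5000 USD per month        # the example value given in the standard
  noticePeriod: P3M
```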

Data content

As the last item in the comparison I took a look at how the data content is defined in both standards. That is, after all, the beef. The approach between the standards regarding the data content schema is very different. Data Contract Specification reuses existing standards and practices from different systems as objects and “as code”, included pretty much as is, while Open Data Contract Standard has created its own refined structure for the description.

Data Contract Specification schema approach

First I took a look at schema usage, since schemas are the common method to describe data. Data Contract Specification has a separate Schema object, which has two attributes: type and specification. In the example below, the lines starting from “version: 2” are already part of the embedded schema.

The first attribute, type, indicates the type of the schema. Typical values in the standard are: dbt, bigquery, json-schema, sql-ddl, avro, protobuf, and custom. The second attribute, specification, holds the actual schema content, which can be given as a string or as inline YAML. For the specification attribute the standard contains separate objects defined to match the options in the type attribute.
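Matching the “version: 2” remark above, a dbt-typed schema object might look roughly like the sketch below; the dbt model itself is an invented example.

```yaml
# Sketch of the Schema object in Data Contract Specification: type selects the
# schema language, specification carries the schema content as-is. Everything
# from "version: 2" downwards is dbt syntax, not the contract standard.
schema:
  type: dbt
  specification:
    version: 2
    models:
      - name: orders
        description: One record per order
        columns:
          - name: order_id
            description: Unique identifier of the order
```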

Open Data Contract Standard schema approach

In the case of Open Data Contract Standard, the schema is defined inside the Dataset object, which also contains some additional information attributes such as tags and description. In total the dataset object contains 29 structure elements, the majority of which are intended to be used in the data schema description. In the visualization of the schema below we can see the dataset array object in the middle, which is constructed from 4 strings and 3 arrays. One of the arrays is columns, under which the schema of the content is defined. That columns array contains 22 data structures.
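Since the visualization itself does not reproduce well in text, the sketch below shows a minimal fragment of the dataset/columns structure. The attribute names reflect my reading of the 2.x schema and should be verified; only a few of the available column attributes are shown.

```yaml
# Sketch of the Open Data Contract Standard dataset object: the schema of the
# data content is described with the standard's own column attributes.
dataset:
  - table: orders
    description: One record per order
    columns:
      - column: order_id
        logicalType: string
        physicalType: varchar(36)
        isPrimaryKey: true
        description: Unique identifier of the order
```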

Discussion

In this last part of the analysis I will provide some insights discovered during the analysis and comparison of the mentioned standards.

Data Contract as code — towards smart contracts?

The contract concept in the discussed standards resembles in some parts (data quality being a good example) the automation oriented tools found in software development. The thinking behind this is most likely “monitors as code”, which refers to a methodology where monitors, which track the health and performance of systems or processes, are defined and managed using code-based solutions, often integrated into Continuous Integration/Continuous Deployment (CI/CD) workflows. The connection of “monitors as code” to Everything as Code is evident. This approach allows for the programmable creation and maintenance of monitors, enhancing automation and scalability while ensuring consistency across deployments. It involves writing monitor configurations in YAML files within project directories, facilitating versioning, tracking, and approval by team members.

Monitors as code extends beyond mere automation of monitoring setup; it encompasses the entire lifecycle management of monitoring resources, offering flexibility and agility in managing monitoring infrastructure. By adopting monitors as code, organizations can achieve fully automated monitor management with traceability and collaboration capabilities. This approach aims to empower data engineers to integrate monitor creation and maintenance seamlessly into their CI/CD processes.
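A purely illustrative monitor-as-code definition, not tied to any specific tool’s syntax, could look like this:

```yaml
# Hypothetical monitor definition kept in version control next to the code it
# watches; field names are illustrative, not any particular tool's format.
monitors:
  - name: orders-freshness
    type: freshness
    dataset: analytics.orders
    threshold: 2h               # alert if the newest row is older than 2 hours
    notify: ["#data-alerts"]
```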

Monitors as code can also be seen as one possible approach to enabling smart contracts. A smart contract is a self-executing computer program that automatically executes the terms of a contract without the involvement of third parties. It seems that the discussed contract standards are a step in this direction.

However, the “monitors as code” logic is not applied to all objects in the standards. Some objects follow the traditional approach and simply state values as attributes. The standards seem to be a mix of the old legacy model and something we could label as a “contract as code” model.

Structure drives adoption

The schemas are for both humans and machines. Humans need to understand a schema in order to apply it in information system development. Structures like objects as grouping elements make a schema easier to adopt and understand compared to a totally flat list of attributes. For machines the structure is not as significant as it is for humans. Although structure is good to have, too much of it leads to a situation where the amount of structure exceeds the amount of content. This has been one of the criticisms of XML, for example. In the comparison discussed above, Data Contract Specification has more object structures than Open Data Contract Standard.

UX patterns adopted here and there

However, the design will have an effect on what happens next. A good example is internationalization (i18n), in which some of the schema-defined datatypes are translated into multiple languages, most commonly for UI/UX purposes. In that context certain patterns have emerged, and multiple internationalization tools provide support for them. The best example is the application of this in UX/UI code, where the JSON structure is constructed to follow the UI/UX design and attributes are located in the tree as they appear in the interface, to enable easier retrieval at the code level. This in turn has some positive effect on code maintainability as well. Open Data Contract Standard appears to follow this pattern in the SLA object and also defines UX labels for the fields.

Domain or application orientation?

On the other hand, the above internationalization example exemplifies application oriented data design, which is not an advisable approach in all cases. Domain oriented design has been discussed a lot during the past decade or more. This domain orientation has also gained a significant foothold in the data economy, especially in data management and governance. Studies suggest that data management focuses on handling company assets, while data governance structures relationships and processes to assess the effectiveness of management in achieving organizational goals. Data contracts could be a domain in your data governance, as we want the contracts to be unified in structure to enable easier management, risk reduction, and system level performance. On top of that, contracts can be considered core elements in business and thus their role is quite central.

Business elements emerging in the standards

Whether the SLA should be part of the data contract or not is another question to discuss among the practitioners and developers of the related standards. Open Data Contract Standard contains an SLA object, but Data Contract Specification does not. Interestingly, “as code” was not applied in the SLA object. Instead, the SLA object consists of a properties array with attribute values.

Pricing is another element which is now part of one data contract standard (Open Data Contract Standard), but not included in the alternative. The amount of detail in the existing experimental price object is very scarce and can hardly support different kinds of pricing plans. The Open Data Product Specification (ODPS) overlaps here with Open Data Contract Standard: ODPS contains standardized support for 11 types of pricing plans. It will be interesting to see which direction Open Data Contract Standard takes with the price object in the future. Perhaps they should consider adopting the ODPS model as such or with modifications.

SLA and pricing are heavily business related aspects of the data product and value creation. The explored standards are heavy on technical aspects of the data products and the mentioned two aspects, SLA and pricing, add business elements to the discussion.

Since “Everything as Code” is already emerging in data contracts, and business elements such as SLA and pricing may become part of it in the future, I explored the ideas of “SLA as Code” and “Pricing as Code”.

“SLA as Code”

As discussed above, data quality is an example of “Everything as Code” in the realm of data contracts. We might be able to do the same for SLAs. SLA as code refers to the practice of defining and managing Service Level Agreements (SLAs) using code and automation tools. Instead of relying solely on manual processes and documentation, SLA as code involves codifying SLA parameters, metrics, and expectations into executable scripts or configurations. This approach enables organizations to automate the monitoring, enforcement, and reporting of SLAs, streamlining operations and ensuring consistent service delivery. The simplest implementable example would be data provided via an API. For a long time we have had uptime dashboards and monitoring systems for APIs. If uptime tools provided a YAML based rules definition for SLA monitoring, this part of the data contract could become EaC oriented as well. For databases there are multiple monitoring solutions, and the same logic could be applied there too.

Are we that far from it becoming reality? Perhaps not, since for example Terraform and Uptime integration as code already exists. Terraform is HashiCorp’s infrastructure as code tool. It lets you define resources and infrastructure in human-readable, declarative configuration files, and it manages your infrastructure’s lifecycle. I did not go into the details of this yet, but it could be done. If this uptime monitoring logic were applied to SLAs, we could define the rules as code. In Open Data Contract Standard the SLA is a set of fixed values as expectations. Next to those we could have an “as code” configuration as well.
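A hypothetical sketch of such a dualistic SLA definition, with the fixed expectation and an “as code” verification rule side by side, could look like this (all field names under the verification key are invented):

```yaml
# Hypothetical "SLA as code": the fixed target value is kept, and a
# machine-executable verification rule is added next to it.
sla:
  - property: availability
    value: 99.9                # traditional fixed expectation
    unit: percent
    verification:              # the "as code" addition (invented structure)
      type: uptime-check
      endpoint: https://api.example.com/orders/health
      interval: 60s
      failureThreshold: 3
```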

“Pricing as Code”

Pricing as code is also a tempting concept to think about. Currently the data contract standards are very thin regarding pricing, as was shown previously. One likely reason is that data monetization is not yet that big compared to data exchange, in which pricing is not that relevant. Some attempts to implement pricing as code can be found. The already archived solution tier, developed by Tier.run and available on GitHub, is one of them, and it describes the idea quite nicely: pricing plans and their features are defined in a pricing.json file, as in the Tier Hello World demo, and the same definition can equally be expressed as YAML.
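A simplified YAML rendering of such a pricing definition, loosely modeled on Tier’s pricing.json format, is sketched below; the plan and feature names are illustrative and the exact keys should be checked against the archived repository.

```yaml
# Pricing-as-code sketch loosely following Tier's pricing.json structure:
# plans contain features, features contain pricing tiers.
plans:
  "plan:free@1":
    title: Free
    features:
      "feature:queries":
        tiers:
          - upto: 100          # 100 queries included for free
  "plan:pro@1":
    title: Pro
    features:
      "feature:queries":
        tiers:
          - price: 0.05        # metered price per query
```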

The above example is not that far from the Pricing plans object in Open Data Product Specification. As Tier describes it, the idea of “pricing as code” in their case is to enable seamless integration with payment gateways such as the popular Stripe.

“Data Product as Code”

I am taking a little side track here to data products, which is the area of the data economy standards in which I have worked on a standard. This is also natural since I am now exploring the two concepts side by side: contracts and data products.

The exploration of the data contract standards brought up an idea for a “Data Product as Code” variation. This is close to “Data as Code”, which refers to treating data infrastructure and management like software development, enabling agile, reproducible, and scalable data workflows. The addition to “Data as Code” in “Data Product as Code” would be to include the suitable business related and relevant elements “as code” as well: data quality, pricing plans and SLA, as discussed above.

If this were the case, we would not define data product metadata in the standard in the traditional way, but as something else: a combination of fixed metadata and “as code” elements. One might consider labeling the “as code” parts as dynamic metadata, whose values are not known until executed. This kind of dualistic practice is already in use inside the Open Data Contract Standard SLA object, in which the attributes for dynamic rules are not mandatory. One clear benefit of the dualistic description is that it enables SLA level verification as part of the CI/CD process.

If the approach discussed above, for example for data quality, is applied to data products as well, we could have a model in which at least part of the objects are defined as configurations, rules or alike (“as code”), while other parts follow the traditional end state or static attribute value approach. It seems likely that we need to define data quality thresholds in some “static” form as well, not just provide the means to monitor and inspect the values: in all situations the customer is not able to validate the quality by executing the code, but just needs to see what the expected quality is. I will leave exploration of this thought to another post. Yet it does seem very intriguing and worth exploring later in the research focusing on the data product standards.

Next step — discover business details

Interviews with data contract standard developers

In the next phase I will interview the developers of the mentioned standards in order to find the reasons behind decisions regarding, for example, the SLA and pricing elements. In addition to getting more insights into the background of the standards, I will also explore the business needs and drivers behind them.

--
