Why the Need for Standardizing Data Contracts?

Jean-Georges Perrin
Published in AbeaData
4 min read · Dec 8, 2023

This article describes the genesis of Bitol, the open-source data contract standard and solutions that are an essential part of modern data engineering. In an era where trusting data is becoming increasingly important, data contracts act as an agreement between multiple parties, specifically, a data producer and its consumer(s).

Note that this article focuses on the early standardization process and does not review what a data contract is. To learn more about data contracts and how you can help to build them, check the resources at the end.

The Risk of Fragmentation

Data contracts are a relatively nascent concept for many. However, their origins date back to (at least) the 1990s, when earlier versions were used in CASE tools for code generation. The lack of standardization and the many unfruitful attempts since then have impaired innovation.

Having disparate and unregulated formats slows innovation, increases vendor lock-in, and, ultimately, sabotages any attempt at a standard. Imagine if, back in 1994, Netscape had kept its own proprietary version of HTML; neither Internet Explorer nor Safari could have taken off (I hear the badmouthers in the back). This lack of standardization would have created a fragmented market, and the web would have been vendor-based, which is indeed what CompuServe and AOL were at that time.

Bitol Is All About Open Standards

Logo of the Bitol project

Bitol is a Linux Foundation AI & Data project that creates and maintains open standards for modern data engineering, starting with data contracts through the Open Data Contract Standard (ODCS).

As our teams built an implementation of Data Mesh, we realized the need for a resource descriptor. The number of elements needed in this descriptor kept growing. That’s when we decided to restructure the format and adopt a data contract approach. A few months later, we open-sourced a version of the template. I later took the template to a broader community, the AIDA User Group, where it started its incubation process. Although the AIDA User Group is a fantastic organization, it is not suited for developing open source & open standards. That’s where the Linux Foundation joined the dance.

Governance Is Key

Building the technical steering committee (TSC) was the next step. I set the bar high to get some of the world's experts in data contracts and data products. However, the committee also needed people from a variety of backgrounds. We wanted participants to be users, vendors, and service providers to ensure we had good coverage. We also wanted experts and learners from around the world. The TSC has reached those goals. More about the TSC will come shortly.

Bitol overseeing the jungle. Thanks to Atanas Iliev for the prompt suggestion for DreamStudio.

Bitol Is a God Of Creation

A Mayan sky god, Bitol is one of the creator and destroyer deities who participated in the last two attempts at creating humanity. At the beginning of days, they attempted to form sentient creatures with their associates: Alom, Qaholom, and Tzacol. In the third creation (or iteration), Bitol was transformed into Ixmacane.

Bitol, as a god, is a perfect analogy for this project. However, you will never know which iteration we are on.

AbeaData Provides Support

You may not have heard much about AbeaData yet, but we are a team of senior data & software professionals; veterans, some may say. One way to learn more is to sign up for the countdown on our website.

Existing & Additional Resources

A lot of resources are popping up to explain the concept and its implementations. As often with a new technology or concept, you have piggybackers who half understand the idea but want in, or commercial ventures touting their products. So be careful. That said, I am happy to add to this list.

First, as we say in France, charité bien ordonnée commence par soi-même (charity begins at home). The first time I mentioned data contracts was in the popular article The next generation of Data Platforms is the Data Mesh. However, Data Contract 101 is really where I dive into more detail. In What is Data QoS, and Why is it Critical, I dig deeper into the idea of data quality and service levels and how they relate to data contracts. Don't forget Implementing Data Mesh at O'Reilly, which is being written with my great friend Eric Broda.

Andrew Jones covers data contracts in his book Driving Data Quality with Data Contracts: a comprehensive guide to building reliable, trusted, and effective data platforms. It’s a very good book. I’m afraid I have to disagree with some of the things Andrew writes, but that’s more at the detail level and a discussion for the pub. Buy it confidently.

Chad Sanderson also has a few articles on his Substack. Chad believes strongly in data contracts as well.

Please suggest other resources in the comments, and I will gladly add them here.

Jean-Georges Perrin
AbeaData

#Knowledge = 𝑓 ( ∑(#SmallData, #BigData), #DataScience U #AI, #Software ). Lifetime #IBMChampion. #KeepLearning. @ http://jgp.ai