Daniel Palma
6 min read · Oct 20, 2022


Read the full article for free at: https://www.arecadata.com.

Implementing Data Contracts

trust no one

Data contracts help you document and enforce the shape and metadata of your records as they move through data pipelines and processing systems. Their main goal is to reduce surprises and eliminate undocumented changes.

For example, if data producers and data consumers agree that the data exchanged between them has a specific schema, this can (and should) be verified for every message.
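A minimal sketch of such a per-message check, using only the Python standard library. The contract format (field name mapped to an expected Python type), the `USER_EVENT_CONTRACT` fields, and the `contract_violations` helper are all hypothetical names chosen for illustration, not something taken from a particular library.

```python
from __future__ import annotations

from typing import Any

# Hypothetical contract: field name -> expected Python type.
USER_EVENT_CONTRACT: dict[str, type] = {
    "user_id": int,
    "email": str,
    "signup_ts": str,
}


def contract_violations(message: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """Return human-readable violations; an empty list means the message conforms."""
    violations = []
    # Every field named in the contract must be present with the agreed type.
    for field, expected_type in contract.items():
        if field not in message:
            violations.append(f"missing field: {field}")
        elif not isinstance(message[field], expected_type):
            violations.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(message[field]).__name__}"
            )
    # Fields the contract does not know about are reported as well.
    for field in sorted(message.keys() - contract.keys()):
        violations.append(f"unexpected field: {field}")
    return violations


if __name__ == "__main__":
    good = {"user_id": 1, "email": "a@b.c", "signup_ts": "2022-10-20T00:00:00Z"}
    bad = {"user_id": "1", "email": "a@b.c"}
    print(contract_violations(good, USER_EVENT_CONTRACT))
    print(contract_violations(bad, USER_EVENT_CONTRACT))
```

A producer could run this check before publishing and a consumer could run it again after receiving, so neither side has to trust the other. A real pipeline would more likely rely on a schema format such as Avro or JSON Schema than on bare Python types.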

If the schema changes on the producer side and the consumers are not aware of it, they will break very quickly. It's therefore essential that these contracts are stored somewhere and upheld by both sides with automated verification checks.

All the code snippets mentioned in the article are available in this repository.

Theory

There are many ways to define this contract between data producers and consumers. The Kafka ecosystem, for example, has an excellent tool for this called the Schema Registry.
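To make the idea of a registry concrete, here is a toy, in-memory sketch of what such a component does conceptually: it stores versioned schemas under a subject name so both sides can look up the agreed contract. The `InMemorySchemaRegistry` class and its methods are hypothetical and are not the API of the actual Schema Registry; schemas are simplified to plain dictionaries of field name to type name.

```python
from __future__ import annotations


class InMemorySchemaRegistry:
    """Toy stand-in for a schema registry: keeps versioned schemas per subject."""

    def __init__(self) -> None:
        self._schemas: dict[str, list[dict[str, str]]] = {}

    def register(self, subject: str, schema: dict[str, str]) -> int:
        """Store a schema and return its 1-based version number."""
        versions = self._schemas.setdefault(subject, [])
        if versions and versions[-1] == schema:
            # Re-registering an unchanged schema does not create a new version.
            return len(versions)
        versions.append(dict(schema))
        return len(versions)

    def latest(self, subject: str) -> tuple[int, dict[str, str]]:
        """Return (version, schema) for the newest schema of a subject."""
        versions = self._schemas[subject]
        return len(versions), versions[-1]


if __name__ == "__main__":
    registry = InMemorySchemaRegistry()
    # The producer registers the contract; the consumer pins the version it was built for.
    pinned_version = registry.register("user-events", {"user_id": "int"})
    # Later the producer changes the schema, which creates a new version.
    registry.register("user-events", {"user_id": "int", "email": "string"})
    current_version, _ = registry.latest("user-events")
    if current_version != pinned_version:
        print(f"schema changed: v{pinned_version} -> v{current_version}")
```

The point of the sketch is that a schema change becomes an explicit, versioned event that a consumer can detect, rather than a silent difference in the payload.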

Your primary pipeline without any contracts can look something like this
