3 Things Our Software Engineers Love About Data Contracts
When talking about data contracts, we naturally focus on the benefits for our data teams: by improving the reliability and quality of our data, we can drive significant business value from it.
But at GoCardless there is more to it than that. The way we have implemented data contracts has made them a valuable and much-relied-upon tool for our software engineers, and they are powering our move towards a service-oriented architecture.
Here are 3 things our software engineers love about data contracts:
- Infrastructure and tooling for free
- Autonomy
- Golden paths
Let’s look at each of those in turn.
1. Infrastructure and tooling for free
You get a lot for free when you use data contracts! And you can choose exactly what you need, depending on your requirements. Each option comes with sensible defaults, whilst being configurable.
You can get a Google Cloud Pub/Sub topic, with a schema that matches your contract. That also comes with a dead-letter topic to capture any events that fail schema validation and store them in a BigQuery table for later investigation.
You can get a BigQuery table, and again its schema matches your contract. By default, you’ll have backups set up for that table and kept for 60 days, but you can configure that as needed.
If you choose both Pub/Sub and BigQuery, we’ll deploy a service that archives the data from Pub/Sub to BigQuery. No further configuration needed — it just works.
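To make the opt-in model concrete, here is a minimal Python sketch of how a contract might declare the resources it wants, and how a platform could work out what to deploy from that. The `DataContract`, `PubSubConfig` and `BigQueryConfig` classes, their field names, and the resource labels are hypothetical illustrations, not the actual GoCardless contract format. Only the 60-day backup default, the dead-letter topic, and the automatic archiver come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PubSubConfig:
    # Hypothetical option: events failing schema validation go to a dead-letter topic.
    dead_letter: bool = True


@dataclass(frozen=True)
class BigQueryConfig:
    # Default taken from the post: backups are kept for 60 days unless overridden.
    backup_retention_days: int = 60


@dataclass(frozen=True)
class DataContract:
    name: str
    owner: str
    schema: dict[str, str]  # field name -> type, shared by every resource
    pubsub: Optional[PubSubConfig] = None
    bigquery: Optional[BigQueryConfig] = None


def resources_to_deploy(contract: DataContract) -> list[str]:
    """List the resources implied by the options chosen in the contract."""
    resources = []
    if contract.pubsub is not None:
        resources.append(f"pubsub-topic:{contract.name}")
        if contract.pubsub.dead_letter:
            resources.append(f"pubsub-dead-letter-topic:{contract.name}")
    if contract.bigquery is not None:
        resources.append(f"bigquery-table:{contract.name}")
        resources.append(
            f"bigquery-backup:{contract.name}:{contract.bigquery.backup_retention_days}d"
        )
    # Choosing both Pub/Sub and BigQuery implies the archiver service.
    if contract.pubsub is not None and contract.bigquery is not None:
        resources.append(f"archiver:{contract.name}")
    return resources


contract = DataContract(
    name="payments",
    owner="payments-team",
    schema={"id": "string", "amount": "integer"},
    pubsub=PubSubConfig(),
    bigquery=BigQueryConfig(),
)
print(resources_to_deploy(contract))
```

In this sketch the archiver never appears in the contract itself. It is derived from the combination of options, which mirrors the "no further configuration needed" behaviour.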
Need to replay data from BigQuery back to Pub/Sub? Maybe to seed a new service? Or to recover data from the dead-letter store? We have a service for that too! Ready to run as and when needed.
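The validation, dead-letter and replay behaviour described above can be illustrated with a small in-memory Python sketch. Nothing here uses the real Pub/Sub or BigQuery clients: plain lists stand in for the topic and the dead-letter store, and the `TYPES` mapping and the `conforms`, `publish` and `replay` helpers are hypothetical names chosen for illustration.

```python
# Assumed mapping from schema type names to Python types.
TYPES = {"string": str, "integer": int}


def conforms(event: dict, schema: dict[str, str]) -> bool:
    """True if the event has exactly the schema's fields with matching types."""
    if set(event) != set(schema):
        return False
    return all(
        isinstance(event[name], TYPES[type_name])
        for name, type_name in schema.items()
    )


def publish(event: dict, schema: dict[str, str], topic: list, dead_letter: list) -> None:
    """Route a valid event to the topic and an invalid one to the dead-letter store."""
    (topic if conforms(event, schema) else dead_letter).append(event)


def replay(dead_letter: list, schema: dict[str, str], topic: list, fix) -> None:
    """Re-publish dead-lettered events after applying a caller-supplied fix."""
    pending = list(dead_letter)
    dead_letter.clear()
    for event in pending:
        publish(fix(event), schema, topic, dead_letter)


schema = {"id": "string", "amount": "integer"}
topic, dead_letter = [], []

publish({"id": "PM1", "amount": 100}, schema, topic, dead_letter)
publish({"id": "PM2", "amount": "250"}, schema, topic, dead_letter)  # wrong type

# Repair the bad event and send it through validation again.
replay(dead_letter, schema, topic, fix=lambda e: {**e, "amount": int(e["amount"])})
print(len(topic), len(dead_letter))
```

Replayed events go back through the same validation, so an event that is still invalid after the fix simply returns to the dead-letter store.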
We also provide tooling to support the management of the data, as per our policies. For example, we have a data handling service that anonymises or removes personal data when we should no longer be keeping it. You don’t need to be an expert in data regulation to manage your data! Just categorise your data, tell us what action we’d need to take, and our automated tooling does the rest.
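The categorise-then-act flow can be sketched as follows. This is an illustrative Python example, not GoCardless’s data handling service: the category labels, the `Action` enum, the `FieldPolicy` class, the placeholder used for anonymisation, and the retention periods are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    ANONYMISE = "anonymise"
    REMOVE = "remove"


@dataclass(frozen=True)
class FieldPolicy:
    category: str         # hypothetical label, e.g. "personal"
    action: Action        # what to do once retention has expired
    retention: timedelta  # how long the raw value may be kept


def apply_policies(record: dict, policies: dict[str, FieldPolicy],
                   created_at: datetime, now: datetime) -> dict:
    """Return a copy of the record with expired fields anonymised or removed."""
    result = {}
    for name, value in record.items():
        policy = policies.get(name)
        expired = policy is not None and now - created_at > policy.retention
        if not expired or policy.action is Action.KEEP:
            result[name] = value
        elif policy.action is Action.ANONYMISE:
            # A fixed placeholder stands in for a real anonymisation routine.
            result[name] = "[anonymised]"
        # Action.REMOVE: the field is dropped from the result.
    return result


policies = {
    "email": FieldPolicy("personal", Action.ANONYMISE, timedelta(days=365)),
    "ip_address": FieldPolicy("personal", Action.REMOVE, timedelta(days=30)),
}
record = {"id": "PM123", "email": "a@example.com", "ip_address": "203.0.113.7"}
created = datetime(2020, 1, 1, tzinfo=timezone.utc)

cleaned = apply_policies(record, policies, created,
                         now=datetime(2022, 1, 1, tzinfo=timezone.utc))
print(cleaned)
```

The point of the design is that the data owner only supplies the `policies` mapping. Deciding when a field has expired and applying the action is left to shared tooling.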
For more details on our implementation, see our earlier post on implementing data contracts at GoCardless.
2. Autonomy
All those services mentioned above are yours, and you have complete autonomy in how you manage them.
If you need to grant access to a resource, you can. There is no need for review by a central team.
If you need to scale up a service to handle your data volume, you can go ahead and do that.
This autonomy applies to the data contract too. You decide how the data should be structured, depending on the requirements of your consumers and what you feel able to support over the long term. You can evolve it as needed, working with your consumers to handle migrations if that evolution introduces a breaking change.
You don’t need permission from a central function to do this, which would introduce a bottleneck and slow you down. We trust you to do the right thing, and supply the tooling and guardrails to make that as easy as we can.
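One guardrail that can support this kind of evolution is an automated check for breaking changes between two versions of a contract’s schema. The sketch below is a simplified Python illustration under assumed rules: removing a field or changing its type counts as breaking, while adding a field does not. The post does not describe how GoCardless actually classifies changes, and the `breaking_changes` function is hypothetical.

```python
def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Compare two schemas (field name -> type) and describe breaking changes."""
    problems = []
    for field, old_type in sorted(old.items()):
        if field not in new:
            problems.append(f"field '{field}' was removed")
        elif new[field] != old_type:
            problems.append(
                f"field '{field}' changed type from {old_type} to {new[field]}"
            )
    # Fields present only in `new` are additions, treated here as non-breaking.
    return problems


v1 = {"id": "string", "amount": "integer", "currency": "string"}
v2 = {"id": "string", "amount": "string", "created_at": "timestamp"}

for problem in breaking_changes(v1, v2):
    print(problem)
```

A check like this does not block the owner from making the change. It only flags when a migration needs to be coordinated with consumers.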
3. Golden paths
Data contracts are a well-supported golden path. They are our single solution for a common problem, which promotes consistency across our engineering stack.
When using data contracts, you know the data will have a schema associated. You know where to find that schema and its associated documentation.
Every data contract has an owner, so you know who to contact about this data and who can grant access to it.
The data contract also tells you how to access the data. You know where to find the Pub/Sub topic and/or the BigQuery table.
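As a rough illustration of what that discoverability could look like in code, here is a minimal Python sketch of a contract lookup. The `ContractInfo` class, the in-memory `REGISTRY`, the `describe` helper, and the example project, topic and table names are all hypothetical. Only the idea that a contract carries a schema, documentation, an owner, and the location of its Pub/Sub topic and/or BigQuery table comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContractInfo:
    owner: str                # team to contact about the data and its access
    schema: dict[str, str]    # field name -> type
    description: str          # documentation that lives with the schema
    pubsub_topic: Optional[str] = None
    bigquery_table: Optional[str] = None


# Hypothetical in-memory registry keyed by contract name.
REGISTRY = {
    "payments": ContractInfo(
        owner="payments-team",
        schema={"id": "string", "amount": "integer"},
        description="One event per payment state change.",
        pubsub_topic="projects/example-project/topics/payments",
        bigquery_table="example-project.contracts.payments",
    ),
}


def describe(name: str) -> str:
    """Summarise who owns a contract and where its data can be read."""
    info = REGISTRY[name]
    locations = [loc for loc in (info.pubsub_topic, info.bigquery_table) if loc]
    return f"{name}: owned by {info.owner}; available at {', '.join(locations)}"


print(describe("payments"))
```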
All of this enables our engineers to focus on developing their products and services and meeting their team and group goals.
Our data contracts journey continues!
Like everything, our implementation of data contracts is a constant work in progress. We continue to improve the tooling based on feedback from our fantastic engineers, much of it coming from the UX workshops we’ve been running.
But there’s no doubting we’re on to something with data contracts, and they are providing genuine value to our software engineers. They, and the services they create, are data consumers too! And just like any data consumer, they need data that is reliable so they can build on it with confidence.
Today, 80% of our Pub/Sub topics are managed with data contracts. We’re approaching 300 data contracts in production, with more than half of those created in the last 6 months. 28 different teams are using data contracts in production, with a few more experimenting in non-production environments.
Our data contracts journey continues ♥️