Manifesto on data testing at Ovrsea: an architectural design

Paul Couturier
Published in OVRSEA
Sep 16, 2022
Image 1: Photo of a blueprint by Sven Mieke on Unsplash.

TL;DR: At Ovrsea we have spent the past few weeks rethinking how we test our data and deliver the right data to our users. We finally came up with a philosophy that strongly supports and drives our data testing implementation process.

We have created a data testing philosophy that relies on three mantras: “Ownership matters”, “Perfect accuracy is no goal”, “Process & culture over project”.

Using these guidelines, we have defined an implementation strategy based on two types of tests: sanity checks & data quality tests.

Context

Data testing is a pain point for any tech company. Never satisfied with accuracy and always chasing the “right” coverage, data teams end up in endless discussions about test implementation…

At Ovrsea we have been working for some weeks on how to improve the way we test and deliver the right data to our users. We finally came up with a philosophy that strongly supports and drives our data testing implementation process.

Ok, but before starting: what exactly is data testing?

Generally speaking, testing is the process of executing a program to find errors. In the data context, tests aim at finding logical flaws that should never occur if the data is coherent. What is not data testing? It is not a way to monitor data drift or outliers. Incoherent data is different from an outlier: it should have zero chance of occurring.
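To make the distinction concrete, here is a minimal, purely illustrative sketch (the shipments table and its columns are invented for the example): a delivery that happens before pickup is incoherent and belongs to testing, while an unusually long transit time is merely an outlier and belongs to monitoring.

```sql
-- Illustrative coherence test: a shipment delivered before it was
-- picked up has zero chance of being valid, so every returned row
-- is a genuine error. (Table and column names are hypothetical.)
select shipment_id, pickup_date, delivery_date
from shipments
where delivery_date < pickup_date

-- By contrast, a 60-day transit time is only an outlier: suspicious
-- and worth monitoring, but not impossible, so it is not a test.
```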

In a sense you can think of data testing as … architectural design.

The blueprint is what you design conceptually and hope to see built, and the building itself is your actual schemas and data pipeline. Testing is simply checking in the field what your building actually looks like, and whether the structure has any flaw that could have heavy consequences for the people using it.

But why are we testing again?

We test because no one can expect an organisation with hundreds of tables and dozens of data sources to be free of flaws. As a data team, our role is to provide accurate data to every internal or external user. BI, sensitive reporting, and ML projects all depend on the accuracy of our data. Testing is thus a way to catch new mistakes and to build trust in our data by alerting people even before they notice an error. Just as a building never looks exactly like its blueprint, you need to inspect it to find the incoherencies and the potential threats to its stability.

1. Data testing philosophy

One of the main goals of the data team is to provide accurate data to all our users, internally and externally. After reviewing the literature and facing concrete implementation challenges, we defined our data testing philosophy around three major pillars.

“Ownership matters”

In general, tests should be owned and designed by those in charge of either the data’s creation or its use: they are the ones who know the logic the data should follow. If you have not designed the blueprint, or don’t understand exactly how the systems in the building interact, it is very hard to say whether the actual structure is coherent.

At Ovrsea, two major stakeholders intervene in this situation:

  • The tech team, who produce the data used in our platform
  • The business teams (Revops, OpsExc, Billing), who are in charge of BI or external software

Both teams have a role to play. Tech teams are extremely good at understanding the links between tables, front end and back end, and so on, while business teams know the exact output they should expect, with a high level of abstraction from the code itself. Both should be the main owners and designers of the tests, while the data team should only help implement them.

However, as a data team we are still in charge of part of the data creation: we write the transformations that turn the tech team’s operational tables into data that anyone at Ovrsea, technical or not, can read and use. These transformations are also tested, and here the data team should take the lead.

“Perfect accuracy is no goal”

Test with your ROI in mind: if a test takes longer to build than needed, or costs more than the consequences of the error it prevents, then STOP. If the building does not look exactly like the blueprint, that is not necessarily catastrophic; it simply depends on the consequences.

Ok, but how can we do that? Concretely, to improve your ROI you need to focus on high coverage of specific zones, defined by their usage frequency or their criticality.

For example, in your data pipeline these zones could typically be specific layers of transformation. Once you have found the layer to test, focus your tests on what you think could break or be wrong. Testing is not a reassuring task and should never be one. If you only test data that is easy to check and unlikely to be inconsistent, you will never find any errors and will keep believing your pipeline is perfect. Testing is an appetite for finding errors. When a test finds an error, don’t be disappointed: it means you have done your job! Since we can’t expect our pipeline to be perfect, we should be grateful to find errors, because each one improves our accuracy. So test until it breaks!

“Process and culture over project”

As a data team you are mainly in charge of providing the test framework and a process that lets tests be implemented as quickly as possible. But more than that, you need to instill a culture of data testing in all the other teams! Testing becomes laborious when the wrong routine is in place. This will never be a one-time project; it should be a well-defined process, improved every time a gap is found. Just as you would not inspect a building only once it is completely finished, you should put processes in place to follow the construction throughout its lifecycle.

2. Test process & strategy at Ovrsea

A philosophy is nothing without implementation. Before defining this philosophy, we searched for and benchmarked all the ways to do data testing on our pipeline given our current stack. You can find all this information in our previous article.

Our process and implementation strategy therefore has to combine three things: enabling owners to design tests, considering the ROI of each test, and providing an efficient process and framework to spread the testing culture at Ovrsea.

To address these three pillars we have defined a strategy relying on two kinds of tests:

  • Sanity checks: requested by other teams (handled by dbt’s singular test framework)
  • Data quality tests: defined by the data team itself (handled by dbt’s custom generic test framework and pytest)

These two categories of tests answer different needs and together cover the three pillars defined above:

  1. “Ownership matters”
  2. “Perfect accuracy is no goal”
  3. “Process and culture over project”

Sanity checks

These tests are designed upon request from other teams and implemented by the data team. Because we are in contact with neither the code logic nor the business intelligence, we cannot design these tests by ourselves. They cover errors that can appear anywhere between the creation of tables by the tech team and their final use by BI.

These should be complex tests, specifically requested by an external team to check a behaviour that is hard to verify otherwise because of the layers of complexity between raw tables and business understanding.

dbt offers us a simple framework to implement these tests as SQL statements, using its singular test concept.

Example of a BI sanity check

Our BI team needed to check that if the status of a client was “open”, “demo” or “first quotation”, then the variable “number_of_opportunity_won” should be null. This lets them detect inconsistencies in the customer journey. Once the request was made, we implemented a dbt singular test in SQL to check this statement.

Image 2: Code of the dbt singular test requested by our BI team to check the consistency of the client journey.
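The code itself was shared as an image in the original post; below is a minimal sketch of what such a dbt singular test can look like (the clients model and the exact column names are assumptions based on the description above). In dbt, a singular test is a SQL file in the tests/ directory, and the test fails whenever the query returns at least one row.

```sql
-- tests/assert_no_won_opportunities_for_early_stage_clients.sql
-- dbt singular test: select every client whose status implies that no
-- opportunity can have been won yet, but whose number_of_opportunity_won
-- is nonetheless filled in. Any returned row makes the test fail.
-- (Model and column names are assumptions based on the article.)
select
    client_id,
    status,
    number_of_opportunity_won
from {{ ref('clients') }}
where status in ('open', 'demo', 'first quotation')
  and number_of_opportunity_won is not null
```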

Process

Test requests are made through a dedicated ticket to the data team. Partly to encourage the use of tests, we rely on our Data Champion organisation: each data champion takes part in a general monthly meeting as well as a weekly 1:1 meeting. In these meetings we discuss the test definitions and strategy for each team, which lets us communicate with them efficiently.

Data quality tests

However, to get a quick yet efficient safety net, and because as a data team we also create some data in our pipeline, we need tests of our own that are very easy to implement. We call these data quality tests.

Data quality tests differ from sanity checks in that they are only superficial tests aimed at catching glaring inconsistencies. They fulfill the need for general checks on the pipeline transformations and can catch errors days before a human would.

Image 3: Code of a dbt custom generic test applied to the column “created_at” to check its freshness in hours.
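The original code was also an image; here is a hedged sketch of a custom generic dbt test with the same intent (the test name, the default threshold, and the Postgres-style interval syntax are assumptions). A generic test is declared in a {% test %} block and can then be attached to any column from a model’s .yml file.

```sql
-- macros/test_freshness_in_hours.sql (hypothetical file name)
-- dbt custom generic test: fails when the most recent value of the
-- tested column is older than `max_delay_hours`.
{% test freshness_in_hours(model, column_name, max_delay_hours=24) %}

select max({{ column_name }}) as most_recent
from {{ model }}
having max({{ column_name }}) < current_timestamp - interval '{{ max_delay_hours }} hours'

{% endtest %}

-- Attaching the test to a column in a model's .yml file (illustrative):
--   columns:
--     - name: created_at
--       tests:
--         - freshness_in_hours:
--             max_delay_hours: 48
```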

These are simple tests designed by us using dbt’s custom generic test framework (or the pytest framework for tests on Python functions). We have created five templates of custom generic dbt tests that we can apply anywhere in the pipeline, on any table or column, in a second. Implementing such a test should not require any intense business reflection; if it does, it is a mistake: the test should be turned into a sanity check, and the data team should not be its owner.

In line with the pillar “Perfect accuracy is no goal”, even though these tests are very quick to implement, we still prioritize high-value layers: we started by testing the most used self-service tables and the sensitive exposures.

Process

Two processes are applied to implement tests:

  • At each merge request: if you have modified a table, you review that table’s tests right away and update them if needed. This avoids writing tests weeks after updating the table and wasting time understanding it all over again.
  • Testing day: when we need to target a new layer or tackle test-related technical debt, the whole data team dedicates half a day or a full day to implementing or updating tests. This is also a good way for the team to stay up to date on schema modifications.

Conclusion

From a benchmark of testing possibilities on our stack to creating a philosophy and finally transforming it into a concrete strategy, the road has been long and hard.

The final framework satisfies us well right now, but we need to keep improving and adapting it as new challenges appear. Moreover, one question remains: how do we make sure other teams act when tests fail? Is the downstream alerting system well-suited? We are still working on this final part, but an article should be coming soon, so stay tuned!

Thank you for reading, and feel free to reach out to us! We will be more than happy to discuss and debate data testing best practices with other companies!
