How to Overcome the Most Common Data Quality Mistake

Spencer Essenpreis
3 min read · Jul 4, 2023


Image by Bing AI Image Creator

This is the fourth in a series of posts about moving beyond data quality to data trustworthiness. You can start from the first post here.

The two most dangerous creatures in the world are both surprising: mosquitos and assumptions. When it comes to questions of data trustworthiness, never assume accuracy: prove it instead.

We’ve all had it happen. We raise a question about a potential issue with data, and a data worker provides a lengthy explanation with absolute certainty as to why we would expect to see exactly what we’re seeing in the data — which turns out to be wrong. And if those of us who work in data are honest, we know too well the taste of that crow.

Most data issues are foul ball wrong — they’re in the ballpark and headed in the right direction, just a bit outside the lines. Because they’re almost right, we can come up with quite reasonable explanations for why the data looks the way it does. Oh, of course those sales look high; that’s our top category in that market. Why yes, those usage rates are awfully low; that age group tends to underutilize. Those explanations, though, are unproven hypotheses potentially masking real accuracy issues.

It’s like a Bizarro comic — those of us in data professions are professional hypothesis testers content with our own untested hypotheses. Why? I’m good and I’m busy. I know what I’m doing so the data must be right; I’m really busy with these other critical priorities so I don’t have time for your questions about perfectly good data. These perspectives then lead to a negative view of the person raising questions — they’re criticizing our work, they don’t understand the data, and/or they’re just nitpicky and don’t understand what’s really important.

The solution then starts with three changes of perspective.

First, we should have the humility to assume we may have made a mistake. Sure, we may be good — and we’re not perfect, so there probably is an issue. We may have a good explanation, and we should test it.

Second, we have to realize building trust is as critical as our other priorities — and maybe more so. What good is that big strategic analysis if no one trusts it? What if that small error is a symptom of a larger failure? As much as we have going on, it’s still important for us to take questions of data trustworthiness seriously.

Third, we should assume the best of the person raising the question. They’re raising their hand to help rather than to criticize. They understand the data from a different perspective, so they probably see something we missed. They have a high concern for our key priority of data trustworthiness. We should reward their courage and helpfulness with a serious investigation.

Our perspectives shifted, we must then always prove data trustworthiness. We must take our hypotheses about the data in question and test them just like we would any other. Sometimes that extra effort will build trust by proving the data is accurate. Other times that extra effort will build trust by finding an issue we can fix. We always win investing time in proving data trustworthiness. As for assuming — well, we know what they say we get when we assume…
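Testing a hypothesis about the data can be as lightweight as a few lines of code. Here’s a minimal sketch (the market, category, and sales fields are hypothetical, just for illustration) of turning a verbal explanation — “of course those sales look high, that’s our top category in that market” — into a check we actually run instead of an assumption we merely state:

```python
from collections import defaultdict

def top_category(rows, market):
    """Sum sales per category within a market and return the leading category."""
    totals = defaultdict(float)
    for row in rows:
        if row["market"] == market:
            totals[row["category"]] += row["sales"]
    return max(totals, key=totals.get)

# Toy data: the explanation sounds plausible, but the numbers disprove it.
rows = [
    {"market": "West", "category": "Shoes", "sales": 100},
    {"market": "West", "category": "Hats",  "sales": 400},
    {"market": "West", "category": "Shoes", "sales": 150},
]

# The claim: Shoes leads the West market. The data: Hats does.
print(top_category(rows, "West"))  # Hats
```

Five minutes of this kind of checking either confirms the explanation with evidence or surfaces the accuracy issue the explanation was masking — either way, trust goes up.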

Data trustworthiness is built when we consistently prove data quality rather than assume it. In the next post, we’ll look at the question of when a data issue is small enough to ignore.



Strategic Analytics Leader & People-Centric Culture Builder