Starting A Data Quality Checklist

John McCarthy
10 min read · Mar 29, 2020
Photo by Mahir Uysal on Unsplash

I have some bad news for you.

Something in the data you used in your last project is wrong. You just don’t know it yet.

“But, no! I got that data from the UI team. They said it was reviewed!”

Congratulations. You can tell that to everyone while you are being walked respectfully out of the building because your new marketing model caused $200,000 in lost revenue. Don’t blame Google because you thought Coffeyville, Kansas was a meaningful predictor when it was just the default geographic center of the United States.

No, it is not another team’s fault. It is not the rushed timeline’s fault. It is not some great universal struggle that is impossible to overcome. The quality of the product you create is YOUR responsibility. The inputs and the outputs belong to YOU.

Welcome to the thankless world of data quality.

There are different levels of investment and effort that you can put into quality control. At the very least, you should keep a series of lists, around each data source, for errors that you have run into so that you can prevent the same error from happening twice. If you don’t already have something like that in place, I can give you a jump start.
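To make the "list per data source" idea concrete, here is a minimal sketch of one way it could look in plain Python. Everything named here is an assumption for illustration: the `web_events` source, the `lat`/`lon`/`customer_id` fields, and the `(0.0, 0.0)` placeholder coordinate are hypothetical stand-ins for whatever your own sources and past errors turn out to be. Each error you have been burned by becomes one small check, filed under the source it came from.

```python
from collections import defaultdict

# Hypothetical registry: data source name -> list of (description, check) pairs.
CHECKLISTS = defaultdict(list)


def register(source, description):
    """Decorator that files a check under the checklist for one data source."""
    def wrap(check):
        CHECKLISTS[source].append((description, check))
        return check
    return wrap


# Assumed placeholder value; swap in whatever default your geocoder actually emits.
DEFAULT_COORDS = {(0.0, 0.0)}


@register("web_events", "coordinates equal to a known geocoder default")
def default_location(rows):
    # Return the indices of rows whose location is a default, not a real place.
    return [i for i, r in enumerate(rows)
            if (r.get("lat"), r.get("lon")) in DEFAULT_COORDS]


@register("web_events", "missing customer id")
def missing_customer(rows):
    # Return the indices of rows with an empty or absent customer id.
    return [i for i, r in enumerate(rows) if not r.get("customer_id")]


def run_checklist(source, rows):
    """Run every check recorded for a source; return {description: bad row indices}."""
    failures = {}
    for description, check in CHECKLISTS.get(source, []):
        bad = check(rows)
        if bad:
            failures[description] = bad
    return failures


# Made-up sample rows, purely to exercise the checks.
rows = [
    {"customer_id": "a1", "lat": 41.9, "lon": -87.6},
    {"customer_id": "", "lat": 0.0, "lon": 0.0},
    {"customer_id": "c3", "lat": 0.0, "lon": 0.0},
]
report = run_checklist("web_events", rows)
```

The point of the registry is that the list only grows: the next time a source surprises you, you add one more decorated function and that error cannot sneak through the same way again.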

John McCarthy

I’m a manager in the Business Insights and Analytics world.