What is Data Quality and How to Ensure It in Your Organization
How is your data?
If that very question makes you cringe, then you’ve come to the right place.
No one wakes up in the morning thinking:
Yay, I get to work with bad quality data today!
Sadly, that’s the way things are in many organizations.
It doesn’t mean it has to be that way. Not if you want to make truly data-driven decisions.
So lately, if you’ve been asking questions such as:
- How can you understand your data better?
- How can you make it accurate, complete, reliable?
- Are there any measures or checks for ensuring data quality?
- In spite of setting up quality checks, you have bad data in your systems. Is there a way to fix it?
- Lastly, is there a way to improve data quality and ensure that the quality doesn’t go downhill again? (hint: it’s called data quality management)
… then good job! It shows that you’re aware of your data problems and are actively looking for help. That’s the first step. #psychology101
After awareness, the next step is to find a solution. So read on as we answer all those questions (and more) on data quality, starting with what data quality actually means.
What is data quality?
Data quality is the ability of your data to serve its intended purpose based on seven distinct characteristics.
But before exploring these characteristics, let’s understand the concept of data quality better.
Defining data quality
A quick online search will give you countless definitions. After giving it some thought, here’s how we define data quality and high quality data:
Data quality is the answer to the question “How is my data?” If your data helps you with business operations and decisions, then you can say that your data is of good quality.
BTW, of all the definitions online, we found the one from Thomas C. Redman, the “Data Doc” and author of the book Data Driven, to be the most relevant:
Data is generally considered high quality if it is “fit for [its] intended uses in operations, decision making and planning”.
Defining data quality management (DQM)
And the process that you adopt to improve and ensure data quality at all times is called data quality management (DQM).
Now you might wonder, a process?
That’s because data quality can’t be a one-time activity — purging a few rows of bad data or adding a glossary with a few key terms. Data quality needs consistent care and attention. DQM is simply the practice of focusing on and consistently improving the quality of your data.
One of the most important parts of DQM is to understand what quality data looks like. So let’s look at the characteristics of data quality.
What are the characteristics of data quality?
There are seven factors that play a huge role in determining data quality.
- Accuracy: Is your data correct, precise, error-free? Without accuracy, your data is misleading and useless.
- Availability: Is the right data available to the right people within your organization? Data has to be available and accessible for the humans of data to do their jobs.
- Completeness: Is your data incomplete? Is some information missing? Incomplete data leads to gaps in information, making it harder to put data to use.
- Granularity: What’s the level of detail that your data can provide? The right degree of granularity in data is necessary for accurate and effective decision-making.
- Relevance: Do you know whether you really need the information that you’ve collected? What’s the purpose of the data you’re storing? Irrelevant information just ends up wasting your time, effort and money.
- Reliability: Is your data ambiguous or vague, or does it contain contradictory information? In all such cases, the information you have is unreliable and you cannot trust your data.
- Timeliness: Is your data outdated or obsolete? Whether data was collected at the right time is an important measure of data quality. Data that isn’t timely is misleading and can lead to poor decisions.
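To make a few of these characteristics more concrete, here’s a minimal sketch of how you might put a number on them. It’s an illustration under our own assumptions, not a standard method: the sample records, the field names (email, deal_value, updated), the “no negative deal values” rule and the 90-day freshness window are all hypothetical. Only three of the seven characteristics are covered, because factors such as relevance and availability depend on people and context more than on the rows themselves.

```python
from datetime import date

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"id": 1, "email": "ana@example.com", "deal_value": 1200.0, "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "deal_value": 800.0, "updated": date(2024, 4, 20)},
    {"id": 3, "email": "li@example.com", "deal_value": -50.0, "updated": date(2022, 1, 15)},
    {"id": 4, "email": "sam@example.com", "deal_value": 300.0, "updated": date(2023, 11, 1)},
]


def completeness(rows, field):
    """Share of rows where `field` is filled in (not None or empty)."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def validity(rows, field, is_valid):
    """Share of rows whose `field` passes a rule; a rough proxy for accuracy."""
    if not rows:
        return 0.0
    ok = sum(1 for r in rows if r.get(field) is not None and is_valid(r[field]))
    return ok / len(rows)


def timeliness(rows, field, today, max_age_days):
    """Share of rows whose `field` date is at most `max_age_days` old."""
    if not rows:
        return 0.0
    fresh = sum(
        1
        for r in rows
        if r.get(field) is not None and (today - r[field]).days <= max_age_days
    )
    return fresh / len(rows)


# Assumed rules: deal values must not be negative, and a record counts as
# fresh if it was updated within the last 90 days of a fixed reference date.
report = {
    "completeness(email)": completeness(records, "email"),
    "validity(deal_value)": validity(records, "deal_value", lambda v: v >= 0),
    "timeliness(updated)": timeliness(records, "updated", date(2024, 5, 10), 90),
}

for name, score in report.items():
    print(f"{name}: {score:.0%}")
```

With the sample rows above, one email is missing, one deal value is negative and two records are stale, so the scores come out to 75%, 75% and 50%. Real checks would need rules agreed on with the people who actually use the data.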
Now let’s revisit the definition of data quality to make it sound more complete:
If your data is accurate, available and accessible, complete, relevant, reliable, timely, provides the right degree of granularity and helps you with business decision-making, then your data is of good quality.
I know, that’s a lot to ask from your data. But then that’s how important it is to have high-quality data.
Still skeptical about its importance? Then let’s slay your doubts once and for all.
Why is data quality so important?
When your data is poor, incorrect, incomplete and unreliable, the consequences can be quite damaging for your business.
Think back to when you spent two weeks working on a report for sales showing business deals won and lost.
On day 1, everything was hunky-dory… birds chirping, sun shining down on you and your Excel.
But by day 5, the weather had changed to cloudy with a chance of data errors.
Come day 13, you realized that the data was not even reliable. You had no way of knowing this, since you could see neither the source nor the changes that happened to the data before it reached you.
After all, data that comes to you as an isolated Excel file will never give you the complete context you need to understand the quality of data.
The result? The sales funnel never got tweaked and the numbers didn’t improve, at least not within the time frame that you’d initially planned.
The problem with bad data
And you’re not alone in this. See what others have to say about the toll that bad data exacts from businesses.
The cost of bad data is 15% to 25% of revenue for most companies. — HBR
Knowledge workers waste up to 50% of their time dealing with mundane data quality issues. For data scientists, this number may go as high as 80%. — Sloan Management Review
Still unconvinced about the impact of bad data? Here’s a $3.1 trillion reason for you.
The yearly cost of poor quality data, in the US alone, in 2016 was $3.1 trillion. — IBM
Bad data + deadlines = chaos & mismanagement
Dealing with erroneous data and misleading information when you’re facing tight deadlines can be exhausting and hardly solves the root problem.
In such cases, you’re most likely to make corrections by yourself using your best guesses so that you meet your deadlines. You’re less likely to look for the person responsible for creating/collecting the wrong data and report the issue.
So instead of fixing the problem once and for all, you’ll just keep implementing temporary fixes, which doesn’t help save time or effort.
Redman summarizes the problem with bad data and its impact on the humans of data in the best possible manner in his HBR article:
Salespeople waste time dealing with erred prospect data; service delivery people waste time correcting flawed customer orders received from sales. Data scientists spend an inordinate amount of time cleaning data; IT expends enormous effort lining up systems that “don’t talk.” Senior executives hedge their plans because they don’t trust the numbers from finance. — Thomas C. Redman
And that’s why data quality is important, which brings us to the solution to the bad data problem.
Curious to know more about the solution to the bad data problem? Then check out our comprehensive article on data quality here.