The High Cost of Bad Data and Why High-Quality Data Is Important to Drive Business Outcomes (Part 1)

just jilan · Published in DataDreamers · 3 min read · Jul 18, 2022

A Gartner study reports that companies lose around $13 million per year due to bad data quality. Even more alarming, many organizations have no proper mechanism to measure the impact of bad data.

A LeadJen study reveals that inside sales reps who are given good data achieve nearly 4x the qualified appointments, and stand to lose around $20,000 annually if the data is in bad shape.

As per a 2019 ZoomInfo analysis, at least 1 in 5 companies has lost a customer due to data issues.

Common Pitfalls of Bad Quality Data

For example, one of our sales reps was assigned to meet customer A, which our segmentation algorithms had flagged as a high-tech, high-value customer. Only after the discussion, when the rep sent over the conversation notes, did the email bounce. A root cause analysis (RCA) found that the marketing data had not been validated correctly, which let a typo in the email address slip through.
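A simple guard against this class of issue is to validate email syntax at the point where marketing data is ingested, before it reaches segmentation or the sales team. Here is a minimal sketch in Python; the column names and the regex are illustrative assumptions, not our actual pipeline:

```python
import pandas as pd

# Conservative syntax check: it will not catch every bad address, but it
# rejects obvious typos (missing "@", missing domain dot) at ingestion time.
EMAIL_PATTERN = r"^[^@\s]+@[^@\s]+\.[^@\s]+$"

def flag_invalid_emails(df: pd.DataFrame, col: str = "email") -> pd.DataFrame:
    """Return the rows whose email fails the basic syntax check."""
    emails = df[col].fillna("").str.strip().str.lower()
    return df[~emails.str.match(EMAIL_PATTERN)]

# Quarantine bad rows from a marketing extract before segmentation runs.
leads = pd.DataFrame({"account": ["A", "B"],
                      "email": ["ceo@acme.com", "ceo@acmecom"]})
print(flag_invalid_emails(leads))  # account "B" is held back for correction
```

A check like this does not guarantee deliverability, but it catches the cheap, common typos before anyone sends a note that bounces.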

Below is a list of common data issues we face in daily activities; the real list is likely longer than what I have captured here. A minimal sketch of automated checks for several of these items follows the list.

1. Data updates / freshness: how often are the reference data and content being refreshed?

2. Non-normalized data: contact numbers, emails, and country names that need to be normalized before being passed to downstream apps.

3. Range of data: what are the min/max bounds and the distribution of the data sets?

4. Lineage / schema: changes in database schemas that cause certain pipelines to fail.

5. Missing values or nulls.

6. Duplicate data entries: customer names duplicated because of a stray space or an extra character.

7. Incorrectly translated values.

8. Human typos: roughly a 1% average error rate from manual entries.
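To make a few of these concrete, here is a minimal sketch of how checks for freshness, normalization, range, nulls, and near-duplicate names could be expressed in Python with pandas. The column names, thresholds, and rules are illustrative assumptions, not the actual checks we run:

```python
import pandas as pd

def run_basic_quality_checks(df: pd.DataFrame) -> dict:
    """Return simple data-quality signals for a customer table.

    Assumes illustrative columns: 'customer_name', 'country',
    'annual_revenue', and 'updated_at'; adjust to your own schema.
    """
    report = {}

    # 1. Freshness: how stale is the most recent update? (assumes naive timestamps)
    updated = pd.to_datetime(df["updated_at"], errors="coerce")
    report["days_since_last_update"] = int((pd.Timestamp.now() - updated.max()).days)

    # 2. Normalization: strip whitespace and unify case before joins or comparisons.
    countries = df["country"].astype("string").str.strip().str.upper()
    report["distinct_countries_after_normalization"] = int(countries.nunique())

    # 3. Range: count values outside an assumed plausible band.
    revenue = pd.to_numeric(df["annual_revenue"], errors="coerce")
    report["out_of_range_revenue_rows"] = int(((revenue < 0) | (revenue > 1e10)).sum())

    # 5. Missing values or nulls, per column.
    report["null_counts"] = df.isna().sum().to_dict()

    # 6. Duplicates caused by stray spaces or extra characters in names.
    names = (df["customer_name"].astype("string")
             .str.strip().str.lower()
             .str.replace(r"\s+", " ", regex=True))
    report["duplicate_customer_names"] = int(names.duplicated().sum())

    return report
```

In practice, checks like these would run inside the ETL pipeline and quarantine or alert on offending rows, rather than just returning a report after the fact.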

There is always a cost in money, time, and trust associated with these data issues. In reality, we have also run into scenarios where we prepared the best data insights possible for a larger audience, only to learn after the session or pitch that there were issues in the underlying data and that our outcomes and suggestions turned out to be incorrect.

In my experience, data teams often do not get the right prioritization compared with roles like data science, ML, and full-stack engineering, which usually get more choice and bandwidth over the tasks they take on. Most likely, the business or development owners assume that data quality checks are done by default by the engineers building the ETL pipelines, without any cross-collaboration with downstream users and teams.

Imagine the trust and data value we could build with internal teams and downstream users once data quality standards are set and high-quality data is given the right prioritization.

However, these trends are changing. Many high-tech companies like Intuit, Airbnb, Uber, and Netflix have committed fully to highly reliable and available data across their internal teams and stakeholders by adopting the best DataOps practices. We are all aware that this data feeds their daily activities, from bookings to digital apps to customer relationships. A minute error in the baseline data can mean a much longer time to retain a customer, a high loss of revenue, and in a few cases the loss of customer trust.

In the upcoming Part 2 of this series, I will list the mechanisms we used as part of our development cycles to validate almost 44 million rows of third-party data and overcome these issues.


Thanks for reading!

If you found this interesting, leave a clap or two, subscribe, or watch out for my LinkedIn posts for future updates! May the Data be with you!
