Data Quality Issues, the “Iceberg”

Liqun Xiao
Published in The Good CTO
Dec 24, 2023

Over years of delivering digital transformation solutions, we have consistently found that data quality issues are like icebergs. The hidden challenges lurking beneath the surface of a seemingly straightforward dataset or system can pose serious risks to projects and the business alike. They can not only hinder the progress of digital transformation but also affect decision-making, impair customer relationships, lead to regulatory non-compliance, and ultimately result in financial loss.

Visible and hidden parts of the “Iceberg”

The most apparent data quality issues are those that are easy to detect and are on the “surface,” much like the tip of an iceberg. These could include blatant errors such as misspelled names, obvious duplicates, or missing values in critical fields.

Just as the bulk of an iceberg’s mass is submerged and not readily seen, many data quality issues are not immediately apparent until you delve deeper into the data. These might include subtle inconsistencies, latent inaccuracies, or complex relational discrepancies that only emerge upon thorough analysis or during specific use cases.
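
To make the distinction concrete, here is a minimal sketch in Python. The tables, field names, and sample values are invented for illustration and do not come from a real project. A simple null check catches the visible problem, while the orphaned reference and the impossible date order only show up when records are compared against each other.

```python
from datetime import date

# Hypothetical sample data; all field names and values are assumptions.
customers = [
    {"customer_id": 1, "name": "Acme Ltd"},
    {"customer_id": 2, "name": "Globex"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "order_date": date(2023, 5, 1), "ship_date": date(2023, 5, 3)},
    {"order_id": 11, "customer_id": 3, "order_date": date(2023, 5, 2), "ship_date": date(2023, 5, 4)},
    {"order_id": 12, "customer_id": 2, "order_date": date(2023, 5, 9), "ship_date": date(2023, 5, 7)},
    {"order_id": 13, "customer_id": None, "order_date": date(2023, 5, 9), "ship_date": date(2023, 5, 10)},
]


def missing_required(rows, required):
    """Surface check: ids of orders with an empty required field."""
    return [r["order_id"] for r in rows if any(r.get(f) is None for f in required)]


def orphan_orders(rows, known_ids):
    """Hidden check: orders that point to a customer that does not exist."""
    return [
        r["order_id"]
        for r in rows
        if r["customer_id"] is not None and r["customer_id"] not in known_ids
    ]


def date_conflicts(rows):
    """Hidden check: orders shipped before they were placed."""
    return [r["order_id"] for r in rows if r["ship_date"] < r["order_date"]]


known_ids = {c["customer_id"] for c in customers}
tip = missing_required(orders, ["customer_id"])
orphans = orphan_orders(orders, known_ids)
conflicts = date_conflicts(orders)
print("visible:", tip, "hidden:", orphans + conflicts)
```

Each hidden issue here passes a field-by-field inspection, which is why such problems tend to appear only during integration or reporting.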

How to manage the risks brought by the “Iceberg”?

We generally recommend two proactive approaches: the first is a short-term tactic that we usually adopt in digital transformation projects, and the second is a mid- to long-term operational strategy for the enterprise.

1. Adopt data verification and validation throughout the lifecycle of data

Throughout the implementation of digital transformation projects, there are key milestones that ensure the outcomes are trustworthy for the business:

1) Data exploration:

We start data exploration in parallel with the business requirement analysis. Conducting data exploration early allows us to understand the existing data landscape, identify data sources, and assess the quality and structure of the data. This step helps determine whether the available data can support the business requirements and objectives of the digital transformation project.
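
As an illustration of what an early exploration pass might look like, the sketch below profiles each column for missing values, distinct values, and the mix of value types. The `profile` helper and the sample rows are assumptions made for this example; a real project would typically run a dedicated profiling tool against the actual sources.

```python
from collections import Counter


def profile(rows):
    """Summarise each column: missing count, distinct count, value types."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing (an assumption for this sketch).
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return report


# Hypothetical extract from a source system.
sample = [
    {"id": 1, "country": "DE", "revenue": 100.0},
    {"id": 2, "country": "", "revenue": "250"},
    {"id": 3, "country": "de", "revenue": None},
]

summary = profile(sample)
for column, stats in summary.items():
    print(column, stats)
```

Even on three rows, the profile hints at questions to raise with the business: revenue is stored as both a number and text, and the country code appears in two spellings.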

2) Quality assurance:

Based on the insights gained from data exploration and business requirement analysis, we develop a quality assurance (QA) plan that defines the criteria for data quality and outlines the processes for ongoing verification and validation in the development and testing phases. The plan should address the four V’s of big data (volume, variety, velocity, and veracity) to manage the scale, diversity, speed, and truthfulness of the data.
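
One way to make the criteria in a QA plan testable is to express them as rules that run automatically during development and testing. The following is a minimal sketch under that assumption; the fields, the allowed country list, and the age range are hypothetical, and the e-mail pattern is deliberately simplistic.

```python
import re

# Hypothetical quality criteria, one predicate per field.
RULES = {
    "email": lambda v: isinstance(v, str)
    and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "country": lambda v: v in {"DE", "FR", "US"},
}


def validate(rows, rules):
    """Return (row index, field, offending value) for every failed rule."""
    failures = []
    for index, row in enumerate(rows):
        for field, check in rules.items():
            value = row.get(field)
            if not check(value):
                failures.append((index, field, value))
    return failures


records = [
    {"email": "a@example.com", "age": 34, "country": "DE"},
    {"email": "not-an-email", "age": 150, "country": "FR"},
    {"email": "b@example.com", "age": 28, "country": "XX"},
]

failures = validate(records, RULES)
for failure in failures:
    print(failure)
```

Keeping the rules in one place means the same checks can be rerun at each milestone, so regressions in data quality surface before go-live.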

2. Establish a data governance framework

The framework defines roles, responsibilities, data standards, policies, and procedures for managing data across the organization over the long term. Among the steps for implementing the framework, several practices are key to its success:

1) Define Roles and Responsibilities:

Create roles such as data owners, data stewards, data custodians, and data users, and define their responsibilities in the data governance process.

2) Define Data Quality Metrics:

Determine clear metrics for measuring data quality, such as accuracy, completeness, consistency, validity, timeliness, and uniqueness.
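
Metrics become useful once they are computed the same way every time. The sketch below shows one possible working definition for three of them (completeness, uniqueness, and validity) as ratios between 0 and 1. These formulas, the field names, and the "price must be positive" rule are assumptions for illustration; each organization should agree on its own definitions.

```python
def _present(rows, field):
    """Values of a field that are neither None nor an empty string."""
    return [r.get(field) for r in rows if r.get(field) not in (None, "")]


def completeness(rows, field):
    """Share of rows in which the field is filled in."""
    return len(_present(rows, field)) / len(rows)


def uniqueness(rows, field):
    """Share of filled-in values that are distinct."""
    values = _present(rows, field)
    return len(set(values)) / len(values) if values else 1.0


def validity(rows, field, is_valid):
    """Share of filled-in values that satisfy a business rule."""
    values = _present(rows, field)
    return sum(1 for v in values if is_valid(v)) / len(values) if values else 1.0


# Hypothetical product records.
products = [
    {"sku": "A1", "price": 9.5},
    {"sku": "A2", "price": -1.0},
    {"sku": "A2", "price": 12.0},
    {"sku": None, "price": 3.0},
]

sku_completeness = completeness(products, "sku")
sku_uniqueness = uniqueness(products, "sku")
price_validity = validity(products, "price", lambda p: p > 0)
print(sku_completeness, round(sku_uniqueness, 2), price_validity)
```

Tracked over time, such ratios give data owners and stewards a shared, objective view of whether quality is improving.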

3) Employ Master Data Management (MDM):

Use MDM to create a single source of truth for critical data entities such as customers, products, and vendors.
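
The core idea of a single source of truth can be sketched in a few lines: match records that describe the same entity, then merge them into one "golden" record. The example below is a simplified illustration, not how any particular MDM product works. The matching rule (same e-mail, ignoring case and spaces) and the survivorship rule (the most recent non-empty value wins) are both assumptions.

```python
from datetime import date


def match_key(record):
    """Hypothetical matching rule: same e-mail, ignoring case and spaces."""
    return record["email"].strip().lower()


def golden_records(records):
    """Merge matching records; newer non-empty values overwrite older ones."""
    merged = {}
    # Process oldest first so that later updates take precedence.
    for record in sorted(records, key=lambda r: r["updated"]):
        golden = merged.setdefault(match_key(record), {})
        for field, value in record.items():
            if value not in (None, ""):
                golden[field] = value
    return list(merged.values())


# Hypothetical customer records from three systems.
sources = [
    {"source": "crm", "email": "Ann@Example.com ", "name": "Ann Lee",
     "phone": "", "updated": date(2023, 1, 5)},
    {"source": "billing", "email": "ann@example.com", "name": "Ann Lee",
     "phone": "555-0100", "updated": date(2023, 6, 1)},
    {"source": "shop", "email": "bob@example.com", "name": "Bob",
     "phone": None, "updated": date(2023, 3, 1)},
]

masters = golden_records(sources)
for master in masters:
    print(master)
```

In practice, matching usually needs fuzzier logic and human review by data stewards, which is one reason MDM works best alongside the roles defined earlier.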

4) Encourage a Data Quality Culture:

Foster a culture where data quality is a shared responsibility and everyone understands its importance in the success of the organization.

It is important to start with a clear vision, gain alignment among stakeholders, and incrementally build the governance structure and processes while continuously demonstrating value to the organization. Last but not least, we should regularly review and update the data governance framework to reflect changes in technology, business objectives, regulatory requirements, and lessons learned, so that it stays aligned with our corporate strategy.

The “iceberg” will not disappear, either in digital transformation projects or in our daily operations. By proactively addressing potential “below the surface” issues and continuously working to enhance data quality, organizations can reduce the likelihood of encountering significant problems that could derail digital transformation projects or disrupt daily operations. Recognizing that data quality is not just an IT issue but a business-wide concern is essential to the long-term success of any data-driven initiative.

