Accelerating data remediation with AI

AI4DQ from QuantumBlack, AI by McKinsey

--

As an organization scales the volume and diversity of its data, the challenge of governing that data and resolving quality issues grows too. In this article, we explore why data quality is so important and how organizations can improve it at scale.

QuantumBlack Labs is the R&D and software development hub within QuantumBlack, AI by McKinsey. QuantumBlack Labs has more than 250 technologists dedicated to driving AI innovation and supporting and accelerating the work of its more than 1,400 data scientists across over 100 locations. We use our colleagues’ collective experience to develop suites of tools and assets that ensure AI/ML models reach production and achieve sustained impact.

QuantumBlack Labs developed the award-winning AI4DQ (“AI for Data Quality”) product, which uses AI to identify and remediate data quality challenges at scale. In this article, we describe how AI4DQ enhances and accelerates the detection and correction of data quality issues through a combination of modularity, innovative data engineering, and hybrid intelligence.

An example of AI4DQ in action

The importance of quality data

Poor data quality and availability cause an estimated 30 percent of total enterprise time to be spent on non-value-add tasks. Some 82 percent of respondents to McKinsey’s Master Data Management survey reported spending one or more days per week resolving data quality issues, and 72 percent of leading organizations considered data management to be a major barrier to scaling impact.

In our book, Rewired: The McKinsey Guide to Outcompeting in the Age of Digital and AI (Wiley, June 2023), we highlighted nine critical dimensions of data quality:

  • Accuracy: The degree to which data matches the agreed-upon source.
  • Timeliness: The timescale within which data should be refreshed and the acceptable system lag when values change.
  • Consistency: The extent to which identical data must have the same value wherever it is stored or displayed.
  • Completeness: The degree to which fields must be populated and the required breadth, depth, and historical context.
  • Uniqueness: The extent to which data should be uniquely stored in one place and be unique for one customer.
  • Coherence: The extent to which data definitions remain consistent over time so that historical data retains the same context.
  • Availability: The degree to which current and historical data is available for analysis.
  • Security: The extent to which data is held securely, subject to access restrictions and recoverability.
  • Interpretability: The extent to which clear definitions for data are in place, enabling easy understanding.
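
Several of these dimensions can be measured directly. As a minimal illustrative sketch (our own Python example using pandas, not part of AI4DQ), completeness and uniqueness for a small customer table might be computed like this:

```python
import pandas as pd

# Illustrative customer records; the column names are hypothetical.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@example.com", None, "b@example.com", "c@example.com"],
    "country": ["UK", "UK", "UK", None],
})

# Completeness: share of populated (non-null) cells per column.
completeness = df.notna().mean()

# Uniqueness: share of rows whose key appears exactly once.
uniqueness = (~df["customer_id"].duplicated(keep=False)).mean()

print(completeness.to_dict())  # {'customer_id': 1.0, 'email': 0.75, 'country': 0.75}
print(f"uniqueness of customer_id: {uniqueness:.2f}")  # 0.50
```

Dimensions such as consistency and coherence require comparing values across systems and over time, which is where simple rule-based checks begin to strain.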

It’s not uncommon for the data required for AI-based solutions to be of poor quality. This is the basis of the classic “garbage in, garbage out” problem, where subpar data undermines the progress of AI transformations through wasted investment, slowed progress, and flawed decision making, as well as risking legal issues and reputational damage.

The challenges of improving data quality

So why is data quality such a persistent issue for large enterprises? Several common challenges contribute to the problem:

  • A lack of standardized data collection practices leads to incomplete, inconsistent, or erroneous data.
  • Multiple, siloed data storage solutions create a fragmented view that is difficult to understand.
  • Absence of a data governance framework causes compliance issues, inaccuracies, and inconsistencies.
  • Legacy systems and stalled migrations create fragmented and outdated data.
  • Inadequate data quality tools fail to identify and rectify data quality issues in real time.
  • Insufficient training and awareness lead to unintentional errors and inconsistencies.

Paradoxically, the primary challenge facing organizations as they scale their capabilities is scale itself. While increasing the volume and variety of data sources is essential for driving competitive advantage through AI, expansion exacerbates existing data quality issues. Complexities and inconsistencies multiply, making manual efforts to address these issues impractical. The larger the data ecosystem, the more pronounced the problems become, compounding one another and creating significant barriers to achieving high-quality, actionable insights.

At QuantumBlack Labs, we recognize that developing competitive AI solutions requires using AI to address data quality issues, tailored to the specific context of each organization.

AI4DQ by QuantumBlack Labs

AI4DQ (AI for Data Quality) is an AI-driven suite of tools designed to enhance and accelerate the detection and correction of data quality issues, and to embed solutions that fix them.

We built AI4DQ to follow the principles of modularity, innovative data engineering, and hybrid intelligence.

Modularity

AI4DQ enables an organization to tailor its data remediation to its specific context. Our team has built more than 50 modules that can be combined to create an automated data quality solution, including anomaly detection and linguistic similarity assessment.
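
To illustrate the idea, the sketch below models a module as a callable that maps a dataset to a list of issues, so that checks such as anomaly detection and linguistic similarity compose freely. The interface is a hypothetical sketch of ours, not the actual AI4DQ API:

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from statistics import mean, stdev

@dataclass
class Issue:
    row: int
    column: str
    detail: str

def anomaly_module(rows, column, threshold=3.0):
    """Flag numeric values more than `threshold` standard deviations from the mean."""
    values = [r[column] for r in rows]
    mu, sigma = mean(values), stdev(values)
    return [
        Issue(i, column, f"outlier: {v}")
        for i, v in enumerate(values)
        if sigma and abs(v - mu) / sigma > threshold
    ]

def similarity_module(rows, column, cutoff=0.9):
    """Flag pairs of strings that are nearly, but not exactly, identical."""
    values = [r[column] for r in rows]
    return [
        Issue(j, column, f"near-duplicate of row {i}")
        for i in range(len(values))
        for j in range(i + 1, len(values))
        if values[i] != values[j]
        and SequenceMatcher(None, values[i], values[j]).ratio() >= cutoff
    ]

def run_pipeline(rows, modules):
    """Compose independent modules into a single data quality pass."""
    return [issue for module in modules for issue in module(rows)]

rows = [
    {"amount": 10.0, "name": "Acme Ltd"},
    {"amount": 12.0, "name": "Acme Ltd."},
    {"amount": 11.5, "name": "Beta plc"},
]
issues = run_pipeline(rows, [
    lambda r: anomaly_module(r, "amount"),
    lambda r: similarity_module(r, "name"),
])
```

Real modules would be model-based rather than rule-based, but the composition pattern is the point: each check remains independent and reusable.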

Innovative data engineering

Innovative data engineering pieces the relevant AI-based modules together. Modules sit inside the flexible AI4DQ framework, which can integrate additional custom modules for teams that want to BYOA (“bring your own algorithm”). Organizations with specific challenges can use the framework to create production-ready custom workflows.
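
Continuing the hypothetical sketch from the modularity section (reusing the Issue and run_pipeline definitions above), “bring your own algorithm” amounts to adding one more callable to the pipeline, with no change to the framework required:

```python
def credit_limit_module(rows):
    """A custom, organization-specific rule: a debit must not exceed its credit limit."""
    return [
        Issue(i, "debit", f"exceeds credit limit: {r['debit']} > {r['credit_limit']}")
        for i, r in enumerate(rows)
        if r["debit"] > r["credit_limit"]
    ]

# The custom module composes with the built-in modules unchanged.
ledger = [
    {"debit": 120.0, "credit_limit": 100.0},  # flagged
    {"debit": 40.0, "credit_limit": 100.0},
]
issues = run_pipeline(ledger, [credit_limit_module])
```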

Hybrid intelligence

AI4DQ embeds human expertise into each aspect of the workflow, enabling domain experts to integrate business context, validate issue detection, and provide feedback on corrections. Each interaction improves the system.
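
What this looks like in practice can be sketched concretely. The snippet below is a hypothetical illustration of a review loop, not the AI4DQ interface: proposed corrections are routed through a domain expert, and every decision is retained as feedback:

```python
from dataclasses import dataclass, field

@dataclass
class Correction:
    row: int
    column: str
    old: str
    new: str

@dataclass
class ReviewLog:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

def review(corrections, approve, log):
    """Apply only the corrections a domain expert approves, keeping every decision.

    `approve` can be any callable: a CLI prompt, a web form, or a rule that
    encodes business context. The accumulating log is the feedback signal
    that lets issue detection improve with each interaction.
    """
    applied = []
    for c in corrections:
        if approve(c):
            log.accepted.append(c)
            applied.append(c)
        else:
            log.rejected.append(c)
    return applied

log = ReviewLog()
proposed = [Correction(row=3, column="country", old="U.K.", new="United Kingdom")]
applied = review(proposed, approve=lambda c: c.column == "country", log=log)
```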

By enabling custom correction pathways, AI4DQ supports far more sophisticated remediation approaches that go beyond traditional data quality rules.

The impact of AI4DQ

AI4DQ is designed to be complementary to existing enterprise-grade data management systems. It is not a data quality management platform, but instead a toolkit for building custom solutions to complement existing offerings.

Traditional approaches to data quality often omit an important question: what is the return on investment for remediating this issue? AI4DQ aims to find and resolve business problems blocked by poor data, defining a clear path from data quality improvements to commercial value.

For example, the team worked with an insurance company to improve the quality of its claims data and reduce “double payments” caused by duplicative records. By combining bespoke modules from the AI4DQ toolkit, the company saved $35 million that would have otherwise been spent on duplicative claims.
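
The kernel of duplicate detection in a case like this can be sketched in a few lines. The snippet below is a simplified illustration using Python’s standard library, with made-up claim records, not the client workflow:

```python
from difflib import SequenceMatcher

claims = [
    {"id": "C-101", "text": "Windscreen replacement, Ford Focus, 12 May"},
    {"id": "C-102", "text": "Windscreen replacment Ford Focus 12 May"},
    {"id": "C-103", "text": "Rear bumper repair, Vauxhall Astra, 3 June"},
]

def likely_duplicates(claims, cutoff=0.85):
    """Return pairs of claim ids whose descriptions exceed a similarity cutoff."""
    pairs = []
    for i in range(len(claims)):
        for j in range(i + 1, len(claims)):
            score = SequenceMatcher(None, claims[i]["text"], claims[j]["text"]).ratio()
            if score >= cutoff:
                pairs.append((claims[i]["id"], claims[j]["id"], round(score, 2)))
    return pairs

print(likely_duplicates(claims))  # flags C-101 and C-102 as a likely pair
```

A production workflow layers blocking, ML-based matching, and human validation on top of a kernel like this, but the correction pathway is the same: detect candidate duplicates, score them, and route them for review.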

AI4DQ has now been deployed to over 25 organizations and has proven impact across multiple industries. Other examples of its impact include:

  • Global healthcare supplier: Improved the accuracy of package weight and dimension data from 63 percent to 96 percent within ten weeks, saving $6 million of direct shipping costs.
  • Aerospace manufacturer: Identified the root cause of satellite-to-ground signal failures, resulting in an 81 percent uplift in issue identification.
  • Leading European bank: Assessed key data quality dimensions across more than 10 GB (~450 tables) of customer data to produce a ‘golden source of truth’ database with 100 percent information completeness, correcting 106,000 customer records in the process.
  • Leading telecoms company: Implemented a workflow that de-duplicated alarm tickets generated from cell towers using ML and fuzzy matching, saving more than 120,000 hours of productivity annually.

When is AI4DQ most useful?

For AI4DQ to be successful, several key characteristics should be present:

  • Frequent occurrence: Issues such as data inconsistencies and inaccuracies must occur frequently enough in the data for patterns to be identified.
  • Benchmark examples: Clear examples of what ‘good’ data looks like, enabling the system to draw inferences and provide context for correcting ‘bad’ data.
  • Human feedback: Issues should be augmented with human feedback, enabling the machine to learn and improve over time.
  • Interoperable databases: To assess data integrity and validity, there should be multiple databases available and able to interact with each other.

Summary

Accurate data management is crucial for business success. Poor data quality and availability lead to time wasted on tasks that do not add value. With the advent of generative AI, these issues have become the primary obstacle to implementing and scaling new use cases. AI4DQ addresses such challenges by integrating human expertise with AI-based components, effectively solving data quality issues that traditional methods have failed to resolve. With a proven track record of more than 25 deployments across a range of industries, AI4DQ’s scalable toolkit has emerged as a trusted solution in the field.

In the next article in this series, we will explore how AI4DQ has evolved to address data quality issues specifically related to generative AI use cases, particularly for unstructured documents. It is proving invaluable in evaluating generative AI readiness and ensuring that organizations have the necessary data foundations to fully benefit from generative AI.

QuantumBlack Horizon is a family of enterprise AI products, including Kedro, Brix, and Alloy, that provides the foundations for organization-level AI adoption by addressing pain points like scaling. It’s a first-of-its-kind product suite that helps McKinsey clients discover, assemble, tailor, and orchestrate AI projects.

To learn more about what QuantumBlack Horizon and AI4DQ can do for you, please email alan_conroy@mckinsey.com.

Thanks to all who contributed to this article: Alan Conroy, Nishant Kumar, Zihao Xu, Paul Southall, James Mulligan, Jo Stichbury, Joanna Sych, Sarah Mulligan & Matt Fitzpatrick.
