Towards Data Quality: An Overview of SQL Server Data Quality Solutions
Inadequate data quality can lead to significant financial or business opportunity losses. Still, many SQL Server data practitioners are unaware of the available data quality solutions.
Data Quality Services (DQS), Master Data Services (MDS), and SQL Server Integration Services (SSIS) components are the main data quality solutions offered by the SQL Server platform. These technologies can assist businesses in improving their data quality by locating and resolving quality concerns.
In the previously published article in this series, we explained what data quality management is and the primary data quality issues and techniques. This article briefly explains SQL Server’s solutions to assess and improve data quality.
A Quick Reminder
Before we begin, it is worth mentioning two main data management concepts:
Knowledge base
A knowledge base is a repository of information that defines data quality rules and processes. It contains business rules, domain-specific knowledge, and other information necessary to evaluate and correct data quality issues. A knowledge base can include various components, such as data domains, quality rules, reference data, synonyms, and other metadata. It is a central hub for data quality activities. It provides a consistent set of rules that can be applied to different data sources and scenarios. Organizations can use a knowledge base to ensure data quality.
Golden Records
A golden record is a unified view of an entity, such as a person. It represents the most accurate and complete representation of that entity. Golden records are typically created created using data quality and matching techniques by combining and consolidating data from multiple sources. The resulting record provides a single source of truth for the entity. This can support various business activities.
The concept of a golden record is closely related to master data management (MDM), which is managing and maintaining the core data elements critical to an organization’s business operations. Golden records are often a vital component of an MDM solution, as they provide a reliable and authoritative view of the critical data entities managed by the MDM system.
SQL Server Integration Services
Before SQL Server 2005, SQL Server Data Transformation Services (DTS) was the primary tool provided by Microsoft for data integration and ETL operations. In SQL Server 2005, Microsoft introduced SQL Server Integration Services, a more powerful and flexible platform for data integration and ETL operations.
Over the past years, new components were added and improved in SSIS. Now, several components can be used to assess or improve data quality:
- Data Flow Transformations: SSIS includes several built-in transformations that clean, standardize, and validate data. These transformations include Data Conversion, Derived Column, Conditional Split, and Lookup transformations.
- Fuzzy Lookup and Fuzzy Grouping: These components utilize fuzzy matching algorithms to identify related or related data across different data sources. Fuzzy Lookup matches similar values within a single data source. In contrast, fuzzy grouping groups similar values across multiple data sources. These components can remove duplicates and enrich data from external sources.
- Data Profiling Task: This task analyzes the data in a source data store and generates statistics and metadata that can be used to identify data quality issues.
- Data Quality Services Cleansing Component: The DQS Cleansing Component integrates DQS functionality into an SSIS data flow. It allows users to apply data quality rules defined in a DQS knowledge base to the data flowing through the data flow.
Moreover, a bunch of third-party components were developed by leading software solutions companies.
SQL Server Master Data Services
Master data refers to the core data elements critical to an organization’s business operations, such as customers, products, and employees. Master data typically represents an organization’s most significant entities and attributes, whereas the golden record is often used to represent a specific entity’s source of truth.
SQL Server Master Data Services (MDS) was introduced with SQL Server 2008 to provide a platform for managing master data in an organization. MDS provides a centralized repository for storing and managing master data. This service helps ensure that master data is accurate, consistent, and up-to-date across all systems and applications that rely on it.
MDS can help govern and improve the data quality by providing the following features:
- Hierarchical data management: Users can develop and manage master data hierarchies using the hierarchical data model provided by MDS. This feature enables them to manage and organize data in a systematic manner that mirrors the business processes of their organization.
- Versioning and auditing: This feature allows users to keep track of changes made to master data over time.
- Business rules and workflows: MDS provides flexible business rules and a workflow engine that allows users to define custom business rules and workflows to manage their master data. This feature ensures data consistency and accuracy according to the organization’s business rules.
SQL Server Data Quality Services
In SQL Server 2012, Microsoft introduced a knowledge-driven data quality solution name DQS (Data Quality Services). This feature mainly serves in data cleaning and matching, enabling users to build and manage a knowledge base to improve accuracy and consistency.
The knowledge base in DQS is an essential component of the tool, as it contains the rules and reference data needed to identify and correct data quality issues. Users can create and manage knowledge bases in DQS to address specific data quality issues or improve.
DQS can help improve an organization’s data quality by providing the following set of features:
- Data cleansing: DQS provides data cleansing transformations to standardize, validate, and correct data.
- Data matching: DQS can identify and match duplicate records in a dataset.
- Profiling: DQS provides data profiling tools to assess data quality.
- Knowledge base management: DQS provides a knowledge base management interface that allows users to define and manage data quality rules, reference data, and other metadata needed to improve data quality.
- Integration with SSIS.
- Integration with MDS.
Summary
SQL Server provides several solutions to ensure data quality, such as Data Quality Services, Master Data Services, and some Integration Services components. Those solutions help maintain acceptable data quality. While they have recently been integrated with the Microsoft Azure ecosystem, Microsoft has not announced any significant updates or improvements in recent years.