“Data Quality Services(DQS) Cleansing Transformation in SSIS”

Smita Gudale
6 min readDec 9, 2019

--

Data Quality Services is a Cleansing transformation in the SSIS.

The complexity in data is bad data entered by the End-user. The data is good or bad it depends on the end-user.

The DQS is ensuring to the End-user or programmer that data quality is good. DQS is a knowledge base Tool it has an ability to detect the good data.

DQS is some kind of tool or power or service give to the End-user or programmer so they enter good data.

Steps For the Data Quality Services as Follows -

  • There are two types of Data Quality Services we want to open. First is Data Quality Server Installer and the second one is the Data Quality Client.
Data Quality Services
  • Firstly, Enable the DQS Server Installer→for that first install the DQS.
Installation of DQS Server
  • After that, It asks the password for the database master key →Set your password →Confirm the password →After some time it will be Executed →It will show a message like Follows.
Installation Completed
  • It creates a new three databases in the SQL as follows-
Data Quality Services Databases
  • Thereafter open the DQS Client →We need to connect to the DQS server and choose the knowledge base that we will be using for our source input data →Choose the server name →Click on Connect.
Connection Manager
  • My source data is like follows it is bad data.
Original data
  • Then we can perform three primary tasks with DQS Client→One is Knowledge Base Management → Second one is Data Quality Projects and→The last one is Administration
DQS Client primary Three Tasks
  • In the Knowledge Base Management, we define the data cleansing rules and policies →Click on New Knowledge Base.
Knowledge Base Management
  • Give the name → Select the activity Domain Management →Click on Next.
Data Quality Client
  • Then it opens the following Domain Management window →Create a Domain.
Create Domain
  • A DQS KB(Knowledge base) is a grouping of related data quality definitions and rules it is called domains that are defined up front.
  • They are Five Tabs like follows screenshot. These definitions and rules are applied against data with various outcomes such as corrections, exceptions, etc.
Domains
  • A domain defines what the DQS engine should do with data it receives→Is it valid?→Does it need to be corrected?→Should it look at external services to cleanse the data?
Term-Based Relations
  • In the Domain Values, we can correct the values like city name, country name, etc.
Domain Values
  • Domain rules include range rules to check that numeric data values fall within a given range and value lists to make sure that the data coming in meets specific requirements.
Domain Rules
  • Subsequently, Publish these rules →Click on Publish →Click on the Finish.
Publishing Rules
  • What we created is shown below →If we want to change it then we can.
Rules are Published
  • Data Quality Projects, we use for applying the data quality definitions from the knowledge base against real data.
Create Data Quality Projects
  • Create a New Data Quality Project →Give a Name →Select the Cleansing activity →Click on Next.
Create a New Data Quality Project
  • It will be Processing in the Four steps →1. Map →2. Cleanse →3. Manage and View Results →Export.
Data Quality Project
  • Then select the Source →In the Mapping →Select source column and Domain →Click on Next.
Mapping
  • Click on Start →It will start to perform a cleansing on the selected data source.
Perform Cleansing
  • It shows the result →In the new there are 5 columns.
OUTPUT
  • If data confidence is 100% then we can approve the data so it can go into the corrected data.
OUTPUT
  • If data is invalid then in the correct to fields we can correct the data and then click on approve.
OUTPUT
  • Finally, all data is a load to the destination for this →Give the destination type →Browse the path →and click on Export →Finish.
Browse the DESTINATION
  • We can see that good data is generated in the destination file.
Destination File
  • If we want to change this then we can able to like ProductName in the screenshot →Click on ProductName.
Data Quality Project
  • SSIS provides a transformation component DQS Cleansing. With the help of this, we can create good data.
SSIS Component
  • Right-click on DQS Cleansing →It opens the DQS Cleansing Transformation Editor →Click on Mapping →Choose the column name →Select the Domain.
DQS Cleansing Transformation Editor
  • IN THE DQS Cleansing Transformation Editor →Go to the connection manager →select the Data Quality connection manager and → Data Quality Knowledge Base →Click on OK.
DQS Cleansing Transformation Editor
  • Click on Start →It Execute the package →We can see the output in the destination file with the good data.
Package Execution

If you are new to MSBI start with the following must-watch video: -

--

--