Oracle Data Quality Review [ 1 ]

Ahlem Mustapha
6 min read · Jul 19, 2016


Azul ⵣ

As part of my New Year’s resolutions for 2016, I decided to write reviews about new tools and technologies that I use, as well as the books I read. The first tool on my review list is Oracle Enterprise Data Quality (EDQ), a data quality and data governance tool. EDQ is a part of Oracle’s family of products that helps address data quality issues such as deduplication, address verification, and matching. Overall, EDQ is a valuable tool for anyone looking to improve the quality of their data.

As BI consultants, much of our time is dedicated to understanding and preparing data. We spend a significant amount of time checking the reliability, completeness, validity, exactness, coherence, standardization, and duplication of data. Our primary goal is not only to transform business requirements into technical architecture, ETL, and dashboards, but also to ensure that our customers are using reliable data and gaining valuable insights from the large volumes of data they own. That’s why we recommend using a data quality tool, such as Oracle Enterprise Data Quality (EDQ), if one is not already in use within the company. Before diving into the details, it’s important to consider the following points:

It’s not worth investing time, effort, and money into building data warehouses and creating dashboards if the data they contain is flawed.

Inaccurate, invalid, inconsistent, and unreliable data leads to poor decision-making and negative consequences for the business. That said, a reliable data quality solution can be costly, and it may not be necessary for every company, depending on factors such as the size of the business, the amount of data, and revenue. Many ETL tools offer only limited data quality capabilities, so a dedicated data quality tool may be needed, particularly as data volumes grow. Companies such as Oracle, Informatica, and Talend offer robust data quality solutions. For example, Oracle’s ETL tool, ODI, can use EDQ’s cleansing capabilities as part of its integration processes.

If your company handles a large amount of customer relationship management (CRM) data, it may be worthwhile to invest in a data quality solution. Not only can it lead to future financial gains, but it is also essential to ensure that the data displayed in dashboards is reliable and accurate. Visually appealing and functional dashboards are important, but the data they present must be trustworthy too. Gartner’s recent studies also emphasize the importance of data quality: “Poor data quality is the number one reason why 40% of all business initiatives fail to meet their goals. Only 30% of BI/DW implementations are fully successful.”

Data warehouse projects fail for two main reasons: budget constraints and data quality. In fact, over 50% of these projects will either have limited acceptance or fail altogether due to a lack of focus on data quality. It is also estimated that, by 2016, 25% of organizations using consumer data will suffer reputational damage due to misunderstandings related to information issues.

To illustrate the importance of data quality, consider the following scenario: you have a list of loyal customers and you want to send them a direct mail wishing them a “Happy Easter.” However, some of the addresses on the list are incomplete or missing postal codes. As a result, the mail never reaches its destination: you miss the opportunity to engage with your customers and waste budget on delivery costs.

Duplicate emails in your database can be a major issue. It’s not just about finding duplicate values, but also about understanding whether they are true duplicates (the same person entered twice) or distinct records that legitimately share a value. For example, consider a situation where an HR person registers all employees with the same email address for an important event. A traditional system may flag these records as duplicates, but in reality they represent different customers or visitors. If you don’t take the time to verify the data, you may end up discarding a list of valuable customers without realizing it. Analyze the nature of duplicate entries carefully before acting on them.
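To make the HR scenario above concrete, here is a minimal Python sketch of the idea. It is not EDQ’s API (EDQ does this through its own graphical processors); the field names `first_name`, `last_name`, and `email`, the labels, and the rule itself (same email with one name means a likely duplicate, same email with several names means a shared address) are assumptions made for illustration.

```python
from collections import defaultdict

def classify_shared_emails(records):
    """Group records by normalized email and label each address used more than once."""
    groups = defaultdict(list)
    for rec in records:
        email = rec["email"].strip().lower()
        groups[email].append(rec)

    report = {}
    for email, recs in groups.items():
        if len(recs) < 2:
            continue  # unique address, nothing to review
        names = {
            (r["first_name"].strip().lower(), r["last_name"].strip().lower())
            for r in recs
        }
        # One distinct name: probably the same person entered twice.
        # Several distinct names: probably different people sharing one address.
        report[email] = "likely_duplicate" if len(names) == 1 else "shared_address"
    return report

# Hypothetical sample data
records = [
    {"first_name": "Ana", "last_name": "Silva", "email": "ana@example.com"},
    {"first_name": "ana", "last_name": "silva", "email": "Ana@Example.com "},
    {"first_name": "Omar", "last_name": "Haddad", "email": "hr@example.com"},
    {"first_name": "Lina", "last_name": "Berg", "email": "hr@example.com"},
    {"first_name": "Joe", "last_name": "Park", "email": "joe@example.com"},
]
report = classify_shared_emails(records)
print(report)
```

Records labeled `shared_address` are the ones you would route to a person for review rather than delete automatically.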

Enterprise Data Quality (EDQ) tools can help you detect and address issues with duplicate fields in your database. They allow you to decide how to handle these fields, such as whether to keep them or contact the relevant parties to update the information. EDQ can also help you assign a priority score to your customers, which can be useful for the goals of your marketing or finance team. By ensuring the accuracy of your delivery addresses and avoiding spamming your customers with multiple emails due to duplicates, you can maintain a good relationship with your customers and avoid being labeled as a spammer.

Enterprise Data Quality (EDQ) is a tool that provides a range of components to help you manage and improve the quality of your data. These components include:

Profiling: This feature helps you uncover and quantify hidden data issues.

Audit: Audit rules allow you to measure the quality of your data against your business rules and track its state over time.

Analysis and standardization: EDQ provides a user-friendly interface that enables business and IT teams to work together on data quality projects. This includes tools for analyzing and standardizing data to ensure it meets your business needs.

Overall, EDQ is a powerful tool that can help you improve the accuracy and reliability of your data, leading to better decision-making and business outcomes.
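To give a feel for what profiling and audit rules compute, here is a small Python sketch. It only illustrates the concepts and is not how EDQ implements them; the column names, the sample rows, and the two rules are hypothetical.

```python
def profile_column(rows, column):
    """Basic profile of one column: row count, missing values, distinct values."""
    values = [row.get(column) for row in rows]
    filled = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "missing": len(values) - len(filled),
        "distinct": len(set(filled)),
        "completeness": len(filled) / len(values) if values else 0.0,
    }

def audit(rows, rules):
    """Return, for each named rule, the share of rows that pass it."""
    results = {}
    for name, check in rules.items():
        passed = sum(1 for row in rows if check(row))
        results[name] = passed / len(rows) if rows else 0.0
    return results

# Hypothetical sample data and business rules
rows = [
    {"name": "Ana Silva", "postcode": "75001", "email": "ana@example.com"},
    {"name": "Omar Haddad", "postcode": "", "email": "omar@example.com"},
    {"name": "Lina Berg", "postcode": "69002", "email": "lina-at-example.com"},
    {"name": "", "postcode": "75001", "email": "joe@example.com"},
]
rules = {
    "email_has_at_sign": lambda r: "@" in r["email"],
    "postcode_present": lambda r: bool(r["postcode"]),
}

postcode_profile = profile_column(rows, "postcode")
audit_result = audit(rows, rules)
print(postcode_profile)
print(audit_result)
```

Storing the audit result with a timestamp after each run is one simple way to track the state of the data over time.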

Parsing and standardization transform and normalize data so that it is consistent and reliable. Typical tasks include converting names, addresses, dates, and phone numbers into a standard format, extracting structured information from free text, and preparing data to optimize its value for business applications.
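As a rough illustration of standardization, the Python sketch below normalizes dates, phone numbers, and names. The accepted date formats and the normalization rules are assumptions chosen for the example; a real tool handles far more variants (for instance, `standardize_name` would turn "jean-pierre" into "Jean-pierre").

```python
import re
from datetime import datetime

# Assumed input formats, tried in order
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")

def standardize_date(text):
    """Return the date in ISO format (YYYY-MM-DD), or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def standardize_phone(text):
    """Keep digits only, preserving a leading '+' if present."""
    text = text.strip()
    digits = re.sub(r"\D", "", text)
    return "+" + digits if text.startswith("+") else digits

def standardize_name(text):
    """Collapse whitespace and capitalize each word."""
    return " ".join(part.capitalize() for part in text.split())

print(standardize_date("19/07/2016"))
print(standardize_phone("01 23 45 67 89"))
print(standardize_name("  aHLEM   mustapha "))
```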

Match and merge identifies and merges records that represent the same individual, group, or household. It applies to records about individuals as well as corporate entities. The matching rules are fully flexible: you can start from pre-built templates and customize them to fit your business needs. The result is more accurate and more complete data.
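The sketch below shows the general idea of a match rule followed by a merge, using Python’s standard `difflib` for fuzzy name comparison. The rule (same postcode and name similarity of at least 0.85), the threshold, and the "first non-empty value wins" merge strategy are assumptions for illustration, not EDQ’s matching logic.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def is_match(rec_a, rec_b, threshold=0.85):
    """Hypothetical rule: same postcode and sufficiently similar names."""
    return (
        rec_a["postcode"] == rec_b["postcode"]
        and similarity(rec_a["name"], rec_b["name"]) >= threshold
    )

def merge(rec_a, rec_b):
    """Keep the first non-empty value for each field, preferring rec_a."""
    return {k: rec_a.get(k) or rec_b.get(k) for k in rec_a.keys() | rec_b.keys()}

# Hypothetical sample records
a = {"name": "Jon Smith", "postcode": "75001", "phone": ""}
b = {"name": "John Smith", "postcode": "75001", "phone": "0123456789"}
c = {"name": "Maria Lopez", "postcode": "75001", "phone": ""}

merged = merge(a, b) if is_match(a, b) else None
print(merged)
```

In practice the threshold needs tuning against your own data: too low and distinct customers get merged, too high and real duplicates slip through.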

Address verification confirms the accuracy and completeness of the addresses in your database. It can also add geocodes to cities or postal codes, for over 246 countries. Verified addresses help ensure that your mailings and other communications reach their intended recipients, which in turn supports better customer relationships.
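Real address verification relies on reference postal data, which a short example cannot reproduce. The Python sketch below covers only a much weaker first step: checking that a postcode has a plausible format for its country. The three patterns are deliberately simplified assumptions, and passing the check does not prove that the address exists.

```python
import re

# Simplified, assumed formats; real postal rules are stricter
POSTCODE_PATTERNS = {
    "FR": re.compile(r"\d{5}"),
    "US": re.compile(r"\d{5}(-\d{4})?"),
    "GB": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}"),
}

def check_postcode(country, postcode):
    """Return True/False for a format check, or None if the country is not covered."""
    pattern = POSTCODE_PATTERNS.get(country)
    if pattern is None:
        return None  # no rule available, so we cannot say either way
    return pattern.fullmatch(postcode.strip().upper()) is not None

print(check_postcode("FR", "75001"))
print(check_postcode("FR", "7500"))
print(check_postcode("GB", "sw1a 1aa"))
print(check_postcode("DZ", "16000"))
```

Returning `None` for uncovered countries keeps "not checked" separate from "invalid", so those records are not wrongly rejected.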

Using Enterprise Data Quality (EDQ), we can create a simple process to profile and assess the health of our data. A screenshot of a sample process using open data from the internet is shown below. This process can help us identify any issues with the data and take appropriate action to improve its quality and reliability. By using EDQ and following a structured process for data profiling and assessment, we can ensure that our data is accurate and useful for our business needs.

EDQ example

This process involves several steps to improve the quality and usefulness of our data. These steps include email de-duplication, gender detection, data completeness analysis, data transformation (such as filling in names, last names, and salutation fields), postcode verification, and data output to various sources for review and analysis. By following this process, we can identify and address any issues with our data, resulting in a cleaner and more reliable dataset that can support our business needs.

Data quality tools have become increasingly useful in recent times due to their ability to verify data in real-time. EDQ, in particular, stands out as it not only performs real-time verification, but also provides dashboards to monitor the health and progress of your data over time.

Stay tuned for future articles to learn more about the capabilities of EDQ and other data quality tools.


Ahlem Mustapha

Solution Engineer ❤️Graph Dbs, system design, DB architecture and more. I love data, shoes, and traveling.