Six Data Quality Dimensions

Wanda Kinasih
Oct 3, 2018


An effective data reporting process should provide data that is not only timely but also of high quality. Designing such a process is a challenge in itself, and establishing data quality factors is an equally demanding task.

How can data quality be assessed? Below are the general data quality dimensions commonly used to measure the reliability of data.

Source: https://www.whitepapers.em360tech.com/wp-content/files_mf/1407250286DAMAUKDQDimensionsWhitePaperR37.pdf

Completeness

Data completeness refers to whether all available data is present. When data is missing because it is genuinely unavailable, this does not represent a lack of completeness. Completeness can be measured by the percentage of null values in certain fields. For example, a customer details repository consists of name, surname, address and email. However, the surname is missing for more than one client, even though this information should be available in the real world.
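The null-percentage measure described above can be sketched in a few lines of Python. This is only an illustration: the customer records, the field names, and the `null_percentage` helper are made up for this example, and a missing value is assumed to be stored as `None` or a blank string.

```python
# Hypothetical customer records; None marks a missing value.
customers = [
    {"name": "Wanda", "surname": "Kinasih", "address": "Jakarta", "email": "wanda@example.com"},
    {"name": "Budi", "surname": None, "address": "Bandung", "email": "budi@example.com"},
    {"name": "Sari", "surname": None, "address": "Surabaya", "email": None},
    {"name": "Agus", "surname": "Santoso", "address": "Medan", "email": "agus@example.com"},
]


def null_percentage(records, field):
    """Return the percentage of records whose `field` is None or blank."""
    if not records:
        return 0.0
    missing = sum(
        1
        for record in records
        if record.get(field) is None or str(record[field]).strip() == ""
    )
    return 100.0 * missing / len(records)


# Null rate per field; a high rate flags a possible completeness problem.
null_rates = {
    field: null_percentage(customers, field)
    for field in ("name", "surname", "address", "email")
}
print(null_rates)
```

A high null rate only signals a possible problem. Whether it counts as incompleteness still depends on whether the value should exist in the real world.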

Validity

Data should follow a defined format according to the desired data profile. Data is valid if it conforms to the syntax (format, type, range) of its definition in the metadata or documentation. For example, vehicle plate numbers in Indonesia should follow the format [Area Code][Numbers][2–3 letters]. Data validity is related to the null-value assessment in the completeness dimension.
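A format rule like the plate number example is often checked with a regular expression. The sketch below is an assumption-laden illustration, not an official specification: the suffix length of 2 to 3 letters comes from the example above, while the area code length (1 to 2 letters), the number length (1 to 4 digits), and the optional spaces are guesses made for this example.

```python
import re

# Assumed pattern: [area code: 1-2 letters][number: 1-4 digits][suffix: 2-3 letters].
# Only the 2-3 letter suffix comes from the article; the other lengths are assumptions.
PLATE_PATTERN = re.compile(r"[A-Z]{1,2} ?\d{1,4} ?[A-Z]{2,3}")


def is_valid_plate(plate):
    """Return True if `plate` matches the assumed plate number format."""
    return PLATE_PATTERN.fullmatch(plate.strip().upper()) is not None


for plate in ["B 1234 XYZ", "AB 12 CD", "1234 B XYZ", "B 12345 XYZ"]:
    print(plate, is_valid_plate(plate))
```

A value that passes this check is valid in format only. Whether the plate really belongs to the vehicle is a question of accuracy.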

Accuracy

Data should correctly describe the real-world object or event that it represents in the database. Validity is a related dimension because, in order to be accurate, values must be valid, hold the right value, and be in the correct representation. For example, customer address data is considered accurate if it represents the real location of the customers’ homes.

Consistency

Data should be consistent between datasets. Consistency can be assessed across multiple datasets, or by comparing values and formats across data items, records, datasets and databases. This applies whether the underlying processes are people-based, automated, electronic or paper-based. For example, a terminated employee who still has an active payroll status represents an inconsistency between the employee data and the payroll data.
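The employee and payroll example can be expressed as a simple cross-dataset check. The sketch below assumes both datasets share an employee ID and store status as plain strings; the IDs, the status values, and the `find_inconsistent` helper are hypothetical.

```python
# Hypothetical datasets keyed by employee ID.
employees = {  # employee_id -> employment status
    "E001": "active",
    "E002": "terminated",
    "E003": "terminated",
}
payroll = {  # employee_id -> payroll status
    "E001": "active",
    "E002": "active",
    "E003": "inactive",
}


def find_inconsistent(employees, payroll):
    """Return IDs of terminated employees whose payroll status is still active."""
    return sorted(
        emp_id
        for emp_id, status in employees.items()
        if status == "terminated" and payroll.get(emp_id) == "active"
    )


inconsistent_ids = find_inconsistent(employees, payroll)
print(inconsistent_ids)
```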

Uniqueness

Nothing should be recorded more than once. Each recorded item should be unique. Uniqueness can be assessed by comparing the records in the dataset with the things they represent in the real world. For example, customers “Wanda K” and “Wanda Kinasih” are recorded as two different people in an identity card database, although both names belong to the same person. Non-unique data creates ambiguity in data summaries.
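Near-duplicates such as “Wanda K” and “Wanda Kinasih” cannot be found with an exact match, so some form of fuzzy comparison is needed. The sketch below uses a deliberately naive heuristic chosen for this example: two names are flagged as candidate duplicates when they have the same number of words and each word is a prefix of its counterpart. The helper names are hypothetical, and the output is a list of candidates for human review, not a verdict.

```python
from itertools import combinations


def normalize(name):
    """Lowercase a name, drop dots, and split it into words."""
    return name.lower().replace(".", " ").split()


def may_be_same_person(a, b):
    """Naive heuristic: same word count and each word is a prefix of its counterpart."""
    words_a, words_b = normalize(a), normalize(b)
    if not words_a or len(words_a) != len(words_b):
        return False
    return all(x.startswith(y) or y.startswith(x) for x, y in zip(words_a, words_b))


def candidate_duplicates(names):
    """Return every pair of names that the heuristic flags as possible duplicates."""
    return [(a, b) for a, b in combinations(names, 2) if may_be_same_person(a, b)]


names = ["Wanda K", "Wanda Kinasih", "Budi Santoso"]
duplicates = candidate_duplicates(names)
print(duplicates)
```

In practice a heuristic like this would be combined with other fields, such as date of birth or ID number, before two records are merged.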

Timeliness

Timeliness refers to whether information is available when it is expected and needed, that is, the availability and accessibility of data for making business decisions. Timeliness depends on user expectations. For example, billing data should be delivered to a third party at 10:00 AM every day, or a doctor should have a patient's up-to-date lab results before giving medical treatment.

Ideally, data practitioners should satisfy all of these dimensions. In practice, some dimensions can be compromised, depending on business needs. For example, data may be reported late, within an agreed time limit, in order to maintain completeness.
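The billing example can be turned into a simple on-time rate. The sketch below assumes that 10:00 AM is a daily deadline (a delivery at or before that time counts as on time) and uses made-up delivery timestamps; the `is_on_time` helper is hypothetical.

```python
from datetime import datetime, time

# Assumed daily deadline, based on the 10:00 AM billing example.
DEADLINE = time(10, 0)


def is_on_time(delivered_at, deadline=DEADLINE):
    """Return True if the delivery happened at or before the daily deadline."""
    return delivered_at.time() <= deadline


# Hypothetical delivery timestamps, one per day.
deliveries = [
    datetime(2018, 10, 1, 9, 45),
    datetime(2018, 10, 2, 10, 0),
    datetime(2018, 10, 3, 11, 30),
]

# Share of deliveries that met the deadline, as a percentage.
on_time_rate = 100.0 * sum(is_on_time(d) for d in deliveries) / len(deliveries)
print(f"On-time deliveries: {on_time_rate:.1f}%")
```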

Sources:

https://www.whitepapers.em360tech.com/wp-content/files_mf/1407250286DAMAUKDQDimensionsWhitePaperR37.pdf

https://www.performancemagazine.org/data-quality-dimensions-from-accuracy-to-uniqueness/
