What is Data Integrity?

邱如韻 Diana Chiu
Published in Numbers Protocol
Aug 25, 2020

Have you ever run into situations like these? The spreadsheet you finished was edited without your knowledge, or the data you collected was deleted. As the big data era arrives, data-related terms such as data security and databases come up more and more often, and the definition of data integrity in particular has been raised countless times. But do you actually understand data integrity? How does it relate to changes made to the information in a database?

Generally speaking, the definition of data integrity is “Throughout the product life cycle, the information in the database isn’t arbitrarily changed by unauthorized people. If it is changed, authorized people can discover the change promptly and correct it.” The term can also describe the process itself, such as checking data for errors and credibility. In short, data integrity is about keeping data accurate and intact, including against malicious attacks.
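One common way to “discover” an unauthorized change is to record a fingerprint of the data and compare it later. The sketch below is only an illustration under that assumption; the function names and the sample spreadsheet content are hypothetical and do not come from any particular product.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the data."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_fingerprint: str) -> bool:
    """Check whether the data still matches the fingerprint recorded earlier."""
    return fingerprint(data) == recorded_fingerprint


# Hypothetical spreadsheet content, saved together with its fingerprint.
original = b"name,score\nAlice,90\n"
recorded = fingerprint(original)

# Someone later edits one value without permission.
tampered = b"name,score\nAlice,100\n"

print(is_unchanged(original, recorded))  # True
print(is_unchanged(tampered, recorded))  # False
```

A fingerprint only reveals that something changed. Correcting the change still requires a trusted copy of the original.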

Does it sound clear so far? Then, let’s see what the standards of data integrity are.

Two standards of data integrity:

Accuracy: Ensure the information collected conforms to the rules defined for the database.

Reliability: Ensure the data are dependable and aren’t changed by unauthorized people.

Now that we understand the two data integrity standards, let’s talk about the categories and applications of data integrity.

Categories of data integrity:

Entity Integrity:

Under the definition of the database, every record should be complete: each column holds the information it is supposed to hold. In relational databases, entity integrity also commonly means that each record has a unique, non-empty identifier (a primary key). “Database” may sound terrifying, so let’s look at this concept through an everyday example. When we see a photo on social media, we see not only the image but also when and where it was taken, who took it, the angle, and so on. These are all important supporting details. Without them, it is hard to tell whether the photo matches its description. Photos are often stolen or used in the wrong context, which creates misinformation that spreads widely on social media platforms. In such cases, the lack of data integrity plays a huge role.
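To make the photo example concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table and column names are made up for illustration. Each photo record must have a unique identifier and no missing details, and the database rejects any record that breaks those rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: every column is required, and photo_id must be unique.
conn.execute(
    """
    CREATE TABLE photos (
        photo_id     TEXT PRIMARY KEY NOT NULL,
        taken_at     TEXT NOT NULL,
        location     TEXT NOT NULL,
        photographer TEXT NOT NULL
    )
    """
)


def try_insert(row: tuple) -> bool:
    """Insert a photo record; return False if the database rejects it."""
    try:
        conn.execute("INSERT INTO photos VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# A complete record is accepted.
print(try_insert(("p1", "2020-08-25T10:00", "Taipei", "Example Photographer")))  # True
# A second record with the same identifier is rejected.
print(try_insert(("p1", "2020-08-26T11:00", "Tainan", "Example Photographer")))  # False
# A record with a missing location is rejected.
print(try_insert(("p2", "2020-08-25T10:00", None, "Example Photographer")))  # False
```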

Domain Integrity:

By filtering the information and checking its accuracy, we ensure that data meet the rules of the database before they are stored. To put it simply, domain integrity means that each value follows the database settings. For instance, missing data are marked “N/A,” numbers are recorded to two decimal places, only values between 10 and 100 are stored, and so forth. Think of it this way: in a supermarket, meat can’t be put in the vegetable area, and cucumbers can’t be put on the spice shelf. Likewise, the information needs to be in the RIGHT place.
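The three example rules above (mark missing data “N/A,” keep two decimal places, accept only values between 10 and 100) can be sketched as a small check that runs before a value is stored. The function name is hypothetical, and treating the range as inclusive is an assumption.

```python
from typing import Optional


def normalize_reading(raw: Optional[str]) -> str:
    """Apply the example domain rules to one raw value before storing it."""
    # Rule 1: missing data are marked "N/A".
    if raw is None or raw.strip() == "":
        return "N/A"

    value = float(raw)  # raises ValueError if the text is not a number

    # Rule 2: only values between 10 and 100 (assumed inclusive) are stored.
    if not 10 <= value <= 100:
        raise ValueError(f"{value} is outside the allowed range 10 to 100")

    # Rule 3: numbers are recorded to two decimal places.
    return f"{value:.2f}"


print(normalize_reading("42.5"))  # 42.50
print(normalize_reading(None))    # N/A
```

Calling normalize_reading("3") raises a ValueError, so the out-of-range value never reaches the database.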

Referential Integrity:

Referential integrity means ensuring that cited data can be traced back to the original data. That is to say, if data B cites data A, everything B cites needs to be consistent with data A. If that sounds too abstract, the concept can also be explained through publishing. When a publisher wants to publish a book, they need to cite the author’s information. If no record of the author’s background, college, or accomplishments exists, the citation points to nothing and referential integrity is lost. Another real-life example is Numbers Capture. If someone cites photos verified by Numbers Capture, they can use the photo fingerprint to refer back to the original photographic information on the blockchain. This demonstrates referential integrity well.
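In a relational database, this idea is usually enforced with a foreign key. The sketch below uses Python’s built-in sqlite3 module with made-up table names and a made-up fingerprint value. It is not how Numbers Capture works internally. It only shows a citation being accepted when the original exists and rejected when it does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# "Data A": the original records, identified by a fingerprint.
conn.execute(
    "CREATE TABLE originals ("
    " fingerprint TEXT PRIMARY KEY NOT NULL,"
    " photographer TEXT NOT NULL)"
)
# "Data B": citations, each of which must point at an existing original.
conn.execute(
    "CREATE TABLE citations ("
    " citation_id INTEGER PRIMARY KEY,"
    " fingerprint TEXT NOT NULL REFERENCES originals(fingerprint),"
    " used_in TEXT NOT NULL)"
)
conn.execute("INSERT INTO originals VALUES ('abc123', 'Example Photographer')")


def cite(fingerprint: str, used_in: str) -> bool:
    """Record a citation; return False if the original cannot be found."""
    try:
        conn.execute(
            "INSERT INTO citations (fingerprint, used_in) VALUES (?, ?)",
            (fingerprint, used_in),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(cite("abc123", "news article"))  # True: the original exists
print(cite("zzz999", "blog post"))     # False: nothing to trace back to
```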

User-defined Integrity:

As the name suggests, this category covers rules defined by users that do not belong to any of the three categories above. Data integrity is therefore not confined to rigorous definitions; it also includes the meanings that data owners give to their data.
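As an illustration, a data owner might decide that a photo cannot be published before it was taken and that every photo needs a caption. Both rules and all field names below are invented for this sketch. The point is only that the owner, not the database, decides what counts as valid.

```python
from datetime import datetime


def check_owner_rules(record: dict) -> list:
    """Return a list of owner-defined rules that the record violates."""
    problems = []
    # Hypothetical rule 1: a photo cannot be published before it was taken.
    if record["published_at"] < record["taken_at"]:
        problems.append("published before it was taken")
    # Hypothetical rule 2: every photo needs a non-empty caption.
    if not record["caption"].strip():
        problems.append("caption is empty")
    return problems


good = {
    "taken_at": datetime(2020, 8, 1, 9, 0),
    "published_at": datetime(2020, 8, 25, 12, 0),
    "caption": "Night market in Taipei",
}
bad = {
    "taken_at": datetime(2020, 8, 25, 12, 0),
    "published_at": datetime(2020, 8, 1, 9, 0),
    "caption": "  ",
}

print(check_owner_rules(good))  # []
print(check_owner_rules(bad))   # ['published before it was taken', 'caption is empty']
```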

Data Security & Data Integrity

Lastly, let’s clear up the confusion between data security and data integrity. Data security is about whether the data is safe or can be stolen. Data integrity is about whether the data is intact or has been altered by unauthorized people. For example, governments usually have good data security but weaker data integrity. If you ask the government, “Who am I?”, the response may be your birthday, ID number, and so on. Conversely, social media platforms usually have good data integrity but weaker data security. If you ask Instagram, “Who am I?”, you can get responses such as your interests, habits, and whatever you care about the most, because data integrity performs better here. However, data on social media platforms is easier to hack or steal due to the lack of data security. Note that hackers steal your information intact; thus, data integrity is still there.

In this data-driven era, the ability to evaluate data critically is becoming more and more important. Maintaining the quality of data is not only the responsibility of data owners but also ours. Now that you know about data integrity, you can be more alert when reading information. Next time, make sure you check the integrity of data before using it! :)
