The Importance of Veracity in Big Data

Daniel Zhao
CISS AL Big Data
4 min read · Oct 25, 2022
Figure 1: Ward, C. (2018, August 14). Why experience data is the new currency for big business. MyCustomer. Retrieved October 25, 2022, from https://www.mycustomer.com/experience/voice-of-the-customer/why-experience-data-is-the-new-currency-for-big-business

Data is currently one of the most important resources in the world. As figure 1 shows, filtering and manipulating data allows companies to make decisions that affect millions of people and earn billions of dollars. But how do people derive usable information from such an important resource? The answer: big data.

What is Big Data?

Put simply, big data consists of data sets so large and complex that they cannot be processed without specialized software, and it is characterized by a set of properties known as the V’s. Over the years, the number of V’s attributed to big data has grown from 3 to almost 60, ranging from variability to vogue, but the best-known, and generally regarded as the most important, are the first five: Volume, Velocity, Value, Veracity, and Variety.

Volume

The first V, volume, is simply the quantity of data: the number of users on an app, the number of donuts in a box, the number of students in a school. Though it sounds obvious, it is very important. After all, big data only becomes big data once the data set is large enough. But that’s not the only reason volume is one of the V’s. Having a larger volume of data also means individual records don’t need to be perfectly accurate, because random errors tend to average out across a large enough sample.
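To make that idea concrete, here is a minimal Python sketch (the true value, noise level, and sample sizes are made up purely for illustration): each individual record carries some random error, but as the number of records grows, their average drifts closer to the true value.

```python
import random

random.seed(42)

TRUE_VALUE = 50.0   # the quantity we are trying to estimate (hypothetical)
NOISE = 10.0        # each record can be off by up to +/- NOISE units

def noisy_average(sample_size):
    """Average of `sample_size` records, each measured with random error."""
    records = [TRUE_VALUE + random.uniform(-NOISE, NOISE) for _ in range(sample_size)]
    return sum(records) / len(records)

for n in (10, 1_000, 100_000):
    print(f"{n:>7} records -> estimate {noisy_average(n):.2f} (true value {TRUE_VALUE})")
```

With only 10 records the estimate can be noticeably off; with 100,000 it typically lands within a small fraction of a unit of the true value.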

Velocity

Secondly, velocity is the speed at which data is generated and transferred: how quickly someone can write and send an email, how fast someone can pick up the phone. The faster companies receive data, the sooner they can factor it into business decisions. If the velocity isn’t high enough, a company may not be working with the most up-to-date data available, which can lead to poor decisions.
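One simple way to guard against acting on stale data is to check how old each record is before using it. The cutoff, field names, and records in this sketch are invented for illustration; a real pipeline would pull timestamps from its own data feed.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=1)  # hypothetical freshness cutoff for this example

def is_fresh(record_timestamp):
    """Return True if the record is recent enough to base a decision on."""
    return datetime.now(timezone.utc) - record_timestamp <= MAX_AGE

# Two made-up records: one arrived 5 minutes ago, one arrived 6 hours ago.
records = [
    {"metric": "orders_per_minute", "value": 120,
     "ts": datetime.now(timezone.utc) - timedelta(minutes=5)},
    {"metric": "orders_per_minute", "value": 480,
     "ts": datetime.now(timezone.utc) - timedelta(hours=6)},
]

usable = [r for r in records if is_fresh(r["ts"])]
print(f"{len(usable)} of {len(records)} records are fresh enough to use")
```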

Value

Value refers to how useful the data is and what can be done with it. Just as a $100 bill is worth 10,000 times more than a penny, data can have very different levels of value. The value of big data is relative to what a company needs: if the value is low, trying to pull insight from it may be a waste of time, with the result not worth the investment. And even if the investment and time required are low, information that would not affect any decision has no value either.

Veracity

This V refers to how accurate the data is, and it is tied to the other V’s. Like rumors being passed around a high school, if you base your decisions on false or made-up data, you will end up making choices that don’t match the actual situation. Generally speaking, higher volume helps compensate for imperfect veracity, since errors average out across a larger sample. Higher velocity means the data is more recent and up to date, which also tends to improve veracity. Lastly, insight drawn from a data set with low veracity is likely to have much lower value.
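A few basic checks can catch many veracity problems before the data reaches an analysis. The sketch below is a minimal illustration; the field names, required fields, and allowed age range are assumptions made up for this example.

```python
REQUIRED_FIELDS = {"user_id", "age", "country"}

def check_record(record):
    """Return a list of problems found in one record (an empty list means it looks OK)."""
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    age = record.get("age")
    if age is not None and not 0 <= age <= 120:
        problems.append(f"implausible age: {age}")
    return problems

records = [
    {"user_id": 1, "age": 34, "country": "CA"},
    {"user_id": 2, "age": 230, "country": "US"},  # implausible age
    {"user_id": 3, "country": "US"},              # missing age
]

for r in records:
    issues = check_record(r)
    print(r["user_id"], "->", "ok" if not issues else "; ".join(issues))
```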

Variety

The final V is variety, which is the diversity of data types (structured, unstructured, semi-structured): the types of desserts in a bakery, the different phone models in a tech store. Depending on what a company needs, some data types will be a better fit than others. For example, if a company needs data that can be easily and efficiently processed and analyzed, structured data might be the best choice, as in the sketch below.
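To illustrate the difference, here is a small Python sketch with invented field names: a structured record with a fixed shape, a semi-structured JSON document whose fields can vary, and an unstructured snippet of free text.

```python
import json

# Structured: fixed columns, every record has the same shape (easy to load into a table).
structured_row = {"order_id": 1001, "item": "donut", "quantity": 12, "price": 9.99}

# Semi-structured: self-describing JSON whose fields can vary from record to record.
semi_structured = json.loads(
    '{"order_id": 1002, "item": "cake", "notes": {"gift_wrap": true, "message": "Happy birthday!"}}'
)

# Unstructured: free text with no predefined schema; it needs extra processing to analyze.
unstructured = "Customer called to say the delivery was late but the cake was great."

print(structured_row["item"])
print(semi_structured["notes"]["message"])
print(f"{len(unstructured.split())} words of free text")
```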

Importance of Veracity

Veracity is, in my opinion, the most important V, with value coming second. The reason I picked veracity over value, even though data is useless if its value is low, is that value often depends on veracity. After all, if the data is incomplete or flawed, the insights drawn from it won’t be useful either; they won’t have value. Veracity connects to the other V’s as well. No matter what variety of data types you use, if the data lacks veracity, the results won’t be accurate enough to be useful. As for velocity and volume, having more data or transferring it faster doesn’t matter if that data isn’t correct. The important decisions that rely on big data only have value if the data they are derived from is correct; making those decisions with faulty data can have huge consequences.

Figure 2: Wright, G. (2021, March 4). Why accurate data is critical for economic stability. Banking Exchange. Retrieved October 25, 2022, from https://m.bankingexchange.com/news-feed/item/8589-why-accurate-data-is-critical-for-economic-stability

Overall, big data relies almost entirely on value; after all, why would it be so important if it were useless? However, the value of a data set depends heavily on its veracity. Just like in archery, as shown in figure 2, the more accurate the shot, the more points you score: one bullseye is worth ten shots on the outer ring. You need to make sure your sources are credible and your data is accurate to get the most out of it; otherwise, no matter how much you analyze and process the data, the results will never be what you need.
