VVhen VVill it Stop? — The Expansion of Big Data’s V’s

Ellie Wang
Published in CISS AL Big Data
Oct 25, 2022

From Viability to Voodoo, Volume to Vanilla, it seems that practically any word beginning with a “V” is being added to the illustrious laundry list of Vocabulary used to describe properties of big data, as shown in Figure 1.

Figure 1: Asay, M. (2022). [Digitally Altered Photograph]. InfoWorld. https://images.idgesg.net/images/idge/imported/imageapi/2022/08/08/10/data_streams_through_a_businessmans_head_mindset_thought_thinking_analysis_strategy_process_analytics_intelligence_imagination_creativity_by_metamorworks_gettyimages-1256604404_2400x1600-100854914-medium.3x2-100931058-medium.3x2.jpg?auto=webp&quality=85,70

These words (all beginning with the letter “V” because data scientists are quirky) are lovingly grouped together into one big happy family and dubbed the “V’s” of big data. However, data scientists disagree on the size of this family, that is, on just how many V’s truly exist and should be recognized. The most popular counts range from 3 to 42 V’s, with 4, 5, 7, 10, and 17 especially resonating with many and 1 “C” sometimes jumping into the mix. I’m certainly not qualified to question the Validity of the V’s, but I can say with confidence that seeing words such as Varmint, Victual, and Vulpine used to describe big data makes my brain do a double take.

In its earlier years, big data characterization began with only 3 V’s: Volume, Velocity, and Variety. Since then, researchers have developed a voracious appetite for more descriptors (more V’s), and they never seem satisfied with what they’ve already got. But here’s the catch: as more and more V’s are invented, the weight and integrity of each individual V decreases. It can sometimes seem like a new V isn’t well thought out or is added only for the sake of expanding the list. These new and oftentimes confusing V’s discredit and weaken the strong, reliable foundation that the original V’s established.

For the sake of your time and my sanity, let’s talk about what are arguably the 5 most legitimate V’s in the big data field.

1. Volume

I’m not talking books, sound, or how much space an object takes up, as shown in Figure 2. Volume in the context of big data refers to the quantity or size of the data set collected. Because volume measures how much data there is, it lets data analysts judge whether a data set is large enough to be considered big data. Commercial chains and retail giants like Walmart constantly deal with volumes of data that fall into the big data classification. Walmart handles more than 1 million customer transactions every hour, and these feed databases estimated to hold more than 2.5 petabytes of data. To put that in perspective, that’s about 167 times the volume of information stored in every book in the US Library of Congress combined.

Figure 2: Rogerson, K. (2020). [Graphic]. Comm100. https://www.comm100.com/wp-content/uploads/2020/03/image/png/Comm100_BlogImage_ChatVolume.png
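To get a feel for that comparison, here is a quick back-of-the-envelope sketch in Python. It only rearranges the two figures quoted above (2.5 petabytes and "about 167 times"); the function name and the choice of decimal units (1 PB = 1,000 TB) are my own assumptions, not anything Walmart or the Library of Congress publishes.

```python
# Assumption: decimal (SI) units, where 1 petabyte = 1,000 terabytes.
PETABYTE_IN_TERABYTES = 1_000

walmart_petabytes = 2.5   # figure quoted in the text
library_multiple = 167    # "about 167 times", also from the text


def implied_baseline_terabytes(total_petabytes: float, multiple: float) -> float:
    """Return the size (in TB) of the smaller collection implied by
    the statement 'the total is `multiple` times bigger'."""
    return total_petabytes * PETABYTE_IN_TERABYTES / multiple


baseline_tb = implied_baseline_terabytes(walmart_petabytes, library_multiple)
print(f"Implied size of the book collection: about {baseline_tb:.1f} TB")
```

In other words, the comparison treats all of those books as roughly 15 terabytes of information, which makes 2.5 petabytes feel appropriately enormous.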

2. Velocity

Nope, this isn’t distance over time with a direction, as shown in Figure 3. In the realm of big data, velocity refers to the speed at which data is both accumulated and transported. Data is constantly on the move. From phones to computers to machines to networks, so many steps and stops exist in the flow of big data. Because there’s so much data and the flow of it never stops, how fast that data can be generated and moved is crucial information for those who wish to process it and glean insights from it. With how quickly the world moves today, data has to move quickly too, especially for big companies like Amazon that capture every click of a consumer’s mouse as they visit the website. Multiply the number of clicks by the average traffic on Amazon’s website, and that’s an enormous amount of data (big data, in fact) moving at a rapid pace and constantly being collected.

Figure 3: Joshi, N. (2017). [Digitally Altered Photograph]. Allerin. http://www.allerin.com/wp-blog/wp-content/uploads/2017/04/Understanding-fast-data-in-big-data-1.jpg

3. Variety

Data has three main forms: structured, unstructured, and semi-structured, as shown in Figure 4. Structured data refers to any data that can be stored in a fixed format, typically rows and columns with predefined fields. Examples of structured data include Excel files and SQL databases. Unstructured data refers to any data with no predefined form. For instance, a data set that includes images, text, and videos could be one unstructured data source. In the words of Dr. Peter Tong from Concordia International School Shanghai, unstructured data can be “kind of a pain.” However, with the development of artificial intelligence, unstructured data is becoming easier to deal with as new algorithms are invented and models are trained on preexisting sets of big data that only grow larger with more user inputs. Lastly, semi-structured data carries some organizing markers, such as tags, but does not follow a rigid, predefined layout. In simpler terms, it sits between the structured and unstructured categories. XML files are an example of semi-structured data.

Figure 4: Barker, I. (2016). [Digitally Altered Photograph]. BetaNews. https://betanews.com/wp-content/uploads/2013/12/Data.jpg
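If the three forms still feel abstract, the following minimal Python sketch shows the same kind of information in each of them. All of the sample data (the customers, flavors, and field names) is invented for illustration, and only Python's standard `csv` and `xml.etree.ElementTree` modules are used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, and every row has exactly the same fields.
structured = "customer,flavor,scoops\nAnna,vanilla,2\nBjorn,chocolate,1\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: tags label each value, but records do not have to
# share the same set of fields (the second order has a note, no scoops).
semi_structured = """
<orders>
  <order customer="Anna"><flavor>vanilla</flavor><scoops>2</scoops></order>
  <order customer="Bjorn"><flavor>chocolate</flavor><note>extra sprinkles</note></order>
</orders>
""".strip()
root = ET.fromstring(semi_structured)
orders = [
    {"customer": order.get("customer"), **{child.tag: child.text for child in order}}
    for order in root.findall("order")
]

# Unstructured: free text with no fields at all; a program has to
# interpret the content before it can answer any question about it.
unstructured = "Anna loved the vanilla. Bjorn asked for extra sprinkles."

print(rows[0])      # one structured row as a dictionary
print(orders[1])    # one semi-structured record, with its own fields
print(unstructured.split(". "))
```

The structured rows can be queried right away, the XML records need a little unpacking because their fields vary, and the free text offers no handles at all, which is a small taste of why unstructured data earns its reputation.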

4. Veracity

Data veracity is less rigidly defined than the other V’s. In short, it refers to factors including the accuracy, quality, consistency, and trustworthiness of big data. Because big data is so large in quantity, it’s bound to be messy and difficult to manipulate, and filtering out the “bad data” is extra difficult. At the same time, more data can paint a more comprehensive picture and reveal trends that sampling alone could never hope to show, as shown in Figure 5. This tradeoff is a core concept of big data, and assessing the veracity of a data set helps an analyst understand and illustrate it.

Figure 5: Kundu, S. (2013). [Digitally Altered Photograph]. It Next. https://www.itnext.in/sites/default/files/styles/article_image/public/dataveracity.jpg?itok=B3h_qsVL
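Filtering out “bad data” can be as simple as a few sanity checks. The sketch below is one hypothetical way to do it in plain Python; the records and the rules inside `is_trustworthy` are made up for illustration, and a real data set would need rules chosen for its own context.

```python
# Hypothetical records; three of the five contain a problem.
records = [
    {"customer": "Anna", "scoops": 2},
    {"customer": "", "scoops": 1},         # missing customer name
    {"customer": "Carl", "scoops": -3},    # impossible value
    {"customer": "Dana", "scoops": None},  # missing value
    {"customer": "Erik", "scoops": 1},
]


def is_trustworthy(record: dict) -> bool:
    """Assumed rules: a record needs a non-empty customer name
    and a positive whole number of scoops."""
    scoops = record.get("scoops")
    return bool(record.get("customer")) and isinstance(scoops, int) and scoops > 0


clean = [record for record in records if is_trustworthy(record)]
bad_share = 1 - len(clean) / len(records)

print(f"Kept {len(clean)} of {len(records)} records")
print(f"Share flagged as bad data: {bad_share:.0%}")
```

Even in this toy example, more than half of the records fail a basic check, which hints at how much cleaning a truly big data set can demand before its trends can be trusted.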

5. Value

Last but not least (because it’s quite VALUE-able… get it?), any data collected, no matter how much, only matters if it carries value to the person wishing to use it. Let’s say you’re a business owner, and you own a chain of ice cream shops across Sweden. Data, perhaps even big data, about the Swedish population’s flavor preferences or consumption habits, would be quite valuable to you and your business. However, data about the Swedish people’s religious leanings or sexual orientation probably wouldn’t be as valuable or even relevant. Additionally, data itself does not inherently have value. Value comes from the information extracted and interpreted from the data set — the connections made, correlations found, and trends explored, as shown in Figure 6.

Figure 6: Akred, J., & Samani, A. (2018). [Graphic]. MIT Sloan Management Review. https://sloanreview.mit.edu/wp-content/uploads/2018/01/DA-Akred-Data-Value-Worth-Privacy-Security-1200-1200x630.jpg

All these V’s are equally important to individuals and firms wishing to process and utilize data. Or are they? To answer that question, you and I must make a Value judgment (is this foreshadowing? Perhaps it is). In my world, which only involves data analysis every other day during my last period of class, I have a clear winner. Does my opinion matter in the grand scheme of things? No, not at all. Am I still entitled to it? Yes, even though I’m not any sort of expert in the field of big data, and I’m definitely not claiming to be. Take my opinion with as many grains of salt as you want, perhaps even the whole shaker.

That said, value, the last V, is the most important V of them all. I think of it this way: if data has no value to an individual or firm, would it even be collected in the first place? No, sir (or ma’am or whatever title you prefer), you bet it wouldn’t be. The whole field of data analytics, big data, and all of that wouldn’t even exist. Human beings don’t do things unless they believe their actions will have some sort of benefit. I don’t mean to get existential, but nothing in this world is rationally done without purpose. Data scientists don’t just look at data all day because they think it’s fun. They look at data to better shape public policy, to improve quality of education, to make scientific breakthroughs, and even to save lives.

References

Concordia International School Shanghai. (2021). Module 1: Introduction to big data. Concordia International School Shanghai AL Big Data. https://www.cissbigdata.org/module-1-introduction-to-big-data

Firican, G. (2017, February 8). The 10 V’s of big data. Transforming Data with Intelligence. https://tdwi.org/articles/2017/02/08/10-vs-of-big-data.aspx

FutureLearn. (2013, August 8). What does big data mean? https://www.futurelearn.com/info/courses/applied-big-data-analytics/0/steps/52404

Gillis, A. S. (2021, March 24). The 5 V’s of big data. SearchDataManagement. https://www.techtarget.com/searchdatamanagement/definition/5-Vs-of-big-data

Impact.com Team. (2020, November 3). The 7 V’s of big data. impact.com. https://impact.com/marketing-intelligence/7-vs-big-data/

Middelburg, J. W. (2019, March 12). The four V’s of big data. Enterprise Big Data Framework©. https://www.bigdataframework.org/the-four-vs-of-big-data/

Panimalar, A., Shree, V., & Kathrine, V. (2017, September). The 17 V’s of big data. IRJET- International Research Journal of Engineering and Technology. https://www.irjet.net/archives/V4/i9/IRJET-V4I957.pdf

Shafer, T. (2017, April 1). The 42 V’s of big data and data science. Elder Research. https://www.elderresearch.com/blog/the-42-vs-of-big-data-and-data-science/
