Vivid Vigor — The V’s of Big Data

Thomas Liu
Published in CISS AL Big Data
7 min read · Oct 25, 2022

Don’t mind the title; it’s not exactly inspired, and in the grand scope of this article, it contains no significant meaning. It is, in some ways, nothing but a mockery of the tendencies of the eccentric, successful, and — dare I say it — weird people in STEM fields to give terms strange names.

I’m looking at you, Elon Musk. I hope X Æ A-12 doesn’t hate you when he grows up, though I don’t doubt that he will. And you, Murray Gell-Mann, for turning a groundbreaking discovery of quantum physics (quarks, for the uninitiated) into a childish perversion of science. And even you, IBM, the so-called most traditional of all STEM companies, for so shamelessly trapping the wondrous world of big data into one single letter of the English alphabet instead of twenty-six.

V. That is the letter that IBM decided to constrain all big data into back in 2011. A letter that is, for better or worse, worth 4 points in the game of Scrabble because of how unused it is in the English lexicon. Did you know? Even though V is one of 26 letters, or about 3.8% of the alphabet, fewer than 1% of all words in English begin with it. But for a reason as superficial as alliteration, it has now become the nexus: the focal point of big data that defines and shapes the entirety of the subject.

Figure 1: Big data is rapidly growing in a world where information is at an all-time high. (https://www.iiba.org/globalassets/iiba-analyst-catalyst/images/here-is-some-information-about-what-a-big-data-analyst-is-and-does-social.jpg)

They are known simply as the V’s of big data, and they are a set of words (beginning with the letter V, of course) that are meant to define everything that characterizes big data: the collection, organization, and analysis of large amounts of information. At the most basic level, this set of words is limited to just 5 words that are at the very least somewhat reasonable. Of course, some slightly more adventurous (and questionable) lists can stretch that number up to 56, many of which aren’t even actual English words and even fewer of which are remotely related to big data. But we’ll just stick with the original 5 for now.

We begin with the strangest of the 5 V’s, one that you’d be forgiven for questioning given its contradictory nature: variety. As in, the variety of data that is collected. Indeed, in all forms of data collection, variety is not something to be desired. Heterogeneity is inherently messy, as it prevents the data from being sorted in exactly the same way over and over again. And yet, variety remains an important — if enigmatic — facet of big data. Why? Because big data prides itself on its adaptability and ability to embrace diversity. Certainly, processing the data itself would require more manpower and effort, but in exchange, it significantly lowers the resources needed to obtain the data in the first place; and one must surely think that is a worthwhile exchange.

Secondly, we have velocity, perhaps the most difficult of the 5 V’s to understand. After all, speed is not something that bodes well for data collection; rushing it often causes error. Velocity is yet another seemingly contradictory term, but if you would listen, I will explain. In essence, velocity does not have so much to do with big data itself as it does with its applications. Indeed, the methods through which data is collected in big data offer a unique advantage: by grabbing readily available data off databases and the internet, there is no need to create the data itself. There is only the need to organize it and then to interpret it. And therefore, it is much, much faster, and, by extension, well-suited to tackling very time-sensitive problems. We can observe this best through Google’s innovative solution to combating the H1N1 pandemic of 2009: by simply utilizing their vast database of searches related to the virus’s symptoms and treatments, as well as the locations and timestamps of those searches, Google was able to craft a dynamic, shockingly accurate map of the virus’s impact zones in close to no time, allowing health authorities all over the world to respond almost immediately. Compared to the more conventional methods employed by health organizations like the WHO (involving testing, sampling, and data analysis), Google was able to save several weeks of time in pinpointing the spread of H1N1, an incredibly valuable amount of time for something as volatile as a virus. Through this example along with many others, we can be assured that the velocity of big data is not to be understated.
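For the curious, the core idea of counting symptom-related searches by place and time can be sketched in a few lines of Python. This is only a toy illustration and not Google's actual model: the log entries, the `FLU_TERMS` keywords, and the `flu_query_counts` function are all made up for the example.

```python
from collections import Counter
from datetime import date

# Hypothetical search-log records: (query text, region, date). Not real data.
SEARCH_LOG = [
    ("flu symptoms", "Texas", date(2009, 4, 20)),
    ("fever and cough remedy", "Texas", date(2009, 4, 21)),
    ("cheap flights", "Texas", date(2009, 4, 21)),
    ("flu symptoms", "Ohio", date(2009, 4, 22)),
    ("fever and cough remedy", "Texas", date(2009, 4, 28)),
]

# Hypothetical keywords treated as flu-related (an assumption for this sketch).
FLU_TERMS = ("flu", "fever", "cough")


def flu_query_counts(log):
    """Count flu-related queries per (region, ISO year, ISO week)."""
    counts = Counter()
    for query, region, day in log:
        if any(term in query.lower() for term in FLU_TERMS):
            iso_year, iso_week, _ = day.isocalendar()
            counts[(region, iso_year, iso_week)] += 1
    return counts


if __name__ == "__main__":
    for key, n in sorted(flu_query_counts(SEARCH_LOG).items()):
        print(key, n)
```

No new data is created here; the data that already exists is simply organized and counted, which is where the speed comes from.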

Moving on, we have veracity. This, too, is not an easy V to understand, not least because many people do not even know what the word means. For the uninformed, veracity means conformity to facts; it means accuracy. This is perhaps the most obvious of the 5 V’s to recognize on a surface level, for data collection has always been worthless without veracity. But let it be known that it goes beyond that. The power of big data lies in the fact that veracity need not be the greatest priority. And indeed this statement seems so, so suspect: what use is data with no accuracy? But that is not what I mean. There certainly has always been a demand for veracity in big data, but not so much that it constrains our samples to very carefully procured tests and results. In that, big data finds a great boon: a rare flexibility in data collection that vastly increases the amount of data that is useful. In this respect, veracity is very similar to variety.

Then we have value, what most would consider the easiest of the 5 V’s to understand. Value is the simplest of the V’s, and it exists for one reason: to underline the importance that data holds. Surely this is not so difficult to recognize. Data has seen incredible value long before the rise of big data itself; it is the foundation upon which statistics and big data as a whole were built. One simply cannot overstate the value of data, and I do not think anyone would be foolish enough to dismiss it anyway. And thus on this matter I feel I need not say much more.

And lastly, we have the final of the 5 V’s, and perhaps the most rudimentary, primal term of the lot: volume. Pure, unadulterated volume, unrestricted by processing power or sample size. Big data aims to use all data that could possibly be available (n = all, if you will), and it is in the sheer size of data that big data can process that its greatest powers lie. The volume of big data underpins the other V’s; for it is only because we need volume that we embrace variety and veracity, and it is only because of volume that we can enjoy velocity. Indeed, have we not stopped to consider the possible pitfalls of variety, veracity and velocity? They are all very much imperfect concepts; they lend themselves well to a lack of uniformity and accuracy, and in data science these weaknesses are often fatal. That is why volume must exist, for it possesses the incredible ability to overwhelm these weaknesses with nothing but its unimaginable magnitude. It is why big data is named as it is, for all of its credibility hinges on the number of data points it draws upon. Volume is the bread and butter of big data, as well as statistics as a whole, for it only grows more powerful with time as exabytes upon exabytes of data relinquish themselves to its grandeur. It is what we turn to when problems need to be solved. It is what allows our newest innovations in technology to function. It is what gives us certainty in a world direly lacking in such. And that is why, in my humble opinion at least, volume is the most important of the 5 V’s.
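If the claim that volume can overwhelm inaccuracy sounds too convenient, a small simulation may help. The sketch below is a hypothetical illustration: the "true" value of 10, the noise level of 5, and the `noisy_mean` function are all assumptions chosen for the example. It averages increasingly many noisy measurements and reports how far the average lands from the truth.

```python
import random


def noisy_mean(true_value, noise_sd, n, seed=0):
    """Average n simulated measurements of true_value, each with random Gaussian noise."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    samples = [rng.gauss(true_value, noise_sd) for _ in range(n)]
    return sum(samples) / len(samples)


if __name__ == "__main__":
    TRUE_VALUE = 10.0  # hypothetical quantity being measured
    NOISE_SD = 5.0     # hypothetical noise level: each single reading is very rough
    for n in (10, 1_000, 100_000):
        estimate = noisy_mean(TRUE_VALUE, NOISE_SD, n)
        print(f"n={n:>7}: estimate={estimate:.3f}, error={abs(estimate - TRUE_VALUE):.3f}")
```

One caveat worth keeping in mind: this only holds when the errors are random. If every measurement is off in the same direction, more data will not average the bias away.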

Figure 2: The 5 V’s of big data serve as an overview of all that this nascent field encompasses. (https://iclerisy.com/wp-content/uploads/2019/09/Big-Data.png)

So there you have it. The 5 V’s of big data — so foolishly coined yet so charmingly christened — summarized and explained in all their majesty and wonder. I’m quite certain you already know which one I favor, though all of the other four deserve all respect and reverence. Big data is truly a force to be reckoned with, one that will no doubt continue to flourish in the future. I suppose it is something to be excited about after all, though I surmise that anything I say has little sway with you. So think of it what you will, dear reader; I can only pray that you do so with a more vivid vigor than I.

References

Jain, Anil. “The 5 V’s of Big Data.” Watson Health Perspectives, IBM, 17 Sept. 2016, https://www.ibm.com/blogs/watson-health/the-5-vs-of-big-data/.

Lerman, Rachel. “Elon Musk’s Baby Name Isn’t Just Weird, It May Be against California Regulations.” The Washington Post, WP Company, 8 May 2020, https://www.washingtonpost.com/technology/2020/05/08/musk-grimes-baby-name/.

Mayer-Schonberger, Viktor, and Kenneth Cukier. Big Data: The Essential Guide to Work, Life and Learning in the Age of Insight. John Murray, 2017.

“The Naming of Quarks…” The Particle Adventure | What Is the World Made of? | The Naming of Quarks, https://particleadventure.org/quarknaming.html.

Panimalar, Arockia, et al. “The 17 V’s of Big Data.” International Research Journal of Engineering and Technology, Sept. 2017, https://www.irjet.net/archives/V4/i9/IRJET-V4I957.pdf.
