How Volume Rules Big Data

Daniel
Published in CISS AL Big Data
5 min read · Oct 25, 2022

Why exactly is Big Data called Big Data? The current consensus is that the name simply reflects the large data sets it works with.

When you think about the word Volume, you probably think of what you were taught in middle school math class: the amount of space occupied by a 3D object. In Big Data, volume means something very different: how many data points you have. (Figure 1.)

Figure 1. The three main V’s (https://bigdataldn.com/news/big-data-the-3-vs-explained/)

There are three (or five) main V’s of Big Data that most of the field is built on. The core three are Volume, Velocity, and Variety. However, as Big Data has advanced, more than 56 V’s have been proposed. Some examples are Vagueness, Valor, Vanilla, and many more that have shaped Big Data analytics. (Figure 2.)

Figure 2. The expanded list of V’s (https://www.scirp.org/journal/paperinformation.aspx?paperid=103823)

However, one problem with this expanded list is that there are simply too many V’s to keep track of. With more than 56 V’s, each individual V carries less weight. In other words, the 3–5 original V’s that set the groundwork for all of Big Data analytics are being watered down by some 50 other V’s that only exist because they are words that start with the letter V.

Instead, the most emphasis should be put on the 3–5 most important V’s, which have frankly laid the groundwork for the development of Big Data.

The sheer volume of data globally

It is important to show just how much the volume of data has increased in the past couple of decades. In 2010, two zettabytes of data were consumed, transferred, and used worldwide. To put that into perspective, a typical hard drive has approximately 5 terabytes of storage, so two zettabytes is around four hundred million of these hard drives. That may sound like a lot, but that was the entire traffic of the internet in 2010. Now, compare that to 2020. In 2020, approximately 64.2 zettabytes of data were consumed worldwide, an increase of over 30 times. In 2025, society is expected to consume 181 zettabytes, roughly a 90x increase over 2010. That is over 36 billion of our typical five-terabyte hard drives. (Figure 3.)

Figure 3. Estimated Data Consumption per Year (https://www.statista.com/statistics/871513/worldwide-data-created/)
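
If you want to check the hard drive comparisons yourself, here is a small Python sketch. It assumes decimal units (1 terabyte = 10^12 bytes, 1 zettabyte = 10^21 bytes) and the 5 terabyte "typical" drive from above; the function names are made up for this example.

```python
# Decimal (SI) units are assumed here: 1 TB = 10**12 bytes, 1 ZB = 10**21 bytes.
TB = 10**12
ZB = 10**21

DRIVE_BYTES = 5 * TB  # the "typical" 5 TB hard drive used in the article

# Zettabytes per year as quoted in the article (the 2025 value is a forecast).
zettabytes_by_year = {2010: 2, 2020: 64.2, 2025: 181}


def drives_needed(zettabytes):
    """Number of 5 TB drives needed to hold the given number of zettabytes."""
    return zettabytes * ZB / DRIVE_BYTES


def growth_factor(year, base_year=2010):
    """How many times larger the volume in `year` is than in `base_year`."""
    return zettabytes_by_year[year] / zettabytes_by_year[base_year]


for year, zb in zettabytes_by_year.items():
    print(f"{year}: {zb} ZB is about {drives_needed(zb):,.0f} drives, "
          f"{growth_factor(year):.1f}x the 2010 volume")
```

Running it reproduces the figures in the text: about four hundred million drives for 2010 and a little over 36 billion for the 2025 forecast.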

Volume essentially refers to the amount of data that exists. Why exactly is data called the new oil, and why is it one of the most important resources of the information era? With more data points, it is easier to create models that accurately represent what is being measured.

For example, if we were trying to come up with an algorithm that could predict students’ success on tests from a certain number of factors, more data would make accurate predictions much easier. Consider two situations: one with 100 data points, and one with 1,000,000 data points. With 1,000,000 data points, it is easier to build algorithms that predict accurately, because data scientists have a better picture of the overall trend.
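
To make the 100 versus 1,000,000 comparison concrete, here is a rough simulation sketch. Everything in it is an assumption made up for illustration: a single factor (hours studied), a "true" trend of 3 points per hour, and random noise standing in for everything else. It fits a straight line to each sample and reports how far the fitted slope lands from the true one.

```python
import random

# Hypothetical "true" relationship, invented for this illustration only.
TRUE_INTERCEPT = 50.0  # baseline test score
TRUE_SLOPE = 3.0       # points gained per hour of study
NOISE_SD = 10.0        # spread caused by everything the model does not see


def fitted_slope(n, seed):
    """Simulate n students and return the least-squares slope of score on hours."""
    rng = random.Random(seed)
    sum_x = sum_y = sum_xx = sum_xy = 0.0
    for _ in range(n):
        hours = rng.uniform(0, 10)
        score = TRUE_INTERCEPT + TRUE_SLOPE * hours + rng.gauss(0, NOISE_SD)
        # Keep running sums so that a million points never sit in memory at once.
        sum_x += hours
        sum_y += score
        sum_xx += hours * hours
        sum_xy += hours * score
    return (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)


for n in (100, 1_000_000):
    error = abs(fitted_slope(n, seed=0) - TRUE_SLOPE)
    print(f"n = {n:>9,}: fitted slope is off by {error:.4f}")
```

Any single small sample can get lucky, but across many runs the 100-point fits tend to miss the true slope by far more than the 1,000,000-point fit does, which is the "better picture of the overall trend" described above.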

Applications of Volume in the real world

How do companies employ data Volume nowadays? You may be familiar with reCAPTCHA tests, which check that “you are not a robot”. To the average user, they may seem like random tests with grainy photos of street features. To Google, however, these reCAPTCHA tests help train its machine learning algorithms for self-driving cars. (Figure 4.)

Figure 4. An example of a reCAPTCHA test (Google.com)

In fact, Google does not even classify the pictures itself; instead, it offloads the work to internet users, who classify random pictures for it. The sheer volume of data generated by tens of millions of internet users is one of the most effective tools used in Big Data to create accurate models.
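
One simple way to picture how many users' answers could be turned into a single label is a majority vote. The sketch below is only an illustration of that general idea, not a description of how Google actually processes reCAPTCHA answers; the tile names and votes are invented.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common label and the share of votes it received."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


# Hypothetical answers from five users for two image tiles.
crowd_votes = {
    "tile_01": ["traffic light", "traffic light", "no object",
                "traffic light", "traffic light"],
    "tile_02": ["crosswalk", "no object", "crosswalk",
                "crosswalk", "no object"],
}

for tile, votes in crowd_votes.items():
    label, agreement = majority_label(votes)
    print(f"{tile}: {label} ({agreement:.0%} agreement)")
```

With only five votes a couple of careless answers can swing the result, but with many voters per picture the occasional wrong click gets outvoted, which is where Volume helps.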

Facebook also uses huge volumes of data to train its facial recognition algorithms. Facebook hosts over 250 billion pictures, and the data points in those hundreds of billions of pictures helped Facebook create some of the most sophisticated facial recognition algorithms. (Figure 5.)

Figure 5. Facebook Facial Recognition at Work (https://www.npr.org/sections/thetwo-way/2017/12/19/571954455/facebook-expands-use-of-facial-recognition-to-id-users-in-photos)

This facial recognition software helps Facebook automatically tag people in photos based on their facial features. Without this large Volume of data, Facebook’s facial recognition algorithm would not be nearly as successful or as widely used.

References:

Total data volume worldwide 2010–2025 | Statista. (2020). Statista. https://www.statista.com/statistics/871513/worldwide-data-created/

Gillis, A. S. (2021). 5 V’s of big data. SearchDataManagement; TechTarget. https://www.techtarget.com/searchdatamanagement/definition/5-Vs-of-big-data#:~:text=Volume%20is%20like%20the%20base,can%20be%20considered%20big%20data.

Media, O. (2012, March 11). Volume, Velocity, Variety: What You Need to Know About Big Data. Forbes. https://www.forbes.com/sites/oreillymedia/2012/01/19/volume-velocity-variety-what-you-need-to-know-about-big-data/?sh=6647be51b6d2

Gewirtz, D. (2018, March 21). Volume, velocity, and variety: Understanding the three V’s of big data. ZDNET. https://www.zdnet.com/article/volume-velocity-and-variety-understanding-the-three-vs-of-big-data/

The 42 V’s of Big Data and Data Science | Elder Research. (2020, November 11). Elder Research. https://www.elderresearch.com/blog/the-42-vs-of-big-data-and-data-science/

Is reCaptcha Training Robocars? (2018, May 31). Ceros Inspire: Create, Share, Inspire. https://www.ceros.com/inspire/originals/recaptcha-waymo-future-of-self-driving-cars/#:~:text=Google%20does%20use%20reCaptcha%20to,an%20empty%20patch%20of%20asphalt.

Facebook Expands Use Of Facial Recognition To ID Users In Photos. (2017, December 19). NPR.org. https://www.npr.org/sections/thetwo-way/2017/12/19/571954455/facebook-expands-use-of-facial-recognition-to-id-users-in-photos
