The “V” Not for Vendetta but Big Data

Daniel Wan
Published in CISS AL Big Data
Dec 18, 2021

The age of technology has woven data deeply into our lives. We produce data almost every millisecond, often without noticing that it is flowing all around us. According to DOMO, a person generates about 1.7 megabytes of data per second. That may not sound like a considerable amount, but the first iPhone had only 128 MB of memory, which this rate would fill in less than two minutes (about 75 seconds). Even its 16-gigabyte storage would fill up in under three hours.
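These comparisons come down to dividing a capacity by the rate at which data is produced. The short sketch below redoes the arithmetic. It assumes the 1.7 MB/s figure quoted from DOMO stays constant and uses decimal units (1 GB = 1,000 MB). The helper name `seconds_to_fill` is hypothetical and exists only for illustration.

```python
# Rate quoted from DOMO in the text: about 1.7 MB of data per person per second.
RATE_MB_PER_S = 1.7


def seconds_to_fill(capacity_mb: float, rate_mb_per_s: float = RATE_MB_PER_S) -> float:
    """Hypothetical helper: seconds for one person's data output to fill a capacity."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return capacity_mb / rate_mb_per_s


# Assumes decimal units (1 GB = 1,000 MB); binary units change the result by under 3%.
memory_s = seconds_to_fill(128)         # first iPhone's 128 MB of memory
storage_s = seconds_to_fill(16 * 1000)  # 16 GB of storage

print(f"128 MB: {memory_s:.0f} s ({memory_s / 60:.1f} min)")
print(f"16 GB:  {storage_s:.0f} s ({storage_s / 3600:.1f} h)")
```

At this rate the 128 MB of memory lasts about 75 seconds, and 16 GB lasts roughly 2.6 hours.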

(Figure 1, Andre, Louie. “53 Important Statistics about How Much Data Is Created Every Day.” Financesonline.com, FinancesOnline.com, 15 June 2021, https://financesonline.com/how-much-data-is-created-every-day/.)

How does all this information relate to a “V”? And what is a “V” anyway? There are five key V’s in big data: volume, velocity, variety, veracity, and value. Data scientists have defined several other V’s, but these five are the ones most commonly used to characterize big data.

(Figure 2, Salzig, Christoph. “What Is Big Data? — a Definition with Five Vs.” What Is Big Data? — A Definition with Five Vs, https://blog.unbelievable-machine.com/en/what-is-big-data-definition-five-vs.)

In my opinion, the most essential V is volume, especially when compared with velocity and variety. Velocity refers to the speed at which data is generated, collected, and processed; high velocity lets data “flow” fast enough to keep up with the demand for generating and processing it. Variety refers to the three kinds of data structure found in big data: structured, semi-structured, and unstructured. The variety of data affects how it must be organized during processing.

Compared with speed and structure, the amount of data (volume) matters more. Some may argue that velocity was the main contributor to Google’s analysis of disease trends. However, I would say that without the large sample that generated the trendline (see Figure 3), it would have been impossible for Google to predict the trend. Even with high velocity or a wide variety of data, a small sample cannot support a prediction with a high confidence level.

Similarly, value and veracity may be necessary, but they are less important than volume. Value is extracted through analysis, so it is not needed at the beginning of sampling. As for veracity, Viktor Mayer-Schönberger and Kenneth Cukier argue in Big Data: A Revolution That Will Transform How We Live, Work, and Think that it is not as important as volume: accurate but scarce information might not be enough to predict a trend accurately.
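The claim that small samples cannot support confident predictions can be made concrete with the standard margin of error for an estimated proportion, z * sqrt(p * (1 - p) / n), where n is the sample size. The sketch below assumes independent random samples and the normal approximation. The 5% rate is a hypothetical illustration, not Google’s data. This formula only captures random sampling error: it shrinks as volume grows, but it says nothing about systematic bias, which is where veracity comes in.

```python
import math

# z-value for a two-sided 95% confidence interval under the normal approximation
Z_95 = 1.96


def margin_of_error(p_hat: float, n: int, z: float = Z_95) -> float:
    """Half-width of the normal-approximation (Wald) confidence interval
    for a proportion p_hat estimated from n independent samples."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical example: 5% of observations show some signal (e.g. flu-related searches).
p_hat = 0.05
for n in (100, 10_000, 1_000_000):
    moe = margin_of_error(p_hat, n)
    print(f"n = {n:>9,}: {p_hat:.1%} ± {moe:.3%}")
```

With 100 samples the estimate is 5% give or take about 4.3 percentage points, which is almost as large as the estimate itself. With a million samples the margin narrows to about 0.04 points. Because the margin shrinks with the square root of n, each hundredfold increase in volume makes the estimate ten times tighter.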

As the field of data science keeps advancing, the prominence of big data will become more evident. In conclusion, opinions on the most important V will differ, but all five are crucial for big data: without any one of them, data science would not be as impactful as it is today.

(Figure 3, “Google Flu Trends.” Wikipedia, Wikimedia Foundation, 31 Aug. 2021, https://en.wikipedia.org/wiki/Google_Flu_Trends.)

Sources:

“The Five V’s of Big Data.” NEWS BBVA, BBVA, 26 May 2020, https://www.bbva.com/en/five-vs-big-data/.

“5 V’s of Big Data.” GeeksforGeeks, 10 Jan. 2019, https://www.geeksforgeeks.org/5-vs-of-big-data/.
