Through the V’s of Big Data

Edward
Published in CISS AL Big Data
Dec 4, 2023

Introduction

Big Data is a recently coined term, and field, that seeks to take advantage of the massive amounts of data that advances in technology now make it possible to collect, using that data to draw conclusions and make predictions. If you’ve seen any kind of computer-generated recommendation (YouTube, Spotify, Amazon, ads), you’ve directly witnessed Big Data at work, and it also operates behind the scenes in many fields such as finance and healthcare (Mayer-Schönberger & Cukier, 2013). Big Data was first defined with three words, Volume, Velocity, and Variety, each representing a major way in which Big Data differed from earlier uses of data. However, the 3 V’s became 4, then 5, and eventually grew to 42 V’s (Shafer, 2017; Farooqi et al., 2019), 51 (Khan et al., 2019), and 56, as seen in Fig. 1.

Fig. 1: A list of 56 V’s. Note the inclusion of #38, Vulpine, which is likely only a joke, as the word refers to foxes (Hussein, 2020).

The number of V’s is overwhelming, and many of them are only tangentially relevant or overlap with one another, so this article attempts to group the V’s of Big Data so that the whole collection becomes approachable. As I introduce more V’s, I’ll add them to a diagram, starting with Fig. 2, which I hope will be easier to understand than a massive list.

Fig 2: Initial diagram, made using Excalidraw.

The Initial Three: Velocity, Variety, and Volume

The first three V’s of Big Data were aptly chosen; their significance to Big Data has led many other V’s to take on related meanings.

The first is velocity, which at its core describes the rate at which data is generated. This rate has increased thanks to new sources such as IoT sensors, logs of transactions and web visits, video footage, and social media. In comparison, older attempts to collect data without sampling were limited by the ability to process it: the US census, for example, is taken every decade, and in the late 1800s tabulating it took close to 10 years, until tabulation machines reduced that time (Mayer-Schönberger & Cukier, 2013).

Many other V’s refer to how “fast” Big Data is, so fastness will be a category in the diagram. Volatility describes the timespan during which data remains useful. Data can stop being useful because it becomes outdated, for example when sentiments on social media change rapidly, or because it has to be archived or deleted (GutCheck, 2019). Virality is the rate at which data spreads from user to user, whether through networks like social media (relevant if you are doing social media sentiment analysis, for example) or through other people-to-people networks (The Human Capital Hub, n.d.). Viscosity is the time between an event occurring and that event being described. The term is also sometimes used for the difficulty of working with data (Shafer, 2017), which ties back to the time-based definition: data that is harder to work with takes longer to turn into an accurate description of an event. Putting these into the visual yields Fig. 3.
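To make the “fast” V’s a little more concrete, here is a toy sketch that computes three of them for a handful of timestamped records. The `Record` type, its two timestamp fields, the helper names, and the shelf-life threshold are all hypothetical, chosen for illustration; none of the cited sources defines these V’s as precise formulas.

```python
from dataclasses import dataclass


@dataclass
class Record:
    occurred_at: float  # hypothetical field: when the real-world event happened, in seconds
    recorded_at: float  # hypothetical field: when the event was captured as data, in seconds


def velocity(records, start, end):
    """Velocity: records captured per second within the window [start, end)."""
    count = sum(1 for r in records if start <= r.recorded_at < end)
    return count / (end - start)


def mean_viscosity(records):
    """Viscosity: average lag between an event occurring and it being recorded."""
    return sum(r.recorded_at - r.occurred_at for r in records) / len(records)


def still_useful(records, now, shelf_life):
    """Volatility: keep only records whose event is recent enough to still matter."""
    return [r for r in records if now - r.occurred_at <= shelf_life]


# Hypothetical event stream, timestamps in seconds.
records = [Record(0.0, 0.5), Record(1.0, 1.25), Record(2.0, 3.0), Record(9.0, 9.5)]
print(velocity(records, 0.0, 4.0))            # 0.75 (3 records captured in 4 seconds)
print(mean_viscosity(records))                # 0.5625 (average lag in seconds)
print(len(still_useful(records, 10.0, 5.0)))  # 1 (only the event at t = 9 is recent enough)
```

In a real pipeline these numbers would be computed continuously over a stream rather than over a fixed list, but the definitions stay the same.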

Fig 3: Diagram with V’s relating to velocity added.

Next is variety, which describes how more types of data are now collected and used together. While working with data was originally confined to structured data in a few database tables, Big Data also draws on sources that are more difficult to analyze. Besides traditional tabular data, these include semi-structured data such as log files and completely unstructured data such as images, videos, and free-form text. This data comes from many sources, or venues, including public networks like social media and search engines, private networks of IoT devices, and users themselves (genomic scans, actions on a website, etc.). Even among individual data points there is variation, expressed as variability: data can be very noisy, sometimes containing wild values that seem absurd or break the data format. For example, a single location could be written as “First Street”, “1st Street”, “First St.”, and so on. Adding these ways that Big Data can vary results in Fig. 4.
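Taming variability usually means mapping the many spellings of a value onto one canonical form. Below is a minimal sketch for the street example above, assuming a hypothetical `normalize_street` helper with tiny hand-written lookup tables; real address cleaning relies on much larger tables or dedicated address-parsing tools.

```python
import re

# Hypothetical lookup tables; a real system would need far larger ones.
ORDINALS = {"first": "1st", "second": "2nd", "third": "3rd"}
SUFFIXES = {"st": "street", "ave": "avenue", "rd": "road"}


def normalize_street(raw: str) -> str:
    """Map spelling variants of a street name to one canonical lowercase form."""
    # Lowercase, then split into alphanumeric tokens (punctuation such as "." is dropped).
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    canonical = []
    for token in tokens:
        token = ORDINALS.get(token, token)  # "first" -> "1st"
        token = SUFFIXES.get(token, token)  # "st" -> "street"
        canonical.append(token)
    return " ".join(canonical)


for variant in ["First Street", "1st Street", "First St.", "1st st"]:
    print(variant, "->", normalize_street(variant))  # every variant -> "1st street"
```

Rules this naive also misfire: “St. James Street” would become “street james street”, because “St.” can mean “Saint”. That ambiguity is part of what makes variability hard.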

Fig 4: Diagram with V’s related to velocity and variety added.

Finally, there is volume, representing how the sheer amount of data being collected and processed keeps increasing. The amount of information stored is now measured in thousands of exabytes, almost all of it digital (Mayer-Schönberger & Cukier, 2013), which is astronomical compared to the data available historically. Hence, when creating datasets, much more can be included. With these large datasets, an advantage of Big Data emerges: it is varifocal. With n = all, there is no need to worry about sampling errors when examining specific subgroups of the data, and technological innovation has made working with unsampled data for macro-level analysis far more feasible. Because Big Data tries to collect everything and can act on the massive amount of data collected, it is valuable at both the macro and micro levels.
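A quick simulation shows why n = all matters when zooming in on subgroups. The sketch below assumes a made-up population of 100,000 rows in which a rare group “B” makes up about 1%, and compares computing that group’s mean from all rows with estimating it from a traditional 1% random sample. The data and the `subgroup_values` helper are hypothetical.

```python
import math
import random
import statistics


def subgroup_values(rows, group):
    """All values belonging to one subgroup."""
    return [value for g, value in rows if g == group]


rng = random.Random(0)
# Hypothetical population of 100,000 rows: a common group "A" and a rare group "B" (about 1%).
population = [("B" if rng.random() < 0.01 else "A", rng.gauss(50, 10))
              for _ in range(100_000)]
# A traditional 1% simple random sample drawn from the same population.
sample = rng.sample(population, 1_000)

full_b = subgroup_values(population, "B")
sample_b = subgroup_values(sample, "B")

# With n = all, the subgroup mean is computed directly, with no sampling error.
print(f"n = all: {len(full_b)} rows in B, mean {statistics.mean(full_b):.2f}")
# With the sample, the estimate rests on a handful of rows, so its standard error is large.
if len(sample_b) >= 2:
    se = statistics.stdev(sample_b) / math.sqrt(len(sample_b))
    print(f"1% sample: {len(sample_b)} rows in B, mean {statistics.mean(sample_b):.2f} +/- {se:.2f}")
else:
    print(f"1% sample: only {len(sample_b)} rows in B, too few to estimate anything")
```

The sample typically contains only around ten rows from the rare group, which is exactly the sampling problem that working with all of the data avoids, assuming the dataset really does cover the whole population.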

With all three of the original V’s added, the diagram looks like Fig. 5.

Fig 5: The diagram with the 3 original V’s and ones immediately related.

Value

There would be no point to Big Data if it didn’t offer some benefit; that benefit is its value. Several V’s have been chosen to describe what Big Data generally enables: vantage, vane, vet, and voice (Shafer, 2017). At its core, Big Data involves collecting data about as many things as possible so that correlations can be drawn between them, whether the data is photos of restaurant recipes or a survey of personal habits (Simonite, 2019; Mayer-Schönberger & Cukier, 2013). By knowing these correlations, Big Data provides a vantage point from which insights can be obtained even on very complex systems such as public transportation networks. These insights can point people in the right direction, revealing patterns of defective parts, procedures that increase user retention, and so on, which lets Big Data serve as a vane. Furthermore, these insights can be used to quickly vet possible explanations or hypotheses, determining whether existing systems or practices built on intuition and smaller-scale analysis are truly the best option.
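To make “vetting” a little more concrete, here is a minimal sketch of one common way to check whether a new practice really beats an old one: a two-sided permutation test on a difference in means. The retention numbers and the `permutation_test` helper are hypothetical, invented for illustration rather than taken from any of the cited sources.

```python
import random


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between groups a and b."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        # Reassign users to groups at random, as if the procedure made no difference.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical retention data: 1 = user retained, 0 = user churned.
old_procedure = [1] * 30 + [0] * 70
new_procedure = [1] * 45 + [0] * 55
print(f"p = {permutation_test(old_procedure, new_procedure):.3f}")
```

A small p-value suggests the gap between the groups is unlikely to be a fluke of how users happened to be split; on its own it does not prove that the new procedure caused the improvement.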

Another way to appreciate what Big Data enables is through the V voice. Consider any system that has undergone significant datafication and is now providing value through Big Data. When analyzing such data, it is often said that you “let the data speak”. In effect, you are allowing these systems to speak to you through data, finally giving them a voice.

With these added, the “What Does it Offer” category is complete, as seen in Fig. 6.

Fig 6: Completed “What Does it Offer” category.

The Process

Lastly, there are V’s relating to how organizations implement Big Data. With data becoming such a valuable asset, cybersecurity becomes an important concern. Data breaches are quite common, and the complex workflows needed for Big Data present a large target. The V vault expresses that data should be secured against unauthorized access and tampering.
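Vault covers two things: keeping unauthorized people out and detecting tampering. The sketch below only addresses the second, assuming records are JSON-serializable dicts: it signs each record with an HMAC (Python’s standard `hmac` module with SHA-256) so that any later modification can be detected. `SECRET_KEY`, `sign`, and `verify` are hypothetical names, and in practice the key would come from a secrets manager rather than source code.

```python
import hashlib
import hmac
import json

# Hypothetical placeholder; never hard-code a real key.
SECRET_KEY = b"replace-with-a-key-from-a-secrets-manager"


def sign(record: dict) -> str:
    """Return an HMAC-SHA256 tag for a record, using a canonical JSON encoding."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def verify(record: dict, tag: str) -> bool:
    """Check a record against its stored tag; constant-time comparison avoids timing leaks."""
    return hmac.compare_digest(sign(record), tag)


record = {"user": 17, "purchase": 42.5}
tag = sign(record)
print(verify(record, tag))                          # True: record unchanged
print(verify(dict(record, purchase=0.0), tag))      # False: record was tampered with
```

Access control and encryption, the other half of vault, are separate mechanisms that this sketch does not attempt.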

One challenge faced by organizations trying to use Big Data is that different processes are needed to make sense of the data. With so many data points in varied forms, determining causation can be time-consuming or impossible. Instead, the collection of correlations must lead to some kind of understanding, which is complicated by spurious correlations that emerge by chance in large volumes of data, even when the data is accurate. For example, a correlation can be found between revenue generated by arcades and computer science doctorates awarded, even though there is obviously no meaningful causal link between them, as seen in Fig. 7. This can be described as Big Data being vague: it gives a pile of correlations but no clear causes. With a good vision (i.e. an idealized goal state, blueprints or maps to follow, and a definition of what needs to be achieved), Big Data can be used more effectively, and the ability to react to disruptions in particular improves. Instead of being ad hoc and sometimes arbitrary affairs, projects can have meaningful goals that are steps toward fulfilling that vision, allowing Big Data to deliver long-term, sustainable benefits (Gupta & Gupta, 2015).

Fig 7: Correlation doesn’t imply causation.
Fig 8: Completed diagram.

Conclusion

The three main V’s, combined with a few more such as value and veracity, are practically all you need for a basic understanding of Big Data (or you can simply describe it without any V’s). Still, I hope that Fig. 8 and the explanations in this article help put the overwhelming number of V’s thrown around in certain places into perspective, making the whole collection more approachable.

References

Farooqi, M., Shah, M., Wahid, A., Akhunzada, A., Khan, F., Amin, N., & Ihsan, A. (2019). Big Data in Healthcare: A Survey. https://doi.org/10.1007/978-3-319-96139-2_14

Gupta, U. G., & Gupta, A. (2015). Vision: A Missing Key Dimension in the 5V Big Data Framework. Journal of International Business Research and Marketing, 1(3), 40–47. https://doi.org/10.18775/jibrm.1849-8558.2015.13.3005

GutCheck. (2019, August 29). Veracity: The Most Important “V” of Big Data. GutCheck. https://gutcheckit.com/blog/veracity-big-data-v/

Hussein, A. A. (2020). Fifty-Six Big Data V’s Characteristics and Proposed Strategies to Overcome Security and Privacy Challenges (BD2). Journal of Information Security, 11(04), 304–328. https://doi.org/10.4236/jis.2020.114019

Khan, N., Naim, A., Hussain, M. R., Naveed, Q. N., Ahmad, N., & Qamar, S. (2019). The 51 V’s Of Big Data: Survey, Technologies, Characteristics, Opportunities, Issues and Challenges. Proceedings of the International Conference on Omni-Layer Intelligent Systems. https://doi.org/10.1145/3312614.3312623

Mayer-Schönberger, V., & Cukier, K. (2013). Big Data: A Revolution That Will Transform How We Live, Work, and Think. John Murray.

Shafer, T. (2017, April 1). The 42 V’s of Big Data and Data Science. Elder Research. https://www.elderresearch.com/blog/the-42-vs-of-big-data-and-data-science/

Simonite, T. (2019). How Data Helps Deliver Your Dinner On Time — and Warm. Wired. https://www.wired.com/story/how-data-helps-deliver-your-dinner-on-time-warm/

The Human Capital Hub. (n.d.). Big data: Everything you need to know. Human Capital Hub. Retrieved November 27, 2023, from https://www.thehumancapitalhub.com/articles/Big-Data-Everything-You-Need-To-Know
