The Central V’s in Big Data and Their Evolution

Yiran Zhao
Published in CISS AL Big Data
5 min read · Oct 9, 2023

Introduction to Big Data and the V’s

Big Data refers to vast, complex sets of structured, semi-structured, and unstructured data generated from various sources at a rapid pace. Such data is hard to process and analyze effectively with traditional database management tools. Big Data now spans multiple industries and sectors and has driven considerable progress in them.

Several factors capture the key characteristics of Big Data. They are known as the “V’s” because each of their names begins with the letter V. The V’s highlight the challenges and opportunities associated with managing and extracting value from large and complex data sets. However, the V’s have not stayed the same since they were first proposed. Today there are various ways of classifying them, as more V’s have been added over time.

THREE initial V’s in Big Data

In the beginning, the industry analyst Doug Laney came up with the idea of the three V’s of Big Data. He proposed volume, velocity, and variety as the three V’s that represent the mainstream definition of Big Data. Volume describes the size of the data, that is, the quantity of data collected and stored. Velocity represents the speed of the data, including the rate at which it is transferred between source and destination. Variety refers to the diverse types of data, such as pictures, videos, and audio, that arrive at the receiving end. Together, the three V’s show the core characteristics of Big Data, as portrayed in Figure 1.

Figure 1: The three V’s of Big Data (Panimalar et al., 2017).
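
To make the three V’s more concrete, here is a minimal Python sketch that measures them for a small batch of incoming records. It is only an illustration under stated assumptions: the `Record` class, its fields, and the `three_vs` function are hypothetical names invented for this example, and measuring velocity as bytes per second is one possible choice among many.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical record layout, assumed for illustration only.
    kind: str          # type of data, e.g. "image", "video", "audio"
    payload: bytes     # raw content
    arrived_at: float  # arrival time at the receiving end, in seconds


def three_vs(records):
    """Return (volume in bytes, velocity in bytes per second, variety)."""
    if not records:
        return 0, None, 0
    # Volume: total quantity of data collected.
    volume = sum(len(r.payload) for r in records)
    # Velocity: data received per second over the observed time span.
    times = [r.arrived_at for r in records]
    span = max(times) - min(times)
    velocity = volume / span if span > 0 else None
    # Variety: number of distinct data types in the batch.
    variety = len({r.kind for r in records})
    return volume, velocity, variety


batch = [
    Record("image", b"\x00" * 2000, 0.0),
    Record("video", b"\x00" * 6000, 1.0),
    Record("audio", b"\x00" * 2000, 2.0),
]
print(three_vs(batch))  # (10000, 5000.0, 3)
```

In this toy batch, 10,000 bytes of three different types arrive over two seconds, so the volume is 10,000 bytes, the velocity is 5,000 bytes per second, and the variety is 3.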

Evolving to the FOUR, FIVE, SIX, and SEVEN V’s

After the three V’s, SAS (Statistical Analysis System) added one more V, variability, making four V’s. Variability is essentially data differentiation. It encompasses the challenges of handling data that comes from different sources, in various formats, and with varying levels of structure. By addressing the variability of Big Data, organizations can unlock its full potential and gain a deeper understanding of complex patterns and trends in the data.

Meanwhile, Abiodun Oguntimilehin brought the fifth V, value, into view. Value concerns the importance of data: its significance and the potential insights that can be extracted from it. It represents the ability to derive meaningful information, knowledge, and actionable insights that drive innovation, decisions, and business outcomes.

Then comes the sixth V of Big Data, veracity, which together with the first five makes up the six V’s. Veracity is the quality of data, which includes its reliability, accuracy, and trustworthiness. It is particularly important in critical domains such as healthcare, finance, and cybersecurity, where accurate and reliable data is paramount for making informed decisions and maintaining trust.
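
One rough way to picture veracity is as the share of records that pass basic quality checks. The sketch below assumes a hypothetical healthcare dataset. The field names, the `is_trustworthy` rule, and the plausible heart-rate range of 20 to 250 are all assumptions made up for this example, not clinical guidance or a standard metric.

```python
def is_trustworthy(row):
    """Hypothetical quality rule: an ID is present and the heart rate is plausible."""
    heart_rate = row.get("heart_rate")
    return (
        row.get("patient_id") is not None
        and isinstance(heart_rate, (int, float))
        and 20 <= heart_rate <= 250  # assumed plausible range, illustration only
    )


def veracity_score(rows):
    """Fraction of rows that pass the quality rule (0.0 for an empty dataset)."""
    if not rows:
        return 0.0
    return sum(is_trustworthy(row) for row in rows) / len(rows)


rows = [
    {"patient_id": "p1", "heart_rate": 72},
    {"patient_id": "p2", "heart_rate": None},  # missing measurement
    {"patient_id": "p3", "heart_rate": 400},   # implausible value
    {"patient_id": "p4", "heart_rate": 65},
]
print(veracity_score(rows))  # 0.5
```

Here only two of the four records can be trusted, so the score is 0.5. In practice the checks would depend on the domain and on how the data is going to be used.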

The seven V’s add one more: visualization. Visualization is a way of presenting data in, for example, charts and graphs. This is critical because charts and graphs can convey information more effectively than formulas and pages full of numbers.

Transitioning to the TEN and FOURTEEN V’s

The ten V’s, shown in Figure 2, add four new V’s to the seven and drop visualization, which had only just been added as the seventh. The four new V’s are validity, venue, vocabulary, and vagueness. Validity is the extent to which data accurately represents the concepts or phenomena it is meant to describe or measure. Three main types of validity are relevant in Big Data analysis: measurement validity, construct validity, and external validity. Venue generally refers to the different platforms where data is stored, processed, or accessed. Vocabulary relates to the specific language, terminology, or domain-specific terms used within a particular dataset or data analysis context. It plays a crucial role in understanding and interpreting data accurately. Finally, vagueness indicates a lack of precision or clarity in the data being analyzed.

Figure 2: The ten V’s of Big Data (Panimalar et al., 2017).

Later on, four more V’s, including visualization from the earlier list, were defined to manage Big Data effectively, bringing the total to fourteen. The other three are volatility, virality, and viscosity. Volatility stands for the rate and extent of change or fluctuation in data. It signifies the dynamic nature of data, where values, attributes, or characteristics can change rapidly and unpredictably over time. Virality describes how quickly data spreads. It can be affected by many factors and is important for anyone analyzing Big Data to consider. Viscosity describes the lag of events in Big Data, that is, the resistance or difficulty in the flow of data.
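
Volatility and viscosity can both be illustrated with simple numbers. The sketch below is one possible interpretation, not an established definition: it treats volatility as the fraction of values that changed between two snapshots, and viscosity as the average lag between when an event occurred and when it was processed. The function names and sample data are hypothetical.

```python
from statistics import mean


def volatility(old, new):
    """Fraction of shared keys whose value changed between two snapshots."""
    shared = old.keys() & new.keys()
    if not shared:
        return 0.0
    return sum(old[key] != new[key] for key in shared) / len(shared)


def viscosity(events):
    """Mean lag in seconds for a non-empty list of (occurred_at, processed_at) pairs."""
    return mean(processed - occurred for occurred, processed in events)


# Two snapshots of the same (made-up) price table, one hour apart.
prices_9am = {"A": 10, "B": 20, "C": 30, "D": 40}
prices_10am = {"A": 10, "B": 25, "C": 30, "D": 44}
print(volatility(prices_9am, prices_10am))  # 0.5

# Each pair is (time the event occurred, time it was processed), in seconds.
events = [(0.0, 2.0), (1.0, 5.0), (2.0, 5.0)]
print(viscosity(events))  # 3.0
```

Half of the prices changed within the hour, giving a volatility of 0.5, and events waited three seconds on average before being processed, which is the viscosity in this reading.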

Latest version of SEVENTEEN V’s

Finally, three more V’s were introduced, forming the seventeen V’s: verbosity, voluntariness, and versatility. Verbosity indicates the excessive or unnecessary amount of data or information present in a dataset or system. Voluntariness is the degree to which individuals willingly and knowingly provide their data for collection, analysis, or usage in Big Data initiatives. Versatility represents the ability of a system or technology to handle diverse types of data, accommodate various data formats, and support multiple data processing and analysis techniques. These three new V’s help people dig deeper into Big Data.

Conclusion

As Big Data keeps evolving, more V’s may emerge in the future to represent new significant factors. The existing V’s, and Big Data itself, have revolutionized the ways data is captured, analyzed, and leveraged. All 17 V’s are an integral part of Big Data: together they characterize it and help people analyze and understand it.

Reference:

Panimalar, A., Shree, V., and Kathrine, V. (2017). The 17 V’s of Big Data. International Research Journal of Engineering and Technology. https://www.irjet.net/archives/V4/i9/IRJET-V4I957.pdf
