Data Science: The 5 V’s of Big Data

Surya Gutta
Published in Analytics Vidhya
3 min read · May 4, 2020

Volume, Velocity, Variety, Veracity, Value

[Figure: The 5 V’s of Big Data]

History

The concept dates back to 2001, when it was described with 3 V’s: Volume, Velocity, and Variety. Veracity was added later, making 4 V’s, and then Value, making 5 V’s. Lists of 8 V’s, 10 V’s, and more followed.
We will discuss the five most important ones: Volume, Velocity, Variety, Veracity, and Value.

1) Volume

It refers to the size of the data. Volume is the main criterion for deciding whether a dataset counts as Big Data. The volume of data is growing rapidly because of cloud-computing traffic, IoT, mobile traffic, and similar sources.

[Figure: Data growth prediction]

2) Velocity

It refers to the speed at which data accumulates. This is driven mainly by IoT devices, mobile data, social media, and similar sources.

In 2000, Google was receiving 32.8 million searches per day. As of 2018, it was receiving 5.6 billion searches per day!
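To put those daily totals in perspective, here is a small sketch that converts them into average per-second rates and a growth factor. It uses only the two figures quoted above; the constant and function names are made up for illustration, and the per-second rate assumes searches are spread evenly across the day, which real traffic is not.

```python
# Figures quoted in the article (searches per day).
SEARCHES_PER_DAY_2000 = 32.8e6
SEARCHES_PER_DAY_2018 = 5.6e9
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(per_day: float) -> float:
    """Convert a daily count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


def growth_factor(old: float, new: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


rate_2000 = per_second(SEARCHES_PER_DAY_2000)
rate_2018 = per_second(SEARCHES_PER_DAY_2018)
factor = growth_factor(SEARCHES_PER_DAY_2000, SEARCHES_PER_DAY_2018)

print(f"2000: about {rate_2000:,.0f} searches per second")
print(f"2018: about {rate_2018:,.0f} searches per second")
print(f"Growth: about {factor:,.0f}x")
```

On these numbers, the average rate went from roughly 380 searches per second to roughly 65,000, which is about a 170-fold increase.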

Approximate monthly active users as of 2018:
Facebook: 2.41 billion
Instagram: 1 billion
Twitter: 320 million
LinkedIn: 575 million

[Figure: Facebook monthly active users growth since 2008]

3) Variety

It refers to the different forms data can take (structured, semi-structured, and unstructured), which arise because data comes from many different sources, generated either by humans or by machines.

Structured data: It’s traditional data that is organized and conforms to a formal data structure. This data can be stored in a relational database. Example: A bank statement containing date, time, amount, etc.

Semi-structured data: It’s partially organized data. It doesn’t conform to a formal data structure. Example: Log files, JSON files, sensor data, CSV files, etc.

Unstructured data: It’s data that is not organized and doesn’t fit into the rows-and-columns structure of a relational database. Example: Text files, emails, images, videos, voicemails, audio files, etc.
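The three categories above can be sketched in a few lines of Python using only the standard library. This is an illustration under assumptions, not a reference implementation: the `statement` table, the sample log lines, and the email text are all invented for the example.

```python
import json
import sqlite3

# Structured: a fixed schema, stored in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statement (date TEXT, time TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO statement VALUES (?, ?, ?)",
    [("2020-05-04", "10:15", 250.0), ("2020-05-04", "11:30", -40.5)],
)
total = conn.execute("SELECT SUM(amount) FROM statement").fetchone()[0]
conn.close()

# Semi-structured: each record describes itself with keys,
# but different records may carry different fields.
log_lines = [
    '{"level": "INFO", "msg": "login", "user": "alice"}',
    '{"level": "ERROR", "msg": "timeout", "retry": 3}',
]
events = [json.loads(line) for line in log_lines]
all_keys = sorted({key for event in events for key in event})

# Unstructured: free text with no schema, so only generic
# processing (here, a word count) applies out of the box.
email_body = "Hi team, the quarterly report is attached. Thanks!"
word_count = len(email_body.split())

print(total, all_keys, word_count)
```

The point of the contrast is how much the data tells you about itself: the table can be queried directly with SQL, the JSON records need their fields discovered first, and the email needs further processing before any field-level question can be asked.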

4) Veracity

It refers to the quality, integrity, credibility, and accuracy of the data. Since the data is collected from multiple sources, we need to check it for accuracy before using it for business insights.
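As a minimal sketch of what checking data before use might look like, the snippet below flags records with missing fields, invalid dates, or non-numeric amounts. The field names (`date`, `amount`, `source`), the rules, and the sample records are assumptions chosen for illustration; real veracity checks depend on the data and the business context.

```python
from datetime import datetime


def check_record(record: dict) -> list:
    """Return a list of quality problems found in one record (empty if clean)."""
    problems = []
    # Completeness: every expected field must be present and non-empty.
    for field in ("date", "amount", "source"):
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # Validity: the date must be a real calendar date in YYYY-MM-DD form.
    if record.get("date"):
        try:
            datetime.strptime(record["date"], "%Y-%m-%d")
        except ValueError:
            problems.append("invalid date")
    # Type check: the amount, when present, must be a number.
    amount = record.get("amount")
    if amount not in (None, "") and not isinstance(amount, (int, float)):
        problems.append("amount is not numeric")
    return problems


# Hypothetical records arriving from different sources.
records = [
    {"date": "2020-05-04", "amount": 250.0, "source": "branch"},
    {"date": "2020-13-40", "amount": 99.0, "source": "mobile"},
    {"date": "2020-05-05", "amount": "n/a", "source": "web"},
    {"date": "2020-05-06", "amount": 10.0, "source": ""},
]

report = {index: check_record(record) for index, record in enumerate(records)}
clean = [record for record in records if not check_record(record)]

for index, problems in report.items():
    print(index, problems or "ok")
```

Only the first record passes all three checks here; the others would be held back for correction or excluded before any analysis.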

5) Value

Collecting lots of data is of no value in itself unless we garner some insights from it. Value refers to how useful the data is in decision making. We need to extract the value of Big Data using proper analytics.

What are the other V’s?

Viscosity (complexity or degree of correlation), Variability (inconsistency in data flow), Volatility (durability, that is, how long the data remains valid and how long it should be stored), Viability (capability to be live and active), and Validity (whether the data is understandable enough to reveal hidden relationships).
