The V’s of Big Data

Kevin Hopkins
Published in CISS AL Big Data
Nov 2, 2022

As the field of big data grows, so does interest in it. To help newcomers understand the basics, experts have condensed the core of big data into five main Vs (plus a number of lesser ones). In this article, I’ll explain what the five main Vs are and how big data shows up in everyday life.

The first V is volume. Volume refers to the amount of data you base your predictions on. As storage gets cheaper and cheaper, the volume of data we are able to analyze keeps growing. Volume is one of the big things that separates big data from conventional statistics: while conventional statistics uses a small but carefully curated dataset, removing outliers and inaccurate records, big data uses everything, including corrupted data and outliers.
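
To make that contrast concrete, here is a minimal sketch in plain Python. The numbers are hypothetical, and the "curate first" versus "use everything" split is only an illustration of the idea, not anyone's actual pipeline:

```python
# Toy illustration: conventional statistics trims outliers first, big data keeps everything.
readings = [4.2, 4.5, 4.1, 4.4, 97.0, 4.3, 4.6, -50.0, 4.2]  # hypothetical sensor values

# Conventional approach: curate the data set by dropping implausible values first.
curated = [x for x in readings if 0 < x < 10]
curated_mean = sum(curated) / len(curated)

# Big-data approach: analyze the full, uncurated collection, outliers and all.
raw_mean = sum(readings) / len(readings)

print(f"curated mean: {curated_mean:.2f}, raw mean: {raw_mean:.2f}")
```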

The second V is velocity. Velocity refers to the speed at which data is being generated. As the number of daily internet users has grown alongside cheaper storage, data is being generated and stored at an astounding rate. Roughly 5 billion people, well over 60% of the world’s population, use the internet regularly, and that many users inevitably means incredible amounts of data generated every second of the day.

(Image source: https://www.webafrica.co.za)

The third V is variety, and it refers to the range of data types that big data draws on: structured, semi-structured, and unstructured data. Structured data is data in the most traditional sense: numbers and figures that can be stored in spreadsheets and traditional databases. Semi-structured data doesn’t conform to traditional databases but still has enough consistent markers to let you impose an organizational structure on it. Unstructured data is raw data without any markers at all: text messages, emails, phone calls, and the like.
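
As a rough sketch of those three categories (the field names and values below are made up purely for illustration), the same kind of information might arrive in any of these forms:

```python
import json

# Structured: fixed columns that fit directly into a spreadsheet or relational table.
structured_row = {"user_id": 1024, "age": 31, "purchases": 7}

# Semi-structured: no rigid schema, but consistent markers (keys, tags) give it shape.
semi_structured = json.loads('{"user": "alice", "events": [{"type": "click", "page": "/home"}]}')

# Unstructured: raw content with no markers at all -- text, audio, images.
unstructured = "hey, are we still on for dinner tonight? running a bit late"

print(structured_row["purchases"], semi_structured["events"][0]["type"], len(unstructured))
```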

The fourth V is veracity, and it refers to the integrity of the dataset. Even though big data is much less stringent about accuracy, that doesn’t mean you can cut corners on data cleaning. Because big data pulls from many different sources, you need to know where each piece of data comes from, whether the source is trustworthy, and verify its integrity accordingly.
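
The sketch below shows the kind of basic veracity check this implies. The source names, field names, and valid ranges are all hypothetical, chosen only to illustrate "know the source, check the values":

```python
# Minimal sketch of veracity checks on incoming records.
# The trusted sources, field names, and valid ranges are hypothetical.
TRUSTED_SOURCES = {"internal_api", "partner_feed"}

def is_credible(record: dict) -> bool:
    """Accept a record only if its source is known and its values are plausible."""
    if record.get("source") not in TRUSTED_SOURCES:
        return False                      # unknown provenance
    if not (0 <= record.get("age", -1) <= 120):
        return False                      # implausible value
    return True

records = [
    {"source": "internal_api", "age": 42},
    {"source": "scraped_blog", "age": 42},   # untrusted source
    {"source": "partner_feed", "age": 999},  # corrupted value
]
credible = [r for r in records if is_credible(r)]
print(len(credible), "of", len(records), "records passed the checks")
```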

The fifth and last V is value. It doesn’t matter how sophisticated your analysis is if no value can be extracted from it.

The V I find most important of them all is velocity. To me, velocity is what makes big data what it is: without high velocity, many of the other facets of big data could be handled with traditional statistics. The velocity of data also lets you update your algorithms constantly for more accurate readings, and it lets you cross-check your predictions against actual, real-world outcomes.
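
Here is a minimal sketch of what "constant updates" can look like: an estimate that is refreshed as each new observation streams in, instead of being recomputed from scratch over a fixed data set. The feed values are hypothetical and the running mean is only a stand-in for whatever model is being updated:

```python
# Minimal sketch: update an estimate as data streams in, rather than
# re-running the whole analysis over a fixed data set. Purely illustrative.
def running_mean(stream):
    """Yield an updated mean after every new observation."""
    count, mean = 0, 0.0
    for x in stream:
        count += 1
        mean += (x - mean) / count   # incremental update, no full re-scan needed
        yield mean

clicks_per_minute = [12, 15, 11, 40, 38, 42]  # hypothetical high-velocity feed
for estimate in running_mean(clicks_per_minute):
    print(f"current estimate: {estimate:.1f}")
```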

Traditional statistics is very rigid. You have a fixed data set, you clean and tweak it to get the most accurate data set possible, and then you take random samples to get a reading on the thing you want to analyze. In today’s ever-changing world, however, by the time you finish all these calculations and deliver your projection, the data you have painstakingly combed over, double-checked, and randomly sampled has a good chance of already being outdated. The velocity of information is simply too great: Google users generate over 5.6 billion searches a day, and people’s interests change day by day. Big data’s importance can be seen most clearly in social media services such as TikTok, YouTube, or Facebook.

(Image source: https://pngpedia.blogspot.com)

These companies have invested enormous amounts of money and time into their content recommendations, curating the billions of videos and posts on their platforms to match each consumer’s tastes. You probably take their spot-on recommendations for granted, so it may not seem like much, but imagine if it took weeks for YouTube to pick up on a new area of interest you discovered through a single video. It would be beyond frustrating, and that would be our reality if YouTube didn’t use big data. Because big data is less concerned with the accuracy of individual records and more with the volume, velocity, variety, veracity, and value of the data, it can skip the painstaking cleaning period that conventional statistics requires, and thanks to the velocity of data, it can change its recommendations in real time. Those recommendations can sometimes be even better than ones produced by conventional statistics, because big data makes use of the outliers and unclean data that can provide vital insight.
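
To picture the real-time part, here is a toy sketch of an interest profile being nudged the moment a video is watched, rather than waiting for a periodic batch job. The topic names and learning rate are invented for illustration and bear no relation to how any real platform's recommender actually works:

```python
# Toy sketch of why velocity matters for recommendations: the profile shifts
# immediately after each watch event. Topics and learning rate are made up.
from collections import defaultdict

interests = defaultdict(float)
LEARNING_RATE = 0.3

def record_watch(topic: str) -> None:
    """Shift the profile toward a topic as soon as one video is watched."""
    interests[topic] += LEARNING_RATE

def recommend() -> str:
    """Recommend the topic the profile currently weights highest."""
    return max(interests, key=interests.get) if interests else "trending"

record_watch("cooking")
record_watch("woodworking")
record_watch("woodworking")
print(recommend())  # -> "woodworking", reflecting the newest interest right away
```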
