Goals, Not Definitions: The Growing V’s of Big Data

Kate Anderson
CISS AL Big Data
Sep 9, 2023
Figure 1: 2Addicting. Flickr. https://www.flickr.com/photos/139666837@N07/26551612897/

Big Data is a rapidly expanding field that involves collecting, storing, and processing unstructured data on a massive scale using computers — from a few terabytes to hundreds of petabytes and more. As technology grows more powerful, Big Data reaches ever more parts of our lives and now connects the world, as shown in Figure 1. It decides what we see on the internet, informs important business decisions, and calculates our routes when driving, among many other applications. Big Data can give businesses that implement it an advantage, as it provides helpful and timely results.

In 2001, as Big Data was stepping into the spotlight, Doug Laney famously proposed the three V’s of Big Data: volume, velocity, and variety. These three words began as one man’s definition of Big Data, and the definition soon spread widely. Then two extensions popped up, veracity and value, followed by many more. But what do these words even mean?

The five V’s

Volume: This word refers to the size of the data. In the modern world, the volume of data available to Big Data analysts grows faster and faster. However, not all applications need petabytes of data. If you gather all the data that’s available around your topic, even if it’s just a few gigabytes, it is still Big Data — as long as n = all. This is a crucial part of Big Data: using as much data as you can yields understanding and insights that sampling just can’t reach.
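The n = all idea can be illustrated with a toy sketch (synthetic data, purely hypothetical numbers): a small random sample will usually miss rare events that the full dataset is guaranteed to contain.

```python
import random

random.seed(0)  # reproducible illustration

# Synthetic "population" of one million records, five of which are rare anomalies.
population = [0] * 1_000_000
for i in random.sample(range(1_000_000), 5):
    population[i] = 1

# A 1% random sample will usually miss most or all of the rare events...
sample = random.sample(population, 10_000)
print("rare events in 1% sample:", sum(sample))

# ...while n = all is guaranteed to capture every one of them.
print("rare events in full data:", sum(population))
```

The expected number of rare events in the 1% sample is only 0.05, so most runs report zero — exactly the kind of insight sampling can't reach.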

Velocity: The speed of data collection is called velocity. Often, Big Data is used to make fast decisions, and in these scenarios, processing new data as quickly as possible is imperative. For many large internet companies, velocity might look like the amount of data users post every day — maintaining high velocity lets new and relevant posts about important events surface in their algorithms.

Variety: While striving to get all the data possible, analysts can’t leave out vast amounts of relevant data just because it doesn’t fit a predefined structure. In Big Data, the data can be unstructured and messy — and that’s a good thing. It means all sorts of digital media are included in the data: documents, texts, videos, images, blogs, and more.

Veracity: This term, which gained traction and expanded the three V’s to four in 2013, represents the credibility of data. Data is collected from all over, even from low-accuracy sources. Such poor-quality data can produce terribly inaccurate results and undermine the insights the data is supposed to deliver.

Value: The idea of value — the benefit a business gains from even bothering with Big Data — was put under the spotlight around 2013, when it, too, was turned into one of the V’s. Why should a business spend time and money on Big Data if it doesn’t know whether it will help at all? Measuring the value of Big Data in different applications can give an answer.

Figure 2: Tsevis, C. Flickr. https://www.flickr.com/photos/tsevis/5764386235

V’s as Goals

But how did these last two hop onto the list? All five of these words are commonly found now — along with many others. Are these recent extensions truly just as valid as the first three? As it turns out, they are — they might just represent a different era of Big Data. Let’s compare the challenges faced by the field of Big Data in 2001 with challenges in 2013 when the iconic V’s started getting extensions.

Back in 2001, when Laney coined the three V’s, Big Data felt different from how it did when the fourth and fifth V’s were added. The field was still in its infancy, and the challenges of the time revolved around lack of awareness, limited hardware, slow data collection, and a lack of unstructured data. Lack of awareness slowed progress toward a better world of Big Data — few people knew what it was, so the pioneers of the field struggled to check off the first (and only) V’s without much support.

Because of the limitations in hardware, the volume of data was much harder to grow without quickly exhausting budgets. By 2013, storage was much cheaper and more compact, and processors could handle much higher volumes. As shown in Figure 2, far more hardware was available. The internet’s capabilities had expanded and sped up, letting people from all over the world access vast amounts of data. Nowadays, the hardware is even more capable and still getting faster. Check.

Data collection was also quite primitive at the time, slowing velocity. The internet wasn’t as central to everyone’s lives as it is today, making it harder to gather information from users. Throughout the entire year of 2001, around 27 billion Google search queries were served. In 2013, Google could reach that number in about a week. So little internet activity meant there were few real-time internet data sources, so many applications at the time were not associated with the internet at all — instead, they relied on sources that collected data with very little velocity. Now, the internet serves up fast data sources on a platter, from live sensors to social media feeds to online markets. Check.
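The search-volume comparison can be sanity-checked with rough arithmetic. The 2001 figure comes from the text; the 2013 annual total of roughly 1.2 trillion searches is an assumed estimate, not a figure from the source:

```python
# Back-of-the-envelope check of the search-volume comparison.
searches_2001_year = 27e9    # ~27 billion searches in all of 2001 (from the text)
searches_2013_year = 1.2e12  # ~1.2 trillion in 2013 (assumed rough estimate)

searches_2013_week = searches_2013_year / 52
weeks_to_match_2001 = searches_2001_year / searches_2013_week

print(f"2013 weekly volume: ~{searches_2013_week / 1e9:.0f} billion searches")
print(f"Weeks for 2013 Google to serve 2001's total: ~{weeks_to_match_2001:.1f}")
```

Under that assumption, 2013 Google served about 23 billion searches per week, so matching all of 2001 took just over one week — consistent with the claim above.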

The seemingly ancient stage of 2001 Big Data mostly relied on structured data, which limited variety. The world had few unstructured data sources, unlike 2013’s world of active social media and an internet full of different digital media types. Because there was so little unstructured data to sift through, 2001’s Big Data algorithms simply couldn’t handle the range of sources they could more than a decade later. By 2013, there were many more unstructured data sources and algorithms capable of using them to draw predictions — and today there are even more. Check.

So, these three V’s may have represented the large problems faced by Big Data back in its infancy. Does that mean the field faced no problems with veracity and value? Not necessarily — but because of the large challenges in volume, velocity, and variety, Big Data scientists had to focus on improving along those three V’s first. So, had Big Data solved the first three V’s by 2013? Still not quite — the field will never truly “solve” them. It can, however, grow capable enough that scientists can loosen their focus and look toward new challenges. Let’s continue and see what Big Data faced around the year 2013:

With the speedy growth of the internet and new internet data sources popping up left and right in 2013, the problem of veracity walked into the spotlight. It had always been a bit of a problem, but one that fell behind the prominent three V’s: the small amounts of structured data were often from trustworthy sources, and the field of Big Data was too small to be riddled with poor-quality ones. In 2013 the problem became more noticeable with the advent of many more unstructured data sources, and with volumes high enough to make judging the quality of sources difficult. Big Data scientists worked to patch this problem, coming up with quality standards, regulations, and metrics to keep sources accurate. However, it remains an issue today, as the total volume and variety (and already high velocity) of Big Data continue to grow with more and more data sources.

As “Big Data” became a more widely known term, the incentive to use it for business — value — also stepped into focus. As Big Data grew more common in the business world, its benefits had to be demonstrated in order to entice businesses to adopt it. For many businesses, Big Data can help make important decisions and improve customer experience, but it’s important to know whether a business would benefit at all before investing the time and money.

Conclusion

As Big Data discovers new challenges in the future, new V’s may be declared for the field to aim for — and old ones may stand as standards, reminders of what Big Data has already become. Many more V’s already exist today, although only some pose real goals; the others have yet to step under the spotlight and will grow in popularity when their associated challenges become relevant. Future issues for Big Data could include privacy and ethics, as the field continues to scrape data from everyone, every day. Whatever those future V’s might be, the field of Big Data will do its best to achieve them.

References

“Google Search Statistics.” Internet Live Stats, www.internetlivestats.com/google-search-statistics/.

Cristobal, Samuel. “Two More V’s in Big Data: Veracity and Value.” Datascience.aero, 7 Jan. 2021, datascience.aero/big-data-veracity-value/.

Gartner, Inc. “Gartner’s Big Data Definition Consists of Three Parts, Not to Be Confused with Three ‘V’s.” Forbes, 27 Mar. 2013, www.forbes.com/sites/gartnergroup/2013/03/27/gartners-big-data-definition-consists-of-three-parts-not-to-be-confused-with-three-vs/?sh=12a6051142f6.

Shafer, Tom. “The 42 V’s of Big Data and Data Science.” Elder Research, 11 Nov. 2020, www.elderresearch.com/blog/the-42-vs-of-big-data-and-data-science/.
