The Truth Behind the “Big Data” Buzz-Term

Big Data… Really?

A few years ago, I had a discussion with a mentor of mine about the career path I wanted to pursue. I said: “Look, Big Data is something really great, and I want to become a Big Data Engineer later on!”, and his answer was: “Okay, but be cautious: Big Data is not a revolution, and just like the ‘Cloud’, marketers have done their job”. I didn’t trust his words back then, and… you bet, he was right!

Even though I’m a “Big Data Consultant” (after all, we’re just software engineers with some knowledge that happens to be used in the “Big Data industry”), I’m not okay with the use of the term “Big Data”… Why? Because data has been growing for decades, and the question of “how can we handle all this data?” has always been current. In that sense, I prefer the term “Data of Unusual Size”, because the problem has always been contextual. Every time we realized that our existing architectures had become outdated and could no longer handle the data we had, we innovated!

In 10 years, the data we gather will be even bigger, and what are we going to call it then? Big-Big Data? You got it!

The concepts behind Big Data… Are they really a “revolution”?

If you ask 10 specialists how they perceive “Big Data”, you are likely to get 10 different (and perhaps divergent) definitions. In a nutshell, it basically means: “I have a problem: I have huge amounts of data, and I have to make sense out of it. To do so, I’ve got to think about new technologies, new architectures and new paradigms in order to process and aggregate this data, so it can become ‘humanly understandable’ and usable”.

People might also refer to the “3Vs” originally introduced by Doug Laney in 2001: Volume, Variety and Velocity. More Vs have been added since, be they veracity, value or variability, but they are all, IMHO, questionable.

One of the key technical components behind “Big Data” today is the distributed computing paradigm, a term that was introduced around 1960, or even earlier! And Hadoop, the most popular Big Data framework so far (even though it is losing some of its popularity), is all about using commodity hardware (low-cost servers) to distribute complex computations across multiple nodes without paying too much… And guess what? “Big Data” is built upon many other concepts that have existed for decades :).
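To make that concrete, here is a minimal, single-machine sketch of the map/shuffle/reduce pattern that Hadoop popularized, applied to word counting (the canonical MapReduce example). This is an illustrative simulation, not Hadoop’s actual API: in a real cluster, each phase would run in parallel on separate nodes.

```python
from collections import defaultdict

# Toy documents standing in for files spread across a cluster.
documents = [
    "big data is not a revolution",
    "data has been growing for decades",
    "big data is a buzz term",
]

def map_phase(doc):
    """Map step: emit a (word, 1) pair for every word in a document."""
    return [(word, 1) for word in doc.split()]

def shuffle_phase(mapped_pairs):
    """Shuffle step: group values by key, as the framework would when
    routing intermediate pairs between nodes."""
    groups = defaultdict(list)
    for word, count in mapped_pairs:
        groups[word].append(count)
    return groups

def reduce_phase(groups):
    """Reduce step: aggregate each key's values into a final count."""
    return {word: sum(counts) for word, counts in groups.items()}

# Chained sequentially here; Hadoop would distribute each phase.
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)  # e.g. {'big': 2, 'data': 3, ...}
```

Nothing in this pattern is new: it is the decades-old divide-and-conquer idea, packaged so that the framework handles the distribution for you.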

In other words, what is introduced today as a “new concept” is actually a collection of old paradigms gathered together… So is “Big Data” a new concept that will save us from the apocalypse? Not really.

… But the need is real

So, don’t we need Big Data today? Well, don’t get me wrong, I’m not saying that it is useless at all! I’m just warning you that the term “Big Data” is nothing but a buzz-term used by marketers and companies to sell their products and services to other companies. But the need is real.

All those “old” concepts, gathered together along with (truly) new ones such as Machine Learning, have created a new era of data understanding and data insight that didn’t exist before. Haven’t you ever wondered why the term “Big Data” is no longer mentioned in Gartner’s yearly Hype Cycle for Emerging Technologies? You guessed it :)

Once we have gathered the data and are ready to do some processing, we have to choose the right ML algorithm, the right aggregates and the right relationships to find the most effective solution in a given situation: make better sales, target the right audience and gain new customers, which is, after all, what it’s all about.
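As a toy illustration of what “choosing the right algorithm” can look like in practice, here is a hedged sketch that compares two candidate scikit-learn classifiers with cross-validation. The synthetic dataset and the two models are assumptions made for the example, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for real customer data (illustrative only).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Candidate algorithms: which one is "right" is an empirical question.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# 5-fold cross-validation compares the candidates fairly on the same data.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

The point isn’t the models themselves, but the discipline: measure, compare, then decide, instead of trusting whatever the buzz says.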

Today, the industry doesn’t pay us for what we know; it pays us for what we can do with that knowledge. Following the same logic, the value of “Big Data” is only real when it comes with actual insights, so companies can make precise decisions.

This post was originally published on LinkedIn.