10 Simple Questions to solve your doubts on Big Data & Big Data Analytics
Hello Folks! Everybody's talking about these hot, trending terms, and it helps to be aware of recent technologies yourself. You've probably landed on this page for one of two reasons: either you are new to the concept or you are slightly confused about it.
Well, worry not, pals, I've got you covered. With the tried-and-tested technique of short questions and answers, you'll have a crystal clear idea of Big Data and Big Data Analytics. Let's look at these 10 simple questions quickly…
Q1) What is Big Data?
Ans) Data is everywhere now because it is generated in huge amounts in a short period of time. This huge data is what we call Big Data. To make it simpler, just break the term into two: 'Big' and 'Data'. Data that is big enough is called 'Big Data'. But it doesn't end there.
Q2) Does this data have any attributes of its own?
Ans) Yes indeed. Not every large pile of data is Big Data (if you have 500 GB or 1 TB of data, is it Big Data? No.); it has to have certain attributes. These crucial attributes are the 3 V's of Big Data. These 3 V's stand for:-
1.] Volume:- As mentioned already, the volume of data generated has to be huge. For ex:- Facebook generates 500 TB of data every day, 2 million
2.] Velocity:- Stagnant data doesn't qualify. The data must be huge and generated at speed. For ex:- stock market data, or everybody talking about G.S.T. (Goods and Services Tax) on social media platforms all at the same time.
3.] Variety:- As simple as the word itself, variety means diversity in the type of data. Big Data can be categorized into two varieties:-
- Unstructured Data (commonly estimated at around 80%):- images, free text, audio and video fall into this category.
- Structured Data (the remaining 20% or so):- relational databases, Excel sheets and MS Access tables fall into this category.
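To get a feel for what volume and velocity mean together, here is a small back-of-the-envelope sketch in Python. It takes the 500 TB per day figure quoted above and works out the average rate that implies. The function names are made up for illustration, and decimal units (1 TB = 10^12 bytes) are an assumption.

```python
# Back-of-the-envelope: what sustained rate does a daily data volume imply?
# Assumption: decimal units, i.e. 1 TB = 10**12 bytes and 1 GB = 10**9 bytes.

BYTES_PER_TB = 10**12
BYTES_PER_GB = 10**9
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_gb_per_second(tb_per_day: float) -> float:
    """Convert a daily volume in TB into an average rate in GB per second."""
    bytes_per_second = tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY
    return bytes_per_second / BYTES_PER_GB


def seconds_to_fill_disk(disk_tb: float, tb_per_day: float) -> float:
    """How long a disk of disk_tb terabytes lasts at the given daily volume."""
    return disk_tb * SECONDS_PER_DAY / tb_per_day


rate = average_rate_gb_per_second(500)
fill_time = seconds_to_fill_disk(1.0, 500)

print(f"500 TB/day is about {rate:.2f} GB every second")
print(f"A 1 TB disk would be full in about {fill_time:.0f} seconds")
```

In other words, the 1 TB disk that felt huge a moment ago would last under three minutes at that pace, which is why a single machine stops being an option.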
Q3) Who’s generating this Big Data?
Ans) The answer comes down to two obvious sources, we humans and our beloved machines, which produce 'Human-generated data' and 'IoT (Internet of Things) data' respectively.
- Human-generated data:- for example, what we post, watch and buy on YouTube, Facebook, Twitter, Amazon, etc.
- IoT data:- data generated by sensors and devices (RFID tags, CCTV cameras, agriculture sensors).
Q4) At what point should you say this is Big Data?
Ans) Well, once your data can no longer be handled by traditional database systems (like a relational database or a data warehouse), that's the point at which you should call it Big Data, and you need specialized systems to compute on it.
Simple, isn't it?
Q5) Why Big Data now?
Ans) Good question, but the answer is simple if you give it a thought. Every big thing was small initially, and so is the case with data.
Data has grown over time, and to accommodate it we humans have kept improving storage devices. So now we have cheaper storage.
Another reason is that we now have much stronger computing capabilities than we did a few years back. For instance, compare your mobile with a supercomputer from two decades back. The results will make you proud. LOL.
The future of Big Data is secure because data is gonna keep growing with every passing second.
These are the reasons why Big Data exists now, and with a little bit of expertise it can be handled easily.
Q6) When does Big Data Analytics come into picture then?
Ans) A mountain of data is trash unless you carve value out of it. In fact, 'Value' is often added as one more V alongside the three above, and it is the one that defines analytics on Big Data.
Let’s talk about it with an example.
We are all aware that Facebook & Google know almost everything about us. They aren't just software products anymore; they are part of our lifestyle.
So, to make the user experience better, they don't just collect and store the data we generate; they apply intelligent algorithms to draw insights from it so that you (the user) can benefit.
Some everyday examples of Big Data Analytics are:-
- Facebook getting smart enough to auto-tag you in pictures the next time they are uploaded.
- Twitter doing sentiment analysis to separate positive and negative flows of thoughts.
- Other examples can be fraud detection, targeted marketing, financial advice, stock predictions, predictive maintenance (taking measures in advance so that something like a car engine is fixed before it actually fails), sugar level detection, estimating the chances of a heart attack, etc.
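To make the sentiment analysis example a bit more concrete, here is a toy sketch in Python. It is not how Twitter or anyone else does it in production; it simply counts words from two tiny, made-up word lists (`POSITIVE` and `NEGATIVE` are assumptions for illustration) and labels each post accordingly.

```python
# Toy lexicon-based sentiment scoring.
# The word lists below are invented for illustration; real systems use
# trained models and far larger vocabularies.
POSITIVE = {"good", "great", "love", "happy", "awesome"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful"}


def sentiment(text: str) -> str:
    """Label a post as positive, negative or neutral by counting known words."""
    # Lowercase each word and strip common punctuation from its edges.
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


posts = [
    "I love the new tax rules, great move!",
    "Terrible rollout, I hate the paperwork.",
    "GST starts on Saturday.",
]
labels = [sentiment(post) for post in posts]

for post, label in zip(posts, labels):
    print(f"{label:>8}: {post}")
```

The analytics part is the same idea at scale: run a scoring step like this over millions of posts as they arrive and aggregate the labels.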
Q7) Is there any tool to perform Big Data Analytics?
Ans) Yes, and the best-known tool goes by the name 'Apache Hadoop'. Earlier, each category of data had its own separate storage and analysis tools. For example:-
- for Streaming Data — Storm software
- for Machine Learning — Mahout software
- for Graph Data — GraphLab software
So performing operations across such segmented data groups was getting cluttered, and thus our savior came into existence as another product, 'Apache Spark', which consolidates all these islands of data processing into one bigger continent and lets you perform operations irrespective of what kind of data it is.
Note that Apache Spark is a separate engine: it can run on top of Hadoop (using HDFS for storage and YARN for scheduling), but it can also run without it. In my view, Apache Spark is the best option so far.
Q8) What is the learning path for Big Data?
Ans) 1. First, you have to learn Apache Hadoop
- Big Data & Hadoop basics, Python/Java basics for MapReduce, Pig, Hive (developed by Facebook).
2. Next, learn Apache Spark
- Scala programming, Spark Streaming, machine learning with Spark.
3. Learn NoSQL
- MongoDB, Cassandra, HBase, Redis, DynamoDB.
4. Learn Big Data integration
- Making various Big Data technologies work together. Example:- Spark+MongoDB, Kafka+Hadoop, Splunk+Spark.
5. Learn Big Data on cloud
- Big Data on AWS, Azure or Google Cloud.
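Since MapReduce shows up in the very first step, here is a rough idea of what the model looks like, sketched in plain Python on a single machine. This is not Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`) are made up for illustration. Hadoop's contribution is running these same phases across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


lines = ["big data is big", "data is everywhere"]

# Each line could be mapped on a different machine; here we do it in a loop.
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Once this word count makes sense, Pig and Hive are easier to place: they let you express similar jobs in a higher-level language instead of writing the map and reduce steps by hand.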
Q9) What is the learning path for Big Data Analytics?
Ans) 1. First, you have to learn Data Analytics which involves
- Python/R basics, statistics basics, data manipulation, exploratory data analysis.
2. Next, you’ll have to learn core Machine Learning
- Machine Learning and Statistics basics, supervised algorithms, unsupervised algorithms
3. Learn Data Visualization
- Tableau, QlikView, D3.js
4. Learn Big Data Hadoop
- Big Data & Hadoop basics, HDFS/MapReduce, Pig, Hive
5. Learn Spark & NoSQL databases
- Apache Spark, Data integration, NoSQL databases, Scala
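To give a taste of the 'statistics basics' and 'exploratory data analysis' from step 1, here is a minimal sketch using only Python's standard library. The ride durations are invented numbers, purely for illustration.

```python
import statistics

# Made-up data: ride durations in minutes, with one unusually long ride.
ride_minutes = [12, 15, 11, 14, 13, 55]

summary = {
    "count": len(ride_minutes),
    "min": min(ride_minutes),
    "max": max(ride_minutes),
    "mean": statistics.mean(ride_minutes),
    "median": statistics.median(ride_minutes),
}

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

Notice how the single 55-minute ride drags the mean up to 20 while the median stays at 13.5. Spotting things like that before you build any model is what exploratory data analysis is about.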
Don't panic: these technologies won't give you a hard time if you are already a programmer. If you are a complete beginner or new to programming, it's gonna be a hell of a journey, but the outcome is worth the time you spend on it.
You don't have to learn all of them; you can pick one or two from every section depending on your interest and on which academy you join.
As per the adage:- “Where there is a will, there is a way.”
Simple and powerful at the same time.
Q10) What do you become after learning these technologies?
Ans) You can become a Big Data consultant, whose job is to reach out to clients, understand their business challenges and guide them on why and how they need to implement Big Data solutions.
As a Big Data Consultant, it's not your job to sell the Big Data solution but to recommend the right solution for the enterprise in question. Selling is the concern of the sales department.
Just for your awareness:- Ola & Uber both use 'Apache Spark' for the kinds of functions you see in the app, like finding the nearest driver or the nearest passenger.
Interesting, isn't it? How a technology is empowering such giant and smart businesses. Don't know about you, but I am exceptionally fascinated by what technologies can do.
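If you are wondering what a 'nearest driver' lookup might look like at its simplest, here is a small sketch in Python. To be clear, this is not how Ola or Uber actually do it; it is a single-machine illustration using the haversine great-circle distance, and the driver names and coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical driver positions (latitude, longitude).
drivers = {
    "driver_a": (12.9716, 77.5946),
    "driver_b": (12.9352, 77.6245),
    "driver_c": (13.0358, 77.5970),
}
passenger = (12.9698, 77.6000)

# Pick the driver with the smallest distance to the passenger.
nearest = min(drivers, key=lambda name: haversine_km(*passenger, *drivers[name]))
distance = haversine_km(*passenger, *drivers[nearest])

print(f"Nearest driver: {nearest} ({distance:.2f} km away)")
```

With three drivers a simple loop is enough. With a whole city of drivers whose positions change every few seconds, you need a distributed engine to keep up, and that is where the Big Data tools come in.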
Big Data is a problem that needs to be solved, and industries are looking for experts who can give their business a new curve. Specifically, the 'J' curve.
Some of these companies are IBM, JPMorgan and Walmart, plus Amazon and Flipkart, which you know very well. They expect you to implement these technologies for exponential ROI. Just talking about the trends isn't gonna help anyone.
So buckle up, folks: there lies a beautiful opportunity for you to grab, so get started. The sooner the better. 'The early bird gets the worm'.
I hope this gives you a clear picture of Big Data & Big Data Analytics. Reach out to others and share your knowledge with them. Who knows, someone might make a career out of it as a Big Data Consultant and fall two hands short to count the cash flow.
If there's anything I can help you understand, or any question I can answer to save you from digging through the garbage heap of the internet, do let me know and I'll help you in the best way possible.
Thank you, your valuable visit is appreciated.