When did you last open your social media account? If I'm not wrong, maybe a minute ago?
In one minute:
- 700,000 logins on Facebook
- Around 400,000 tweets are posted on Twitter
- 30,000 photos are shared on Instagram
- 21 million messages on WhatsApp
Today, using a social media account is common for all of us! You must be wondering why we need to talk about all this if it's so common!
Yes! You are right! It really is common! And we users have made it so common that these platforms have exceeded the limits of every storage device available in this world. In simple words, there is no single storage device in existence that can hold all the data we publish and post on social media daily. Surprised by what I just said? Your mind must be flooded with questions!
So let’s start with some relevant information…
In 2012, Facebook revealed that it was dealing with around 500+ terabytes of data every day, including 2.7 billion likes and around 300 million photos per day. Another exciting fact: Facebook scans around 105 terabytes of data every half hour!
As of January 2020, there are nearly 1 billion monthly active users on Instagram.
1 billion is a big number, and it positions Instagram behind Facebook (2.6 billion) but ahead of most other social networks, including Twitter (68 million) and Pinterest (332 million).
- There are nearly 500 million daily active users.
- The Like button is hit an average of 4.2 billion times per day.
So my question here is: how and where does Instagram manage to store this much data?
YouTube has confirmed that 300 hours of video content is uploaded to the site every minute. That's 12.5 days' worth of uploads every 60 seconds!
- 300 hours of video are uploaded to YouTube every minute!
- Almost 5 billion videos are watched on YouTube every single day.
- YouTube gets over 30 million visitors per day.
With the rise of Netflix and other streaming services has come the rise of binge-watching: watching multiple episodes of a show in rapid succession. This made me wonder, how long would it take to watch everything on Netflix?
Let's have a look...
So here the most important question arises in everyone's mind: how do Facebook, Netflix, or any other media platform deal with such a huge amount of data? And here the name of a problem comes up, and it is none other than the Big Data problem! So before answering, I would like to explain:
What actually is Big Data?
So Big Data is basically a problem of Volume and Velocity: an amount of data so huge that it can't be handled by any traditional method of storage, whether your hard disk or any other single storage device!
Now another question comes to mind!
Why don't big companies try to make a storage device with a capacity big enough to store data in petabytes and exabytes?
Everyone must have heard of the terms hard disk and RAM: one is a storage device, and the other is memory! Anything we do in our operating system is basically a process running inside RAM, and it is called a program when we store it on our hard disk or pen drive. We all know that RAM is volatile memory, not a storage device, that the speed of our system depends on our RAM, and that RAM is much faster than a hard disk or SSD! If we try to retrieve data from our storage device, we can do it easily, but only when the data is small. What if billions of people from every corner of the world try to retrieve petabytes or exabytes of data stored on Facebook at the same time? It would take a day or more to retrieve that information if Facebook stored its data on ordinary hard disks. So here another big problem, velocity (or speed), comes up under Big Data!
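To see why a single disk can't keep up, here is a back-of-the-envelope sketch. The read speed is an assumed figure for one commodity hard disk, not a number from the article:

```python
# Back-of-the-envelope sketch with my own illustrative numbers
# (not Facebook's): how long would it take to read 1 petabyte
# sequentially from a single hard disk?

HDD_READ_MBPS = 150          # assumed sequential read speed of one commodity HDD, in MB/s
PETABYTE_MB = 1_000_000_000  # 1 PB expressed in megabytes (decimal units)

seconds = PETABYTE_MB / HDD_READ_MBPS
days = seconds / 86_400      # 86,400 seconds in a day
print(f"Reading 1 PB from one disk: ~{days:.0f} days")  # roughly 77 days
```

Even with a generous disk, serving a petabyte from one device takes months, not seconds, which is exactly the velocity problem.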
Till now we have counted the problems we face due to Big Data: the problems of Volume and Velocity.
Now let's have a look at the solution:
Master Slave Topology
Let's go back to the period of kings, the monarchy system of government, when one king ruled over huge provinces. How? How did he oversee and manage all parts of his country? That one king ruled these big provinces by appointing many smaller kings under him, each responsible for a small part of the country. These smaller kings worked under the master king, each doing an equal share of the work assigned to them.
Now let’s come to technical term!
Yes, here also! The master-slave topology is used to deal with such a huge amount of data! Sometimes I think that an ancient-period algorithm is helping us develop in a modern way today! Anyway, let's move on to our problems and their solutions.
So! To deal with such a huge amount of data, the master-slave topology is used. And what's that?
Let's understand it in our traditional way! Suppose we receive a maximum of 20 GB of data, and we decide to divide it equally among five hard disks of 4 GB each (assuming the largest storage device available in the market is only 4 GB). In this way we can easily store 20 GB of data, so the problem of storage is solved! Now let's look at velocity. Suppose it takes 5 minutes to retrieve 20 GB of data from a single disk; then retrieving 4 GB takes only 1 minute. Since we have divided our 20 GB across five 4 GB disks that can all be read at the same time, retrieving the full 20 GB also takes only 1 minute! So our velocity problem is solved too!
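The arithmetic in the example above can be sketched in a few lines. All the numbers are the ones from the example itself:

```python
# Tiny sketch of the worked example: 20 GB split evenly across
# five 4 GB disks that can all be read in parallel.

TOTAL_GB = 20
DISKS = 5
MINUTES_FOR_20GB_ONE_DISK = 5        # given: one disk needs 5 minutes for 20 GB

per_disk_gb = TOTAL_GB / DISKS       # each disk holds 4 GB

# Each disk reads only its own 4 GB share, and all disks work at once,
# so the wall-clock time is just the time for one 4 GB read.
parallel_minutes = MINUTES_FOR_20GB_ONE_DISK * per_disk_gb / TOTAL_GB

print(per_disk_gb, "GB per disk,", parallel_minutes, "minute(s) to read all data")
```

The speed-up comes purely from parallelism: the same 20 GB is read, but five disks read their shares simultaneously.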
This type of topology is also called a Hadoop cluster.
Hadoop is an open-source, Java-based framework used for storing and processing big data. The data is stored on inexpensive commodity servers that run as clusters. Its distributed file system enables concurrent processing and fault tolerance. Developed by Doug Cutting and Michael J. Cafarella, Hadoop uses the MapReduce programming model for faster storage and retrieval of data from its nodes. The framework is managed by the Apache Software Foundation and is licensed under the Apache License 2.0.
For years, while the processing power of application servers has been increasing manifold, databases have lagged behind due to their limited capacity and speed. However, today, as many applications are generating big data to be processed, Hadoop plays a significant role in providing a much-needed makeover to the database world.
1. Capacity: Hadoop stores large volumes of data.
By using a distributed file system called HDFS (Hadoop Distributed File System), the data is split into chunks and saved across clusters of commodity servers. As these commodity servers are built with simple hardware configurations, they are economical and easily scalable as the data grows.
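The splitting described above can be illustrated with a toy sketch. This is not the real HDFS API; the block size and replication factor are HDFS's well-known defaults, but the placement logic here is a simplified round-robin of my own invention:

```python
# Toy sketch (not the real HDFS API) of splitting a file into fixed-size
# blocks and spreading replicated copies across commodity servers.

BLOCK_SIZE = 128   # HDFS's default block size is 128 MB; sizes below are plain MB
REPLICATION = 3    # HDFS keeps 3 copies of each block by default

def split_into_blocks(file_size_mb, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size_mb occupies."""
    blocks = []
    remaining = file_size_mb
    while remaining > 0:
        blocks.append(min(block_size, remaining))
        remaining -= block_size
    return blocks

def place_blocks(blocks, servers, replication=REPLICATION):
    """Round-robin each block replica onto the slave servers (toy placement)."""
    placement = {s: [] for s in servers}
    for i, size in enumerate(blocks):
        for r in range(replication):
            server = servers[(i + r) % len(servers)]
            placement[server].append((i, size))
    return placement

blocks = split_into_blocks(500)  # a 500 MB file -> blocks of 128, 128, 128, 116 MB
layout = place_blocks(blocks, ["node1", "node2", "node3", "node4"])
```

Because every block lives on several servers, losing one machine loses no data, which is how the cluster gets its fault tolerance from cheap hardware.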
2. Speed: Hadoop stores and retrieves data faster.
Hadoop uses the MapReduce functional programming model to perform parallel processing across data sets. So, when a query is sent to the database, instead of handling data sequentially, tasks are split and concurrently run across distributed servers. Finally, the output of all tasks is collated and sent back to the application, drastically improving the processing speed.
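The map-split-collate flow described above can be sketched in miniature. This is an in-process toy of the MapReduce model, not Hadoop's actual Java API; the classic word-count example is used because it is the standard illustration of the technique:

```python
# Minimal in-process sketch of the MapReduce model Hadoop uses:
# map each record to key/value pairs, shuffle (group) by key, then reduce.
from collections import defaultdict

def map_phase(record):
    # Mapper: emit (word, 1) for every word in one line of input.
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as Hadoop does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reducer: sum the counts for one word.
    return key, sum(values)

lines = ["big data is big", "data moves fast"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
# counts["big"] == 2, counts["data"] == 2, counts["fast"] == 1
```

In a real cluster, each mapper and reducer runs on a different node, so the map and reduce phases above would execute concurrently across distributed servers instead of in one process.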
So! This is how the media industry today is growing and managing its Big Data problem.