Introduction to Big Data Problems and their Solutions

Kritikagarg
5 min read · Sep 17, 2020


Nowadays almost everything is connected to the Internet, and every day we put things online through social media platforms like Facebook, Instagram, Twitter and many more.

But have you ever thought about where this huge amount of daily-generated data gets stored? What problems arise when storing such huge amounts of data? And what are the solutions?

The answer to all these questions leads us to the concept of Big Data. So let us first see: what is Big Data?

🤔 What is Big Data ?

Big Data is also data, but of huge size. Big Data is a term used to describe a collection of data that is huge in volume and yet growing exponentially with time. In short, such data is so large and complex that none of the traditional data management tools can store or process it efficiently.

Big Data Examples →

Big data is getting bigger every minute in almost every sector, be it tech, media, retail, financial service, travel, and social media, to name just a few. The volume of data processing we are talking about is mind boggling. Here is some statistical information to give you an idea:

  • The Weather Channel receives 18,055,555 forecast requests every minute.
  • Netflix users stream 97,222 hours of video every minute.
  • Skype users make 176,220 calls every minute.
  • Instagram users post 49,380 photos every minute.

Statistics show that 500+ terabytes of new data are ingested into the databases of the social media site Facebook every day. This data is mainly generated through photo and video uploads, message exchanges, comments, etc.

Google Search statistics show 3.5 billion searches per day, which is over 40,000 searches every second on average.

🤔 What are the types of Big Data ?

There are typically three types of Big Data:

  1. Structured
  2. Unstructured
  3. Semi-structured

1 . Structured Big Data

Any data that can be stored, accessed and processed in a fixed format is termed 'structured' data. A table in a relational database, with fixed columns and one record per row, is a typical example.

2 . Unstructured Big Data

Any data with an unknown form or structure is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges when it comes to processing it to derive value. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc.

3 . Semi-structured Big Data

Semi-structured data contains elements of both forms, structured as well as unstructured. JSON and XML files are typical examples: the data is tagged and self-describing, but the fields can vary from record to record.
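To make the three types concrete, here is a small sketch in Python. The records and values below are made up purely for illustration; the point is only how rigid the shape of each kind of data is:

```python
import csv
import json
import io

# Structured: fixed schema, every record has the same columns (like a database table)
structured = io.StringIO("id,name,city\n1,Asha,Delhi\n2,Ravi,Pune\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing tags, but fields can vary per record (e.g. JSON)
semi = json.loads('{"id": 3, "name": "Meena", "tags": ["student", "dev"]}')

# Unstructured: no predefined model at all -- free text, images, video bytes
unstructured = "Loved the trip photos you posted yesterday!"

print(rows[0]["name"])    # every row is guaranteed to have this field
print(semi["tags"])       # nested, optional fields are allowed
print(len(unstructured))  # just raw content; extracting meaning needs more work
```

Traditional tools handle the first kind well; it is the second and especially the third kind, at huge volume, that create the Big Data problem.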

🤔 What are the Characteristics of Big Data ?

1 . Volume

The name Big Data itself refers to an enormous size. The size of data plays a crucial role in determining the value that can be derived from it. Whether particular data can actually be considered Big Data or not also depends on its volume. Hence, 'Volume' is one characteristic which needs to be considered while dealing with Big Data.

2 . Variety

Variety refers to heterogeneous sources and the nature of data, both structured and unstructured. During earlier days, spreadsheets and databases were the only sources of data considered by most of the applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. are also being considered in the analysis applications.

3 . Velocity

The term 'velocity' refers to the speed at which data is generated. How fast the data is generated and processed to meet demand determines the real potential in the data.

4 . Variability

This refers to the inconsistency which can be shown by the data at times, thus hampering the process of being able to handle and manage the data effectively.

As we see above, each characteristic of Big Data brings its own problems regarding speed, storage, etc. To tackle them, most companies nowadays use a solution known as Distributed Storage.

🤔 What are the solutions for solving the problem of Big Data ?

The most popular solution nowadays, used by almost all companies, is Distributed Storage.

A Distributed Storage is an infrastructure that can split data across multiple physical servers, and often across more than one data center. It typically takes the form of a cluster of storage units, with a mechanism for data synchronization and coordination between cluster nodes.

For example, let's say we have 500 GB of data but limited resources to store it. One might think: let's buy a 500 GB disk and store our data on it. We can store the data this way, but processing it takes more time, which raises the problem of I/O handling. A better solution is to divide the data into 5 parts of 100 GB each and store them in 5 different storage centers. The data is now stored efficiently, which removes the volume problem, and it can be written and read in parallel, in less time, which removes the velocity problem.

In the Big Data world, the 5 storage centers across which we distribute the data are known as Slave Nodes, and the node from which we distribute the data to the slaves is known as the Master Node. Together, all these nodes form an infrastructure called a Cluster; in the Big Data world it is known as a Distributed Storage Cluster.
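The master/slave idea above can be sketched in a few lines of Python. This is a toy model, not a real storage system: the node names are made up, sizes are plain numbers in GB, and the "placement" is just a dictionary the master would keep:

```python
def split_across_nodes(total_gb, nodes):
    """Master's job in miniature: divide total_gb evenly across slave nodes."""
    chunk = total_gb // len(nodes)
    return {node: chunk for node in nodes}

# 5 hypothetical slave nodes, as in the 500 GB example above
slaves = ["slave1", "slave2", "slave3", "slave4", "slave5"]
placement = split_across_nodes(500, slaves)

print(placement)  # each slave node stores a 100 GB chunk
# Because reads and writes now hit all 5 nodes in parallel,
# I/O time is roughly one fifth of using a single 500 GB disk.
```

Real distributed storage systems add much more on top of this, such as replicating each chunk to several nodes so data survives a node failure, but the core idea of the split is the same.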

There are different tools on the market to create and manage such Distributed Storage Clusters, but the most widely used tool, adopted by many companies, is Hadoop.

Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
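To get a feel for the MapReduce model, here is the classic word-count example sketched in plain Python. This runs locally on one machine; on a real Hadoop cluster the map calls would run in parallel on many nodes, and the framework would shuffle the (word, count) pairs to the reducers:

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    # Map step: emit a (word, 1) pair for every word in a line of input
    return [(word.lower(), 1) for word in line.split()]

def reducer(pairs):
    # Reduce step: sum up the counts emitted for each word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data is big", "hadoop stores big data"]
word_counts = reducer(chain.from_iterable(mapper(line) for line in lines))
print(word_counts["big"])  # 3
```

The strength of the model is that the mapper only ever sees one line and the reducer only ever sees one word's pairs, so the framework is free to spread the work across thousands of machines.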

🔰 So now we have an idea of what Big Data is, and how MNCs like Google, Facebook, etc. solve its challenges.

💫 I would like to thank Mr. Vimal Daga for giving this task, which helped me explore the Big Data world. 💫

!! Thank you all for visiting my article !!
