WHAT IS HADOOP?

Aditya Anand
2 min readSep 25, 2022

--

https://in.pinterest.com/pin/516436282276671805/

By dividing the data into smaller buckets, several computational issues can be resolved very quickly.

Consider the scenario where you are looking for the highest number among 100 million different numbers.

You can examine each one individually.

If your computer is capable of processing a million numbers per hour, it will take 100 hours, or just over 4 days, to complete this task.

Now, if you divided your data into 100 pieces and distributed them across 100 computers, each machine would discover its greatest number in an hour. If you then spent a few extra seconds identifying the largest among those 100, you would be essentially finished in an hour.

These kinds of problems are known as embarrassingly parallel problems, and the process of partitioning the problem (Mapping) into smaller components and then connecting the individual results to build a larger result (Reducing) was detailed in a Google paper. This process is known as MapReduce.

Recent articles: I NEVER UNDERSTOOD RECURSION

Hadoop is an open source software that makes doing mapreduce type programming easier.

You don’t have to be concerned with setting up the application on your 100 machines, dividing up your beginning data, transferring it to all 100 workstations, copying findings from 100 machines over, etc.

Hadoop is responsible for managing all housekeeping.

Once you’ve set up a Hadoop cluster with over 100 servers, you can feed it any program and data, and it will handle all the background processing and deliver the results to you.

--

--