Reducers in MapReduce
The key thing to understand about reducers is that their number is not fixed: it varies with the job and the use case.
By default, a MapReduce job runs with a single reducer.
But we can increase that number, or even set it to zero, depending on the result we need and the optimization we want to perform.
Now, let's understand when we can increase the number of reducers.
In situations where the mappers do less work than the reducer, a single reducer becomes the bottleneck and the parallelism of the cluster is underused. In those situations we can increase the number of reducers to suit our requirements and optimization goals.
A question may arise here: which {key, value} pair from the mapper output goes to which reducer?
This is handled by partitions, and the number of partitions always equals the number of reducers.
Partition 1 data goes to reducer 1, partition 2 data goes to reducer 2, and so on.
The assignment of a {key, value} pair from a mapper to a partition depends on a hash function of the key, which is always consistent: the same key always lands in the same partition, which is what keeps the results accurate.
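To make the partitioning step concrete, here is a minimal Python sketch of hash partitioning. It is only an illustration and not Hadoop's actual code: the function names partition_for and partition_pairs, the sample data, and the choice of CRC32 as the hash are all assumptions made for this example. Hadoop's default HashPartitioner follows the same idea, using the key's hashCode modulo the number of reduce tasks. Note that the partitions here are numbered from 0 rather than 1.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_reducers: int) -> int:
    """Return the 0-based partition index for a key (hypothetical helper)."""
    # CRC32 is deterministic across runs, unlike Python's built-in hash()
    # for strings, so the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition_pairs(pairs, num_reducers):
    """Group mapper output pairs by the partition their key hashes to."""
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[partition_for(key, num_reducers)].append((key, value))
    return dict(partitions)


# Sample mapper output, made up for illustration.
mapper_output = [("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]
partitions = partition_pairs(mapper_output, num_reducers=2)

for index, pairs in sorted(partitions.items()):
    print(f"partition {index} -> reducer {index}: {pairs}")
```

Because the partition depends only on the key, both ("apple", 1) pairs end up at the same reducer, which is what lets that reducer compute a correct total for the key.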
In some cases, the number of reducers can also be 0.
There are some jobs where there will be no need for reducers.
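For example, a job that only filters or converts each record on its own needs no aggregation across keys. In those cases there is no work for the reducer to perform: with zero reducers, the shuffle and sort step is skipped and the mapper output is written directly as the final result.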
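The difference between a job with reducers and a job with zero reducers can be sketched with a small in-memory simulation in Python. This is a toy model under stated assumptions, not how a real cluster runs: run_job, keep_errors, word_pairs, count_values, and the sample log lines are hypothetical names and data invented for this example, and all reducers are collapsed into a single process.

```python
from collections import defaultdict


def run_job(records, mapper, reducer=None, num_reducers=1):
    """Toy single-process model of a MapReduce job (hypothetical helper)."""
    mapped = [pair for record in records for pair in mapper(record)]
    if num_reducers == 0:
        # Map-only job: no shuffle, sort, or reduce step.
        # The mapper output is the final result.
        return mapped
    # Shuffle and sort: group values by key, then visit keys in sorted order.
    # Partitioning across several reducers is not modeled here.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    return [reducer(key, values) for key, values in sorted(grouped.items())]


def keep_errors(line):
    """Mapper that filters records; each line is handled on its own."""
    if "ERROR" in line:
        yield (line, None)


def word_pairs(line):
    """Mapper that emits (word, 1) for every word in a line."""
    for word in line.split():
        yield (word, 1)


def count_values(key, values):
    """Reducer that sums the values collected for one key."""
    return (key, sum(values))


logs = ["INFO start", "ERROR disk full", "INFO done", "ERROR timeout"]

# Filtering needs no aggregation, so zero reducers is enough.
filtered = run_job(logs, keep_errors, num_reducers=0)

# Counting words needs values grouped by key, so a reducer is required.
counts = run_job(["a b a"], word_pairs, count_values, num_reducers=1)
```

The filtering job never groups anything by key, so skipping the reduce phase loses nothing, while the word count only works because the values for each key are brought together first.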
Thank you!