What are side data distribution techniques in Hadoop?
The extra read-only data that a Hadoop job needs in order to process the main dataset is referred to as side data. Hadoop has two side data distribution techniques:
i) Using the job configuration: This technique should not be used to transfer more than a few kilobytes of data, because it puts pressure on the memory of the Hadoop daemons, particularly if your system is running several Hadoop jobs.
ii) Distributed Cache: Rather than serializing side data in the job configuration, it is preferable to distribute larger side data using Hadoop's distributed cache mechanism, which copies the files to the task nodes so that tasks can read them locally.
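The choice between the two techniques can be sketched in plain Java. This is only an illustration under stated assumptions, not Hadoop's own code: `java.util.Properties` stands in for the job configuration, and the class `SideDataSketch`, the 4 KB cut-off `MAX_CONF_BYTES`, the key `sidedata.countries`, and the `k=v;k=v` encoding are all hypothetical names chosen for this example. In a real job the small case would typically go through `Configuration.set` in the driver and `Configuration.get` in the task, and the large case through `Job.addCacheFile`.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

final class SideDataSketch {
    // Assumed cut-off for "a few kilobytes"; this is not a Hadoop constant.
    static final int MAX_CONF_BYTES = 4 * 1024;

    enum Technique { JOB_CONFIGURATION, DISTRIBUTED_CACHE }

    // Pick a technique from the UTF-8 size of the serialized side data.
    static Technique choose(String serialized) {
        int size = serialized.getBytes(StandardCharsets.UTF_8).length;
        return size <= MAX_CONF_BYTES
                ? Technique.JOB_CONFIGURATION
                : Technique.DISTRIBUTED_CACHE;
    }

    // Driver side: flatten a small lookup table into one string value.
    // Assumes keys and values contain neither '=' nor ';'.
    static String encode(Map<String, String> sideData) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : sideData.entrySet()) {
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Task side: rebuild the lookup table from the configuration value.
    static Map<String, String> decode(String serialized) {
        Map<String, String> out = new LinkedHashMap<>();
        if (serialized == null || serialized.isEmpty()) {
            return out;
        }
        for (String pair : serialized.split(";")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // skip malformed entries
            }
            out.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> countries = new LinkedHashMap<>();
        countries.put("IN", "India");
        countries.put("US", "United States");

        String serialized = encode(countries);
        Properties conf = new Properties(); // stand-in for the job configuration

        if (choose(serialized) == Technique.JOB_CONFIGURATION) {
            conf.setProperty("sidedata.countries", serialized);
        } else {
            System.out.println("Too large for the configuration; use the distributed cache.");
        }

        // What a task would see after reading the configuration value back.
        System.out.println(decode(conf.getProperty("sidedata.countries")));
    }
}
```

The size check is the point of the sketch: anything stored in the job configuration is carried along with the job, so only small lookup tables belong there, and everything larger is better shipped as a file through the distributed cache.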
We have further categorized the Hadoop MapReduce Interview Questions for Freshers and Experienced:
- Hadoop Interview Questions and Answers for Freshers — Q.Nos- 2
- Hadoop Interview Questions and Answers for Experienced — Q.Nos- 1,3,4,5