Seeking Map-Side Join
At LiveRamp, many of our Hadoop workflows join two datasets together (joins across more datasets are supported, but for simplicity this post covers only the two-dataset case). To join two datasets efficiently, both have to be sorted by the join key, which happens in the Shuffle phase of MapReduce…
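To see why sorted inputs make a join cheap, here is a minimal in-memory sketch of a merge join over two lists of (key, value) pairs that are already sorted by key. It is only an illustration, not Hadoop code or LiveRamp's implementation; the names `merge_join` and `_next_group` and the sample data are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def _next_group(groups):
    """Return (key, [values]) for the next run of equal keys, or None when exhausted."""
    for key, rows in groups:
        # Materialize the run now; groupby invalidates it once we advance.
        return key, [value for _, value in rows]
    return None


def merge_join(left, right):
    """Inner-join two lists of (key, value) pairs that are both sorted by key.

    Hypothetical helper for illustration: because both inputs are sorted,
    a single forward pass over each one is enough to find every match.
    """
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))
    l = _next_group(left_groups)
    r = _next_group(right_groups)
    joined = []
    while l is not None and r is not None:
        if l[0] < r[0]:
            # Left key has no partner on the right; skip it.
            l = _next_group(left_groups)
        elif l[0] > r[0]:
            # Right key has no partner on the left; skip it.
            r = _next_group(right_groups)
        else:
            # Keys match: emit every left/right value combination.
            joined.extend((l[0], lv, rv) for lv in l[1] for rv in r[1])
            l = _next_group(left_groups)
            r = _next_group(right_groups)
    return joined


if __name__ == "__main__":
    left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    right = [(2, "x"), (3, "y"), (4, "z")]
    print(merge_join(left, right))
```

Neither input is ever rewound, so the scan cost is linear in the combined input size (plus the size of the output). That property holds only if both sides arrive sorted.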