Merge sort is one of the most efficient general-purpose sorting algorithms, with a worst-case running time of O(n log n). It follows the divide-and-conquer principle: it is cheaper to split the array into two halves, sort each half recursively, and then merge the two sorted halves at the end.
Let us code the most rudimentary form:
The Wordcount problem asks for the number of occurrences of each word across multiple documents. It can be solved directly by keeping a counter for each word and incrementing it on every occurrence. However, this solution does not scale to millions of files, because sending every file's content from each node to the master is not feasible.
Hence we apply the MapReduce paradigm to solve this.
Here the Map step computes the word counts for each file, and the Reduce step combines these per-file counts across all files to produce the final Wordcount (the shuffle step is handled implicitly).
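The following is a minimal, single-process sketch of this Map/Reduce split, assuming the documents are already in memory as strings. It is not a distributed implementation: one goroutine plays the role of each mapper, and the names `mapPhase`, `reducePhase` and `wordCount` are hypothetical. Because the reducer simply merges whole per-document maps, there is no explicit shuffle step, which matches the "handled implicitly" note above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// mapPhase counts the words in a single document (the "Map" step).
// Words are split on whitespace and lower-cased; punctuation is not stripped.
func mapPhase(doc string) map[string]int {
	counts := make(map[string]int)
	for _, w := range strings.Fields(doc) {
		counts[strings.ToLower(w)]++
	}
	return counts
}

// reducePhase merges the per-document counts into one total (the "Reduce" step).
func reducePhase(partials []map[string]int) map[string]int {
	total := make(map[string]int)
	for _, p := range partials {
		for w, c := range p {
			total[w] += c
		}
	}
	return total
}

// wordCount runs one mapper goroutine per document, waits for all of them,
// and then reduces their results.
func wordCount(docs []string) map[string]int {
	partials := make([]map[string]int, len(docs))
	var wg sync.WaitGroup
	for i, d := range docs {
		wg.Add(1)
		go func(i int, d string) {
			defer wg.Done()
			partials[i] = mapPhase(d) // each goroutine writes only its own slot
		}(i, d)
	}
	wg.Wait()
	return reducePhase(partials)
}

func main() {
	docs := []string{"the quick brown fox", "the lazy dog", "The fox"}
	counts := wordCount(docs)

	// Map iteration order is random in Go, so sort the keys for stable output.
	words := make([]string, 0, len(counts))
	for w := range counts {
		words = append(words, w)
	}
	sort.Strings(words)
	for _, w := range words {
		fmt.Println(w, counts[w])
	}
}
```

Writing each mapper's result into its own slice index avoids the need for a mutex. In a real cluster, each map would run on the node that stores the file, so only the small per-file count tables travel to the reducer rather than the raw file contents.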
It all started in the winter of 2017, when I began preparing for the internship I had secured for the summer of 2018. During a call with the host of the internship project, I was informed that the work would require good Golang skills.
A quick introduction: Golang (Go) was developed at Google. Its built-in goroutines and channels make concurrent and parallel designs easier to express than in many other languages.