MongoDB and MapReduce
Hello everyone. Today I'm going to write about MongoDB and its MapReduce operation. First of all, I want to give a brief introduction to the MapReduce programming paradigm.
MapReduce is a programming paradigm for processing big data on a distributed system. It analyzes data and produces aggregated results. The map function emits key/value pairs, and the values are collected under their keys. The reduce function then takes the values accumulated for each key and turns them into an aggregated result.
The map function resembles a SQL SELECT query.
The reduce function resembles SQL aggregation, such as COUNT, AVG or HAVING.
The map function runs first and produces the intermediate output. The reduce function then receives that output as key/value pairs in its parameters. Both functions must be written by the programmer.
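To make the contract between the two functions concrete, here is a minimal single-machine sketch in Python. It is not how MongoDB or any distributed engine implements MapReduce. The `map_reduce` helper, the `orders` records and the function names are all invented for illustration.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run map_fn on every record, group emitted values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Hypothetical records, invented for this sketch.
orders = [
    {"customer": "ann", "amount": 10},
    {"customer": "bob", "amount": 5},
    {"customer": "ann", "amount": 7},
]


def map_order(order):
    # Like the SELECT part of a query: pick the key and the value to emit.
    yield order["customer"], order["amount"]


def reduce_amounts(customer, amounts):
    # Like a SQL aggregate over one group: here, a sum per customer.
    return sum(amounts)


totals = map_reduce(orders, map_order, reduce_amounts)
print(totals)  # {'ann': 17, 'bob': 5}
```

The programmer writes only `map_order` and `reduce_amounts`. The grouping in between is what the framework provides.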
After this introduction, let's go through a MapReduce exercise before the MongoDB example: counting the words in a text. First, the input is split into three pieces. The map function counts each word within its piece. After mapping, identical words are grouped into the same chunk. Let's look at the image. In the mapping phase the words are counted, and we find 2 dogs, 3 cars, 2 cats and 2 rats in total. In the next phase, shuffling, identical words are grouped together. After shuffling, the reduce function counts again inside each chunk, accumulating the values for each word. Finally, the results are combined and printed to the screen.
You may think this is just a basic operation, so why do we need MapReduce or MongoDB? Think about what we would do with thousands or millions of records. The MapReduce paradigm is a good option for dealing with large volumes of data.
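The word count walkthrough above can be sketched in a few lines of Python. The original input text is not reproduced in this article, so the three lines below are an assumption, chosen only so that the totals match the counts quoted above. Everything runs in one process, so this shows the phases rather than a distributed run.

```python
from collections import defaultdict

# Assumed input: three lines picked to give 2 dogs, 3 cars, 2 cats and 2 rats.
text = "dog cat rat\ncar car rat\ndog car cat"

# Split phase: one piece per line.
splits = text.splitlines()


def map_words(piece):
    # Map phase: emit a (word, 1) pair for every word in the piece.
    return [(word, 1) for word in piece.split()]


mapped = [map_words(piece) for piece in splits]

# Shuffle phase: bring all the pairs for the same word together.
shuffled = defaultdict(list)
for pairs in mapped:
    for word, count in pairs:
        shuffled[word].append(count)


def reduce_counts(word, counts):
    # Reduce phase: accumulate the values for one word.
    return sum(counts)


word_counts = {word: reduce_counts(word, counts) for word, counts in shuffled.items()}
print(word_counts)  # {'dog': 2, 'cat': 2, 'rat': 2, 'car': 3}
```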
Example: My example is a C# console application. It uses the restaurants collection, a sample dataset that MongoDB publishes on its website for learners. Every document represents one restaurant and has a borough value. In the map function we group the restaurants by their borough value. Then we accumulate the restaurant values in the reduce function. In addition, we apply a filter so that the aggregated results can be restricted by each restaurant's cuisine type.
- MongoDB database connection settings
- Filter by restaurant cuisine type
- We create a MapReduceOptions object to hold our options. The Restaurant class is our input type and BsonDocument is our output type. We set two things, a finalize function and a filter, and both are defined on the options object.
- The MapReduce method takes three parameters: the map function, the reduce function and the options object. We then await the call to get all of the aggregated results.
- The image below this paragraph shows our output with no filter. In total there are 25,360 restaurants, grouped and counted by their borough value.
- When we use the cuisine type filter, the output is different.
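The steps in the list above can be imitated in plain Python to show how the pieces fit together. This is not the C# driver code from the article, and it does not talk to MongoDB. The five sample documents, the function names and the shape returned by `finalize` are assumptions made for this sketch. The one behaviour taken from MongoDB is that the filter selects the input documents before the map function sees them.

```python
from collections import defaultdict

# Hypothetical sample documents; the real dataset is far larger.
restaurants = [
    {"name": "A", "borough": "Bronx", "cuisine": "Bakery"},
    {"name": "B", "borough": "Brooklyn", "cuisine": "Italian"},
    {"name": "C", "borough": "Brooklyn", "cuisine": "Bakery"},
    {"name": "D", "borough": "Manhattan", "cuisine": "Italian"},
    {"name": "E", "borough": "Brooklyn", "cuisine": "Italian"},
]


def map_restaurant(doc):
    # Emit the borough as the key and 1 as the value.
    return doc["borough"], 1


def reduce_borough(borough, counts):
    # Accumulate the values emitted for one borough.
    return sum(counts)


def finalize(borough, total):
    # Assumed post-processing step: shape each result as a small document.
    return {"borough": borough, "count": total}


def count_by_borough(docs, cuisine=None):
    groups = defaultdict(list)
    for doc in docs:
        if cuisine is not None and doc["cuisine"] != cuisine:
            continue  # the filter is applied before the map function runs
        key, value = map_restaurant(doc)
        groups[key].append(value)
    return [finalize(k, reduce_borough(k, v)) for k, v in sorted(groups.items())]


all_counts = count_by_borough(restaurants)
italian_counts = count_by_borough(restaurants, cuisine="Italian")
print(all_counts)
print(italian_counts)
```

With no filter every borough is counted. With the `"Italian"` filter the Bronx disappears from the output, because its only sample restaurant is a bakery.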
We used a MapReduce operation for our example. It is a good way to deal with high-volume datasets. When new values are added to the dataset, we do not need to apply the MapReduce operation to the whole dataset again. With an incremental MapReduce operation, only the new values are processed. The new aggregated result is then merged with the old result.
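The incremental idea can be sketched as follows, again in plain Python with invented data. Only the new documents go through MapReduce, and their result is merged into the stored result by applying the same reduce function to the old and new values for each key. This mirrors the behaviour of MongoDB's `reduce` output action, but the helper names and sample documents are hypothetical.

```python
from collections import defaultdict


def reduce_counts(key, values):
    # The same reduce function is used for the first run and for the merge.
    return sum(values)


def map_reduce(docs):
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["borough"]].append(1)  # map: emit (borough, 1)
    return {key: reduce_counts(key, values) for key, values in groups.items()}


def merge_incremental(old_result, new_result):
    # Re-reduce the old and new values for keys present in both results.
    merged = dict(old_result)
    for key, value in new_result.items():
        if key in merged:
            merged[key] = reduce_counts(key, [merged[key], value])
        else:
            merged[key] = value
    return merged


# Hypothetical documents: an initial batch and a later batch of new arrivals.
initial_docs = [{"borough": "Bronx"}, {"borough": "Queens"}, {"borough": "Bronx"}]
new_docs = [{"borough": "Queens"}, {"borough": "Brooklyn"}]

stored = map_reduce(initial_docs)                       # first full run
stored = merge_incremental(stored, map_reduce(new_docs))  # only new docs are processed
print(stored)  # {'Bronx': 2, 'Queens': 2, 'Brooklyn': 1}
```

This only works when the reduce function can be safely re-applied to its own output, as a sum can. The merged result here equals what a full run over all five documents would give.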