MongoDB and MapReduce

Hello everyone. Today I'm going to write about MongoDB and its MapReduce operation. First of all, I want to give a brief introduction to the MapReduce programming paradigm.

MapReduce is a programming paradigm for processing big data on distributed systems. It analyzes data and produces aggregated results. The map function declares key/value pairs, and these values are what we accumulate. Later, the reduce function takes the values accumulated by the map function and converts them into the aggregated results.

The map function loosely resembles a SQL SELECT query.
The reduce function loosely resembles a SQL aggregate such as COUNT or AVG, or a HAVING clause.

The map function runs first and produces its output as key/value pairs. The reduce function then takes those key/value pairs as its parameters. Both of these functions must be written by the programmer.


After this introduction, let's walk through a MapReduce exercise before our MongoDB example. The exercise is a word count over a text. First, the input is split into 3 pieces. Inside the mapping function, each word in each piece is counted. After the mapping step, identical words are grouped into the same chunk.

Let's look at the image. In the mapping part the words are counted, and we find there are 2 dogs, 3 cars, 2 cats and 2 rats. In the next phase, identical words are grouped together. After this shuffling, the reduce function performs another count inside each chunk and accumulates the values for each word. Finally, the results are combined and written to the screen.

You may think that this is just a basic operation, so why do we need MapReduce or MongoDB? Just think about what we would do with thousands or millions of records. The MapReduce paradigm is well suited to dealing with large volumes of data.

There is a lot of documentation about the MapReduce paradigm on the internet. Further information: research.google.com, wikipedia.com/MapReduce
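The word count steps can be sketched in plain JavaScript, without MongoDB. This is only a minimal illustration: the three input pieces are assumed (they are chosen so that the totals match the 2 dogs, 3 cars, 2 cats and 2 rats mentioned above), and the names `splits`, `map`, `shuffle`, `reduce` and `wordCount` are hypothetical helpers, not part of any library.

```javascript
// Hypothetical input, already split into three pieces.
const splits = [
  "dog car cat",
  "car rat dog",
  "cat rat car",
];

// Map: emit a [word, 1] pair for every word in one piece.
function map(split) {
  return split.split(/\s+/).filter(Boolean).map((word) => [word, 1]);
}

// Shuffle: group all emitted values under their key (the word).
function shuffle(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce: add up the values collected for one key.
function reduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Run the three phases and collect the final counts.
function wordCount(inputSplits) {
  const pairs = inputSplits.flatMap((split) => map(split));
  const result = {};
  for (const [key, values] of shuffle(pairs)) {
    result[key] = reduce(key, values);
  }
  return result;
}

console.log(wordCount(splits));
```

On a real cluster each piece would be mapped on a different machine, and the shuffle step would move the pairs over the network. Here everything runs in one process just to show the flow of data.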


Example: My example is based on a C# console application. I use the restaurants collection dataset, which MongoDB publishes on its website for learners. Every document (each restaurant) has a borough value. With the map function we group the restaurants by their borough value. Then we accumulate the restaurant counts in the reduce function. In addition, we filter the aggregate results by each restaurant's cuisine type.

  • MongoDB database connection settings
We can make a console application for our example.
  • MapReduce operations are functions written in JavaScript. The map function defines "this.borough" as the key and 1 as the value. This means each restaurant is counted once under its borough: the value is 1 because every restaurant in a borough raises that borough's count by one. For example, if there are 10 restaurants in a borough, the result will be 10. The reduce function takes 2 parameters: key and values. "values" is an array that contains the emitted counts for each borough. The function then uses the Array.sum operation to add up the restaurant counts for each borough. The finalize function is optional, but it is very useful for shaping the output value.
map, reduce and finalize functions
  • Filter by restaurant cuisine type
restaurant cuisine = Irish
  • We create a MapReduceOptions object to define our options. The Restaurant class is our input value type, and our output value type is a BsonDocument. We have two settings, finalize and filter, and both of them are defined inside the options object.
  • The MapReduce method takes 3 parameters: the map function, the reduce function and the options object. Then we await the operation to get all the aggregate results.

  • The image below this paragraph shows our output with no filter. In total we have 25,360 restaurants, grouped and counted by their borough value.
no restaurant type filter
  • When we use the restaurant type filter, the output will be different.
with restaurant type filter
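The steps in the list above can be tried out without a database. The sketch below is a plain JavaScript simulation, not the C# code of this example. The sample documents, the `runMapReduce` helper and the shape returned by `finalizeFn` are all assumptions made for illustration. Only the idea of emitting `this.borough` with a value of 1, summing in reduce, and filtering by cuisine comes from the article.

```javascript
// Hypothetical sample documents; the real dataset has many more fields and rows.
const restaurants = [
  { name: "A", borough: "Bronx", cuisine: "Irish" },
  { name: "B", borough: "Bronx", cuisine: "Bakery" },
  { name: "C", borough: "Queens", cuisine: "Irish" },
  { name: "D", borough: "Manhattan", cuisine: "Irish" },
  { name: "E", borough: "Manhattan", cuisine: "Irish" },
  { name: "F", borough: "Queens", cuisine: "Pizza" },
];

// MongoDB supplies emit() itself; here runMapReduce assigns a stand-in.
let emit;

// Map: the current document is `this`; emit its borough with a value of 1.
function mapFn() {
  emit(this.borough, 1);
}

// Reduce: add up the emitted values for one borough.
// (The article uses Array.sum; plain JavaScript uses reduce() instead.)
function reduceFn(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

// Finalize: shape the output value (this shape is an assumption).
function finalizeFn(key, reducedValue) {
  return { borough: key, count: reducedValue };
}

// Hypothetical runner: filter, map, group by key, reduce, finalize.
function runMapReduce(docs, filter) {
  const groups = new Map();
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.filter(filter).forEach((doc) => mapFn.call(doc));
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: finalizeFn(key, reduceFn(key, values)),
  }));
}

const all = runMapReduce(restaurants, () => true);
const irish = runMapReduce(restaurants, (doc) => doc.cuisine === "Irish");
console.log(all);
console.log(irish);
```

The filter runs before the map step, so documents with another cuisine never reach `mapFn`. That is why the filtered counts per borough are smaller than the unfiltered ones.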

We used the MapReduce operation for our example. It is a better way to deal with high-volume data sets. If new values are added to our dataset, we don't need to apply the MapReduce operation to the whole dataset again. When we use an incremental MapReduce operation, the operation only affects the new values. Then the new aggregate result is merged with the old result.
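The merge step of an incremental MapReduce can be sketched in plain JavaScript as follows. This is a simulation under assumptions, not MongoDB's implementation: `mapReduceCounts` and `mergeIncremental` are hypothetical helpers, and the documents are made up. In MongoDB itself, the comparable behaviour is requested with the `reduce` output action of the `out` option.

```javascript
// Reduce: add up counts for one key. It must also work on already-reduced values.
function reduceFn(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

// Hypothetical helper: count documents per borough with map + reduce.
function mapReduceCounts(docs) {
  const groups = {};
  for (const doc of docs) {
    if (!groups[doc.borough]) groups[doc.borough] = [];
    groups[doc.borough].push(1); // map step: emit(borough, 1)
  }
  const result = {};
  for (const key of Object.keys(groups)) {
    result[key] = reduceFn(key, groups[key]);
  }
  return result;
}

// Merge: re-reduce the stored value together with the new value for each key.
function mergeIncremental(oldResult, newResult) {
  const merged = { ...oldResult };
  for (const key of Object.keys(newResult)) {
    merged[key] = key in merged
      ? reduceFn(key, [merged[key], newResult[key]])
      : newResult[key];
  }
  return merged;
}

// Made-up data: an existing batch and a batch of newly added documents.
const oldDocs = [{ borough: "Bronx" }, { borough: "Bronx" }, { borough: "Queens" }];
const newDocs = [{ borough: "Bronx" }, { borough: "Brooklyn" }];

const stored = mapReduceCounts(oldDocs);          // computed once, kept
const incremental = mapReduceCounts(newDocs);     // only the new documents
const merged = mergeIncremental(stored, incremental);
console.log(merged);
```

The merge only works because the reduce function gives the same answer whether it sees raw values or partial sums. A sum has this property, which is why counting per borough is a good fit for the incremental approach.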

MongoDB MapReduce operation doc

Turkish version of this article