Aggregation in MongoDB

Ashiq KS
8 min readJan 9, 2019

--

An article on the aggregation pipeline and functions in the MongoDB database

MongoDB — A NoSQL database

Before getting to know about MongoDB, we have to know what a NoSQL database is and how it is different from the other popular database type SQL.

NoSQL databases are called ‘non-relational’ databases whereas SQL databases are called relational databases because a table in the SQL database can be related to another table but in the case of a NoSQL database it doesn’t need to be so because it has its own to achieve what SQL does.

A database contains multiple tables and a particular table contains columns or fields and every row is a single data. In a SQL table, every data has the same schema, means once a table has been initialized then the data being inserted has to have all the fields depicted in the table, even if values for some fields are not present then those fields are filled in with the ‘NULL’ data. That is the fields are static. But in the case of a NoSQL, a data being inserted doesn’t need to have all the fields, that is the fields are dynamic. This is one of the most important factors that differentiate a NoSQL from a SQL database.

MongoDB is a widely used popular NoSQL database all over the world. Some of the other NoSQL databases are Redis, Cassandra, Apache CouchDB, and the list go on.

What this article is about

This article is not an introduction to MongoDB. If you are new to MongoDB, I highly recommend you to go this article first to understand MongoDB and its basic functions.

This article is about the ‘aggregate’ functions in MongoDB which are highly used for querying and analysis of the data. In this article, we will be discussing the commonly used aggregate functions like the $match, $group, $count, $sum, etc.

MongoDB can be downloaded from here. After installing add the binary variables to the user’s environment. The working environment is Windows 64-bit and a GUI called Studio 3T for MongoDB operations. Make sure you run a mongo session in order to run Studio 3T. Without further due let’s get started off.

Aggregation Pipeline

Aggregation is the procedure of processing data in one or more stages where the result of each stage is inputted to the next stage in order to return one combined result.

Aggregation groups documents in a collection and is used to provide results like the total number of documents, sum, average, maximum and minimum values. To the people who are acquainted with SQL the aggregation stages of MongoDB are analogous to COUNT(), SUM(), GROUP BY, LIMIT etc of SQL’s.

The basic syntax for aggregation pipeline is as follows.

db.<collection_name>.aggregate([])

How aggregation pipeline works in MongoDB is all the documents in a collection are inputted to the first aggregation stage then result of this function is inputted to the next aggregation stage, from here its output is given as input to the next aggregation stage and so on like this.

So aggregation in MongoDB works by chaining one or more functions.

Matching the documents

We can filter documents out based on criteria we specify by using the $match stage in the aggregation pipeline so that only the documents we care about will get processed. It’s analogous to SQL’s WHERE clause. Let’s see it in action.

I already have a collection named restaurants in database1 with the fields like _id, name, address, rating, type_of_food etc.

Now if we want to process the documents that have a rating, say 5, we can filter it out as follows.

db.restaurants.aggregate([{$match: {"rating": 5}}])

Output image from Studio 3T

As you can see in the image we have successfully retrieved only documents those have a rating of 5.

Grouping the document

When we have to do analysis on a group of data we can use the $group aggregation stage. So by this, we can perform analysis on data that are grouped together by some features. This is similar to SQL’s GROUP BY clause.

Let’s we want to count the number of restaurants based on type_of_food they are providing.

db.restaurants.aggregate([{$group: {_id: "$type_of_food", count: {$sum: 1}}}])

What this does is it order the documents by type_of_food fields and counts the number of restaurants on each category. As you can see there are two restaurants that fall under the category ‘Pick n Mix’. Like that six restaurants fall under the category of ‘Azerbaijan’. The ‘_id’ and ‘count’ are the fields we specify where ‘count’ is not in the collection. The ‘count’ uses the $sum operator to sum ‘1’ to itself when it comes across each document under the type_of_food category.

Let’s see chaining of ‘$match’ and ‘$group’.

db.restaurants.aggregate([{$match: {rating: 5.5}}, {$group: {_id: "$type_of_food", count: {$sum: 1}}}]) this yields the output where the rating is 5.5 and then counts the number of restaurants in each ‘type_of_food’

Projecting the fields in a document

When we don’t want to display all the columns after processing we can omit the fields by using ‘$project’ stage. The syntax is as follows.

db.<collection_name>.aggregate([{$project: {<field_1>: <1>, <field_2>: <0>, ...<newField_1>: <expression>....}}])

If we want to display only a fewer number of fields as compared to the total number of fields then we write down only those fields with a value of 1. Otherwise, if we want to omit only a fewer number of columns then we will mention only those fields and give a value 0. Please note ‘_id’ field is default so if don’t want that field then we should give it a value of 0.

Let’s see an example using the ‘$match’ and ‘$project’ stages together.

db.<collection_name>.aggregate([{$match: {rating: 6}}, {$project: {_id: 0, name: 1, type_of_food: 1, rating: 1}}])

This projected only the ‘name’, ‘rating’ and ‘type_of_food’ fields where the rating is 6.

Limiting the number of documents

When we don’t want to print all the documents but a very few numbers then we can use the ‘$limit’ stage to control the number of documents being displayed. It is analogous to the LIMIT clause in SQL. The syntax is as follows.

db.<collecton_name>.aggregate([{$limit: <number>}])

Here let’s see an example.

db.restaurants.aggregate([{$match: {rating: 6}}, {$project: {_id: 0, name: 1, type_of_food: 1, rating: 1}}, {$limit: 3}])

Here we have limited the number of documents being displayed to just 3.

Sorting the documents

Let’s see how we can sort the documents using the ‘$sort’ stage in the aggregate function.

The syntax is as follows

db.<collection_name>.aggregate([{$sort: {<field_1>: <-1 | 1>, <field_1>: <-1 | 1>.....}}]) ‘1’ represents for ascending order and ‘-1’ represents for descending order sorting.

We will see an example of this.

db.restaurants.aggregate([{$project: {_id: 0, name: 1, type_of_food: 1, rating: 1}}, {$sort: {type_of_food: -1, rating: 1}}])

In this example, the type_of_food is sorted descending and rating is sorted ascending.

Counting the documents

‘$count’ stage help to count the number of documents. The syntax is as follows.

db.<collecton_name>.aggregate([{$count: "<title>"}])

Let’s see an example with the ‘$match’ stage.

db.restaurants.aggregate([{$match: {type_of_food: "Caribbean"}}, {$count: "no_of_restaurants_with_caribbean_food"}])

We can see the number of restaurants with food type ‘Caribbean’ is 46.

Outputting the result

We can save the final result of an aggregation pipeline to another collection using another aggregation stage called ‘$out’. This stage has to be the last stage out of all other stages in the aggregation pipeline. The syntax is as follows.

db.<collection_name>.aggregate([{$out: "<outPutCollectionName>"}])

We will see it in practice with an example.

db.restaurants.aggregate([{$project: {_id: 0, name: 1, type_of_food: 1, rating: 1}}, {$sort: {type_of_food: -1, rating: 1}}, {$out: "aggregateResultCollection}])

This will save our output documents to a new collection named ‘aggregateResultCollection’.

Aggregation Pipeline Operators

There are other operators in the aggregation other than the aggregation stages. Here we will discuss some of them such as ‘arithmetic’, ‘comparison’ and ‘boolean’ operators. Most times these operators come inside the ‘$group’ stage.

Arithmetic Operators

These perform the mathematic operation on the fields. Some of them are ‘$sum’, ‘$avg’, ‘$max’, ‘$min’ etc.

‘$sum performs the sum operation on the numbers of a field. It’s analogous to the SUM() operator in SQL.

‘$avg’ finds the average of all the numbers in a field. It’s as same as the AVG() operator of SQL.

‘$max’ finds the maximum value of the field. MAX() in SQL is similar to it.

‘$min’ finds the minimum value of the field. MIN() in SQL is similar to it.

The basic syntax for these operators is as follows.

db.<collection_name>.aggregate([{$<operator>: "<expression>"}])

Let’s combine all together and see.

db.restaurants.aggregate([{$match: {rating: {$ne: "Not yet rated"}}}, {$group: {_id: "$type_of_food", total_rating: {$sum: "$rating"}, avg_rating: {$avg: "$rating"}, max_rating: {$max: "$rating"}, min_rating: {$min: "$rating"}}}])

We have filtered out initially some not to have a rating of “Not yet rated” using the ‘$group’ stage and a comparison operator ‘$ne’.

We have grouped the documents by the field ‘type_of_field’. So we can see the sum of ratings is 12194, the average rating is 4.9, the maximum rating is 6 and the minimum rating is 1 for each type of food.

Boolean Operators

Like SQL has AND, OR, NOT operators for logical analysis, MongoDB has also boolean operators such as ‘$and’, ‘$or’, $not, ‘$nor’ etc. The name of each operator itself explains the operations done by them. The syntax is as follows.

{$boolean_operator: [<expression1>, <expression2>,….]}

Let’s see an example.

db.restaurants.aggregate([{$match: {$and: [{type_of_food: "Caribbean"}, {rating: 4}]}}])

In here the query is only for the ‘Caribbean’ type of food and having a rating of 4. It will return the results of all the documents which have a value of ‘Caribbean’ in the ‘type_of_food’ field and a rating of 4.

Comparison Operators

These are operators that help us compare the numeric values in the fields. Some of these are ‘$gt’ — greater than, ‘$lt’ — less than, ‘$nte’ — not equal to, ‘$eq’ — equal to, ‘$lte’ — less than or equal to and so.

The syntax for comparison operators is below.

{<field>: {$<comparison_operator>: <value>}}

We will see it working by modifying the last example done in the boolean operator.

db.restaurants.aggregate([{$match: {$and: [{type_of_food: "Caribbean"}, {rating: {$gte: 4, $lt: 6}}]}}])

Here we just modified the last query by getting it to return ‘type_of_food’ is ‘Caribbean’ and having a rating greater than or equal to 4 and less than 6.

This is all it for this article. Hope it helps everyone. Please drop your comments and suggestion in here.

--

--