How to Fetch Data from Different Collections via MongoDB Aggregation Operations— (Explained With Examples)

Partha Roy
Fasal Engineering
Published in
6 min readMar 8, 2021

A few days back, I was talking to some of my dev friends and colleagues and it appeared to me that most of us misunderstood the aggregation stages and operators in MongoDB. So I thought of sharing my learning and knowledge on this subject as I strongly feel that in order to scale systems and increase the performances of applications, database query optimization is an integral part.

In this article, we will learn about some of the most important and used aggregation techniques in MongoDB. You can bookmark this article and it can be very handy whenever you need some assistance.

✅ Also checkout, Top 20+ Must-Have Modern Desk Accessories to get in India

Okay, enough, let’s start.

What is Aggregation in MongoDB?

Aggregation — as the literal meaning suggests it involves combining various things, similarly in MongoDb aggregation is a technique to query data from multiple collections by grouping or joining them followed by performing a variety of operations ( explained later in this doc) and then returning computed results.

Let’s take an example of a find query,

Collection.find(query, projection);
// query - query for filtering out the data
// projection - Specifies the fields to return in the documents that match the query filter

Now check the format of an aggregate query,

Collection.aggregate(pipeline, options)
// pipeline -
A sequence of data aggregation operations or stages.
// options -
An optional configuration applicable for multiple operations in pipeline and it's used to optimize or setting rules for executing the query, it doesn't do any data processing

So as you can see the aggregate query is much advanced than the find query and can perform data manipulation operations on the filtered data as well, unlike the find query where one is limited to only data retrieval.

Aggregation Pipeline Stages

In every aggregation query, the pipeline stated appears in the form of an array, in which all the operations are placed sequentially. These are called Aggregation Pipeline Stages, in this article, we will be referring to those as stages.

Various Aggregation Query Options

We will be discussing all the various operators available in the aggregation query. Below is the diagram of the simple database on which we will perform all the queries,

Collections and Data Structure that we will use for all the examples

$match

This operation is for reading data from a collection based on a matching expression similar to the find() query. The only difference is here the filtrations happen via $expr query expression ( {$expr: {$eq: [‘_id’, ‘123' ] } } ) rather than raw expressions ( { _id: ‘123’ } ) like in find method.

So in order to get the record of the user with id 123 from the User collection, below would be the aggregation query,

User.aggregate([{ $match: {$expr: {$eq: [‘ID’, ‘123' ] } }]);

$eq is the matching operator here, which matches the ID. Similarly, we can use $ne to select documents of which the value of keys isn’t the same as the specified value.

$lookup

Lookup is a stage that performs a left outer join with another collection and filters data from joined documents.

So in our case if we want to get all the orders with user names, then we can use the below query,

Orders.aggregate([{ 
$lookup: {
from: 'User',
localField: 'User_ID',
foreignField: 'ID',
as: 'Customer'
}}]);

Now in this query the first parameter ‘from’ specifies the collection which is to be joined with the current one, then the ‘localField’ specifies the key existing in the current collection which is to be matched with the foreign key in the other collection via ‘foreignField’. Here, User_ID from Orders and ID from Users are matched and joined to get the data. In the end ‘as’ is used as an alias to the data.

One point to be noted here is, in each record of output from lookup, data from the joined table comes inside an array as the first element, each one row from the output data of the above query will look like,

{ ID: String, Quantity: Number, User_ID: String, Customer: Array, Item_ID: String, Ordered_On: Date }

Here in the Customer array, the first element will be the data we have joined.

pipeline

In order to join collections with specific conditions which require expressions, we use pipeline along with $lookup.

So if we want to get all users with their orders which has quantity more than 5,

User.aggregate([{ 
$lookup: {
from: 'Orders',
let: {ID: '$ID'},
pipeline: [{$match: {
$expr: {
$eq: [
'$User_ID', '$$ID'
],
$gt: [
'$quantity', 5
]
}}}]
as: 'OrdersData'
}}]);

So by using pipeline inside a $lookup we can easily join tables by any condition.

$unwind

My personal favorite, $unwind operator flatten out any array and attaches that to the output document. So the below query ( same as the one we used in $lookup ),

Orders.aggregate([{ 
$lookup: {
from: 'User',
localField: 'User_ID',
foreignField: 'ID',
as: 'Customer'
}},
{
$unwind: '$Customer'
}
]);

will result in

{ ID: String, Quantity: Number, User_ID: String, Customer: Object, Item_ID: String, Ordered_On: Date }

The fundamental difference here is $unwind converts the array containing one element to a flatten-out object that is the element itself, with the same name.

// without $unwind
output_data.Customer[0].Name
// With $unwind
output_data.Customer.Name

$addFields

Now many will think what if I want just the Customer. Name but at the root level of the document so that we can access it like below,

output_data.Customer_Name

Unfortunately, $unwind can’t do this :/

There comes the superhero called, $addFields, by using this particular stage we can assign any data from inside of an object/array to the root level of the document.

So, in order to get the Customer Name like like mentioned above, we can write something like —

Orders.aggregate([{ 
$lookup: {
from: 'User',
localField: 'User_ID',
foreignField: 'ID',
as: 'Customer'
}},
{
$unwind: '$Customer'
},
{
$addFields: {
"Customer_Name": "$Customer.Name",
}
}
]);

$project

Project is basically a filter stage in aggregation queries, it tells which are the fields to be kept in the final document.

To give an example, in the previous example we have seen how can we get a flatten Customer_Name outside of the Customer object in the resulting datasets, however, the Customer property will still be there.

As we have already got the Customer_Name, the Customer object isn’t required anymore, so we have to remove it, and we will be keeping only Customer Name, Ordered Quantity, and Item_ID in the output. We can do that using $project,

Orders.aggregate([{ 
$lookup: {
from: 'User',
localField: 'User_ID',
foreignField: 'ID',
as: 'Customer'
}},
{
$unwind: '$Customer'
},
{
$addFields: {
"Customer_Name": "$Customer.Name",
}
},
{
$project: {
Quantity: 1,
Customer_Name: 1,
Item_ID: 1
}
}
]);

Now the above query will give us output like,

{ID: String, Quantity: Number, Customer_Name: String, Item_ID: String}

Joining More than 2 Collections using MongoDB Aggregation

Now there’s Item_ID in there in the output data, which isn’t quite readable and if it were the name of the Item, then perceiving that data would have been much easier. Let’s do that,

Orders.aggregate([{ 
$lookup: {
from: 'User',
localField: 'User_ID',
foreignField: 'ID',
as: 'Customer'
}},
{
$unwind: '$Customer'
},
$lookup: {
from: 'Items',
localField: 'Item_ID',
foreignField: 'ID',
as: 'Item'
}},
{
$addFields: {
"Customer_Name": "$Customer.Name",
"Item_Name": "$Item.Name"
}
},
{
$project: {
Quantity: 1,
Customer_Name: 1,
Item_Name: 1
}
}
]);

Below is a MongoDB Aggregation cheat sheet that can come in very handy.

Alright, we have learned some of the most important operations and stages for MongoDB Aggregations, feel free to ask any question in the comments section, I will be very happy to answer those.

Also read —

--

--

Partha Roy
Fasal Engineering

Full-Stack Engineer 👨🏻‍💻 | ReactJs Dev ⚙️ | Tech Mentor 👨🏻‍🏫