MongoDB Aggregation Pipeline

Rushabh Dabhade
6 min readDec 30, 2023

--

The MongoDB Aggregation Pipeline allows you to perform a sequence of data processing operations on the documents in a collection.

The aggregation pipeline consists of a series of stages, where each stage represents a specific operation or transformation. The output of each stage serves as the input for the next stage, allowing you to create complex data processing workflows. The aggregation pipeline stages are processed in sequence, providing a flexible and expressive way to analyze and manipulate data.

Here are some common aggregation pipeline stages:

  1. $match: Filters the documents to pass only the documents that match the specified condition(s) to the next pipeline stage.
[ { $match : { author : "dave" } } ]

2. $project: Passes along the documents with the requested fields to the next stage in the pipeline.

{ $project: { contact: 1, } }

3. $group: The $group stage separates documents into groups according to a "group key". The output is one document for each unique group key.

[ { $group : { _id : "$item" } } ]

4. $sort: Sorts all input documents and returns them to the pipeline in sorted order. For the field or fields to sort by, set the sort order to 1 or -1 to specify an ascending or descending sort respectively.

[
{ $sort : { age : -1, posts: 1 } }
]

5. $lookup: Performs a left outer join with another collection, combining documents from both collections based on a specified condition.

{
$lookup:
{
from: <foreign collection>,
localField: <field from local collection's documents>,
foreignField: <field from foreign collection's documents>,
as: <output array field>
}
}

6. $addFields: Adds new fields to documents. $addFields outputs documents that contain all existing fields from the input documents and newly added fields.

{
$addFields: { "cats": 20 }
}

Here’s a simple example of an aggregation pipeline:

db.sales.aggregate([
{ $match: { status: "completed" } },
{ $group: { _id: "$product", totalSales: { $sum: "$amount" } } },
{ $sort: { totalSales: -1 } },
{ $limit: 5 }
]);

In this example, we’re aggregating sales data to find the total sales amount for each product, sorting the results in descending order, and limiting the output to the top 5 products.

The MongoDB Aggregation Pipeline is a versatile tool for performing complex data transformations and analytics within the database itself, reducing the need for multiple round-trips between the application and the database.

Examples:

We will see how aggregate pipeline works with some real-world examples.

Sample Data

//users

registered:
age:
tags:
_id:
index:
name:
favoriteFruit:
company:
phone:
location:
country:
address:
title:
email:
isActive:
gender:
eyeColor:
  1. How many users are active?
[
{
$match: { isActive: true }
},
{
$count: 'activeUsers'
}
]
  • We start with a group of users.
  • The first stage picks only the active users (isActive: true).
  • The second stage counts how many users are left after the filter.

2. What is the average age of all users?

[
{
$group: {
_id: null,
averageAge: {
$avg: "$age"
}
}
}
]

We can group data based on some fields like age. If we pass age, it will give results like how many different types of ages are there in our database.

Here we pass null, so it will group every document in a single document.

3. List the top 5 most common favorite fruits among users

[
{
$group: {
_id: "$favoriteFruit",
count: {
$sum: 1
}
},
},
{
$sort: {
count: -1
}
},
{
$limit: 5
}
]
  • We start with a group of users, grouped by their favorite fruits.
  • The $sum: 1 counts how many users have each distinct favorite fruit.
  • The $sort stage arranges these groups in descending order based on the count.
  • The $limit stage gives us only the top 5 results.

4. Total number of males and females

[
{
$group: {
_id: "$gender",
genderCount: {
$sum: 1
}
}
}
]

5. Which country has the highest number of registered users?

[
{
$group: {
_id: "$company.location.country",
userCount: {
$sum: 1
}
}
},
{
$sort: {
userCount: -1
}
},
{
$limit: 1
}
]
  • We start with a group of users, grouped by the country of their company’s location.
  • The $sum: 1 counts how many users are registered from each country.
  • The $sort stage arranges these groups in descending order based on the user count.
  • The $limit stage gives us only the top result, which is the country with the highest number of registered users.

6. What is the average number of tags per user?

[
{
$unwind: "$tags"
},
{
$group: {
_id: "$_id",
numberOfTags: {$sum: 1}
}
},
{
$group: {
_id: null,
averageNumberOfTags: {$avg: "$numberOfTags}
}
}
]
  • We start with a group of users, and the $unwind stage breaks down the tags array into separate documents. (Unwind will duplicate each document for each array value in an array.)
  • The first $group stage counts the number of tags for each user.
  • The second $group stage calculates the average number of tags across all users.

7. What are the names and ages of users who are inactive and have ‘velit’ as a tag?

[
{
$match: {
tags: "velit",
isActive: false,
}
},
{
$project: {
name: 1,
age: 1,
}
},
]
  • The $match stage filters out users who are not inactive and don't have 'velit' as a tag.
  • The $project stage shapes the output to include only the 'name' and 'age' fields.

8. Who has registered the most recently?

[
{
$sort: {
registered: -1
}
},
{
$limit: 4
},
{
$project: {
name: 1,
registered: 1,
}
}
]

9. Categorize users by their favorite fruit.

[
{
$group: {
_id: "$favoriteFruit",
users: { $push: "$name" }
}
}
]
  • The $group stage organizes users into groups based on their favorite fruit.
  • For each group, the users array is created, containing the names of users who share the same favorite fruit.

10. Find users who have both enim and id as their tags

[
{
$match: {
"tags" : { $all: ["enim", "id"] }
}
},
]

11. List all companies located in the USA with their corresponding user count.

[
{
$match: {
"company.location.country" : "USA"
}
},
{
$group: {
_id: null,
userCount: { $sum: 1 }
}
}
]

How $lookup works?

//books

_id:
title:
author_id:
genre:
//author

_id:
name:
birth_year:

This MongoDB Aggregation Pipeline involves the $lookup stage to join data from the "books" collection with the "authors" collection based on the author_id field.

[
{
$lookup: {
from: "authors",
localField: "author_id",
foreignField: "_id",
as: "author_details"
}
},
{
$addFields: {
author_details: {
$arrayElementAt: ["$author_details", 0]
}
}
}
]
  1. $lookup Stage:
  • The $lookup stage performs a left outer join with the "authors" collection.
  • The from: "authors" specifies the target collection for the join.
  • The localField: "author_id" indicates the field in the "books" collection that matches the foreign field.
  • The foreignField: "_id" specifies the field in the "authors" collection that corresponds to the local field.
  • The as: "author_details" creates a new field called "author_details" to store the joined author information.

2. $addFields Stage:

  • The $addFields stage is used to add or modify fields in the documents.
  • The author_details: { $arrayElementAt: ["$author_details", 0] } adds a new field called "author_details" and sets its value to the first element of the "author_details" array.

We dissected real-world scenarios, traversing user and book collections, unveiling how aggregation pipelines can breathe life into data relationships. Whether it’s tallying the number of active users, computing averages, or unveiling the most favored fruits, the aggregation pipeline emerges as a versatile artist’s brush, painting a detailed picture of your dataset.

Stay curious, keep coding, and explore the vast possibilities that MongoDB’s Aggregation Pipeline unfolds in your data-driven endeavors.

--

--

Rushabh Dabhade

Full Stack Developer | Building Beautiful Apps | Freelancer | Exploring the world of tech