On MongoDB Aggregation Pipelines

Kartal Kaan Bozdogan
3 min read · Jan 27, 2024


This is the first part of a two-part series, in which I start by introducing MongoDB’s aggregation pipelines and then discuss a sample real-world use case, accompanied by a benchmark demonstrating its superiority over a naive approach. You can get to the second part here.

Aggregation pipelines are a powerful tool for interfacing with MongoDB databases in a declarative and performant way. In most cases, they offer a level of expressiveness and flexibility equivalent to SQL queries.

An aggregation pipeline is always run on a collection. It is composed of a list of stages, where each stage takes as input the list of documents output by the previous stage. The initial stage takes as input the entire collection on which the pipeline is being run. The result of the pipeline is a cursor corresponding to the output of the last stage.
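For instance, invoking a pipeline from the shell looks roughly like this (a minimal sketch; myCollection is a placeholder name, and the stages used here are explained below):

// Running a two-stage pipeline on a collection from mongosh
db.myCollection.aggregate([
{ $set: { foo: "bar" } }, // receives every document in myCollection
{ $count: "total" } // receives the documents output by $set
])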

Let’s have a look at several pipeline stages. The ones that merely transform the input documents are the easiest:

$set

Adds new fields to documents. $set outputs documents that contain all existing fields from the input documents and newly added fields.

The $set stage takes an object (essentially a set of key-value pairs) as its parameter and sets each key in the input documents to the given value. Here is an example:

// Given a database state:
[
{ _id: 0 },
{ _id: 1 }
]

// Running this pipeline on it:
[
{ $set: { foo: "bar" } }
]

// Will result in
[
{ _id: 0, foo: "bar" },
{ _id: 1, foo: "bar" }
]

You can also refer to a field of the document being processed:

// Running this pipeline on the same database:
[
{ $set: { foo: "$_id" } }
]

// Will result in
[
{ _id: 0, foo: 0 },
{ _id: 1, foo: 1 }
]

You can also use aggregation expressions:

// Running this pipeline on the same database:
[
{ $set: { foo: { $add: ["$_id", 1] } } }
]

// Will result in
[
{ _id: 0, foo: 1 },
{ _id: 1, foo: 2 }
]

This allows you to express a wide variety of operations in a purely functional and declarative way, making them amenable to optimization by the query engine while, at least in my opinion, remaining easier to digest than SQL.
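Expressions can also be nested. Here is a small sketch that tags each document as even or odd based on its _id, combining $cond, $eq and $mod:

// Running this pipeline on the same database:
[
{ $set: { parity: { $cond: [ { $eq: [ { $mod: [ "$_id", 2 ] }, 0 ] }, "even", "odd" ] } } }
]

// Will result in
[
{ _id: 0, parity: "even" },
{ _id: 1, parity: "odd" }
]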

The $count stage produces a single output document containing the number of input documents:

// Running this pipeline on the same database:
[
{ $count: "myCount" }
]

// Will result in
[ { myCount: 2 } ]

The $documents stage ignores its input and outputs a given set of documents:

// Running this pipeline on the same database:
[
{ $documents: [
{ foo: "bar" },
{ buz: "fiz" },
]}
]

// Will result in
[
{ foo: "bar" },
{ buz: "fiz" },
]

This might be handy if you want to run the rest of your pipeline on a set of test documents without having to create a test collection containing them.
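One caveat: as far as I know, $documents needs to be the first stage of a database-level pipeline, run with db.aggregate() rather than on a collection. A sketch:

// Running a database-level pipeline with $documents:
db.aggregate([
{ $documents: [ { foo: "bar" }, { buz: "fiz" } ] },
{ $count: "myCount" }
])

// Will result in
[ { myCount: 2 } ]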

$unwind operates on an array field and outputs, for each input document, one document per array element, with the array field replaced by that element:

// Given a database state:
[
{ _id: 0, arr: [ 0, 1 ] },
{ _id: 1, arr: [ 2, 3 ] }
]

// Running this pipeline on it:
[
{ $unwind: "arr"}
]

// Will result in
[
{ _id: 0, arr: 0 },
{ _id: 0, arr: 1 },
{ _id: 1, arr: 2 },
{ _id: 1, arr: 3 },
]

Some other pipeline stages I find interesting include $lookup, corresponding to SQL joins, $facet, which lets you run a single set of documents through multiple sub-pipelines in parallel and collect the results together, $graphLookup, which lets you run recursive graph searches effortlessly, and $group, which lets you group together input documents according to a key or an expression and run aggregations on the resulting groups.
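To give a taste of $group, here is a sketch over a hypothetical collection of orders shaped like { customer: "a", total: 5 }, summing up what each customer has spent:

// Running this pipeline on such a collection:
[
{ $group: { _id: "$customer", totalSpent: { $sum: "$total" } } }
]

// Might result in something like
[
{ _id: "a", totalSpent: 12 },
{ _id: "b", totalSpent: 7 }
]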

In the second part of this series I will talk about how you can design new aggregation pipelines to work with your data and how you might use them to run performant and reliable schema migrations.
