MongoDB Performance Optimization: Aggregation Pipelines
Lessons learned by querying millions of records with millisecond latencies and fine-tuning aggregation pipelines, indexes, and resource usage
MongoDB’s aggregation framework allows you to build powerful, server-side data processing pipelines — ideal for filtering, joining, grouping, sorting, and reshaping documents at scale. While it’s designed to minimize data transfer and reduce application-side processing, performance can degrade rapidly as pipelines grow in complexity or operate on large, unindexed collections.
Common bottlenecks include excessive memory usage, collection scans due to missing indexes, overuse of $unwind or $lookup, and inefficient stage ordering. These can lead to increased latency, high resource consumption, and even failed queries in production.
In this post, we’ll explore practical strategies to optimize aggregation pipelines in MongoDB with clear examples and performance rationale.
Best Practices for Writing Aggregation Pipelines
1 — Filter early with `$match`
Reduce the number of documents as early as possible in the pipeline to minimize processing in later stages.
// 🚫 Inefficient - Placing $project or $group Before $match
// Example 1: Filtering after projecting
db.orders.aggregate([
{ $project: { customer: 1, total: 1, status: 1 } },
{ $match: { status: "shipped" } }
]);
// Example 2: Filtering after grouping
db.sales.aggregate([
{ $group: { _id: "$region", total: { $sum: "$amount" } } },
{ $match: { _id: "North America" } }
]);
/*
In both cases, unnecessary documents are processed before filtering -
wasting compute and memory resources.
*/
// ==========================================================
// ✅ Optimized - Place $match as early as possible
// Example 1: Match first, then project
db.orders.aggregate([
{ $match: { status: "shipped" } },
{ $project: { customer: 1, total: 1, _id: 0 } }
]);
// Example 2: Match first, then group
db.sales.aggregate([
{ $match: { region: "North America" } },
{ $group: { _id: "$region", total: { $sum: "$amount" } } }
]);
💡 Matching early reduces the number of documents passed to downstream stages, improving performance and memory efficiency.
2 — Use indexes effectively
Aggregation pipelines can leverage indexes — especially when `$match` appears early in the pipeline.
// 🚫 Inefficient - No supporting index
// Using an unindexed field in an early $match causes a full collection scan.
db.orders.aggregate([
{ $match: { status: "shipped" } },
{ $sort: { created_at: -1 } } // 'status' and 'created_at' are not indexed
]);
/*
This pipeline triggers a COLLSCAN (collection scan), which increases
latency, consumes more memory, and doesn’t scale well with growing datasets.
*/
// ==========================================================
// ✅ Optimized - Create a compound index
db.orders.createIndex({ status: 1, created_at: -1 });
// Now the pipeline can efficiently filter and sort using the index:
db.orders.aggregate([
{ $match: { status: "shipped" } },
{ $sort: { created_at: -1 } }
]);
/*
This index allows MongoDB to efficiently filter and sort
in one pass using an IXSCAN.
*/
💡 Combine `$match` and `$sort` on indexed fields for optimal index use.
💡 When filtering (`$match`) or sorting (`$sort`), ensure indexed fields (e.g., `status`, `created_at`) appear early in the pipeline.
💡 Use `.explain("executionStats")` to verify that the pipelines use the expected indexes:
db.orders.aggregate([...]).explain("executionStats");
⚠️ MongoDB only uses indexes at the start of the aggregation pipeline. Once a stage requires scanning all documents (e.g., a `$project` that reshapes keys), index usage stops.
3 — Profile aggregation pipelines with `.explain("executionStats")`
Before shipping any aggregation pipeline to production, always verify:
- Whether indexes are being used
- Document scan count vs return count
- Which stages are contributing to latency or resource usage
db.collection.aggregate([...]).explain("executionStats");
💡 Prefer `executionStats` over `queryPlanner` for actual runtime metrics.
Example Pipeline Inspection:
Let’s say you have this pipeline on a `transactions` collection:
db.transactions.aggregate([
{ $match: { status: "completed", created_at: { $gte: ISODate("2024-01-01") } } },
{ $sort: { created_at: -1 } },
{ $limit: 100 },
{ $lookup: {
from: "users",
localField: "user_id",
foreignField: "_id",
as: "user"
}},
{ $unwind: "$user" },
{ $project: {
_id: 0,
user_name: "$user.name",
amount: 1,
created_at: 1
}}
]).explain("executionStats");
What to Look for in `.explain("executionStats")`:
(i) Analyze top-level execution stats:
- `executionTimeMillis`: Wall-clock time to run the pipeline
- `totalDocsExamined`: How many documents were scanned — ideally close to the result count (`nReturned`)
- `totalKeysExamined`: Index keys scanned — ideally non-zero if you're using indexes
{
"stages": [...],
"executionStats": {
"executionTimeMillis": 287,
"totalDocsExamined": 105432,
"totalKeysExamined": 0,
...
}
}
💡 If `totalDocsExamined >> nReturned`, the pipeline is doing unnecessary work.
(ii) The `$cursor` stage – check for index use
{
"$cursor": {
"queryPlanner": {
"winningPlan": {
"stage": "COLLSCAN", // 🚨 Red flag
"filter": { status: "completed", created_at: { $gte: ... } }
}
},
"executionStats": {
"nReturned": 100,
"executionTimeMillisEstimate": 200,
"totalDocsExamined": 105432
}
}
}
💡 If the plan uses `"COLLSCAN"` (collection scan), you're not using an index — which is inefficient.
💡 If it uses `"IXSCAN"`, or `"FETCH"` with `"inputStage": "IXSCAN"`, you are using an index.
If your `totalDocsExamined` is too high here, the fix could be a compound index that supports both `$match` and `$sort`:
db.transactions.createIndex({ status: 1, created_at: -1 });
(iii) The `$lookup` stage – check join strategy
In `.explain()`, `$lookup` stages show nested pipelines or access plans:
{
"$lookup": {
"from": "users",
"as": "user",
"localField": "user_id",
"foreignField": "_id",
"strategy": "IndexedLoopJoin", // ✅ Optimal
...
}
}
💡 "strategy": "IndexedLoopJoin"
means that the foreign collection has an index on _id
.
💡 If the strategy is "NestedLoopJoin"
and no index is mentioned, it's a full scan per document — a major red flag in production. In such cases, as a fix, ensure the foreign collection has an index:
db.users.createIndex({ _id: 1 });
(iv) Look for disk use warnings:
If your pipeline is heavy and you didn’t enable `allowDiskUse`, `.explain()` might show memory or buffer limits being hit. In such cases, as a fix, add:
db.collection.aggregate([...], { allowDiskUse: true });
4 — Regularly review with `explain()` and `$indexStats`
As your data volume, access patterns, and query filters evolve over time, it’s important to periodically review how well your indexes and aggregation pipelines are performing.
Use the following tools to audit performance and clean up inefficiencies:
// Check the current query plan and performance stats
db.collection.aggregate([...]).explain("executionStats");
// Review index usage statistics (e.g., how often each index is used) and identify unused indexes
db.collection.aggregate([{ $indexStats: {} }]);
💡 These tools help identify slow stages, unused indexes, and query shape inefficiencies.
5 — Bound your aggregations with `$limit`
For dashboards, APIs, or any user-facing endpoints, always set an explicit limit to avoid processing excessive data.
// 🚫 Inefficient – Unbounded result set
db.orders.aggregate([
{ $match: { status: "shipped" } },
{ $sort: { created_at: -1 } }
]);
/*
Without a $limit, this pipeline may scan and sort
the entire matching dataset — which becomes
increasingly expensive as data grows.
*/
// ✅ Optimized - Add an explicit $limit
db.orders.aggregate([
{ $match: { status: "shipped" } },
{ $sort: { created_at: -1 } },
{ $limit: 100 }
]);
/*
Bounding the result size keeps memory usage predictable,
improves response times, and prevents expensive sorts.
*/
💡 Unbounded aggregations (especially `$sort` without `$limit`) can lead to high memory usage, slow responses, and increased pressure on disk I/O — especially in high-traffic environments.
6 — Prefer smaller document shapes
MongoDB loads full documents into memory for each stage. Reducing the size of each document can significantly improve performance and reduce resource consumption.
Ways to minimize memory usage:
- Use `$project` early to trim unnecessary fields
- Avoid bloated `$group` outputs (e.g., don’t accumulate full documents unless required)
- Do not embed large subdocuments unless they’re actually needed downstream
💡 Smaller document shapes mean faster processing, lower memory usage, and less I/O — especially critical for large-scale pipelines or resource-constrained clusters.
7 — Set `allowDiskUse: true` for memory-heavy aggregations
Aggregation stages like `$sort`, `$group`, and `$lookup` can consume large amounts of memory, especially when working with high-cardinality datasets or unbounded inputs.
To prevent failures during execution, enable disk spill:
db.collection.aggregate([...], { allowDiskUse: true });
⚠️ Without `allowDiskUse`, large pipelines may fail with an `ExceededMemoryLimit` error, especially when a single stage’s intermediate results exceed 100 MB (the default limit in many MongoDB versions).
💡 Enabling disk use allows MongoDB to offload intermediate data to disk, trading memory pressure for slower but more reliable execution.
8 — Minimize `$group` on large data sets
The `$group` stage holds intermediate results in memory. Grouping entire documents (e.g., with `$$ROOT`) can quickly exhaust memory and trigger disk spill or aggregation failure.
// 🚫 Inefficient – Grouping entire documents
db.sales.aggregate([
{
$group: {
_id: "$category",
data: { $push: "$$ROOT" } // Pushing full documents
}
}
]);
/*
This increases memory usage significantly,
especially if each document is large or the group set is wide.
*/
// ✅ Optimized - Only group necessary fields
db.sales.aggregate([
{
$group: {
_id: "$category",
total: { $sum: "$amount" } // Only what's needed
}
}
]);
/*
Push or compute only essential fields to keep memory footprint low.
*/
💡 Always minimize the data passed through `$group`. Avoid pushing entire documents unless absolutely necessary — this reduces memory usage, improves performance, and avoids disk spill.
9 — Avoid `$project: "$$ROOT"` unless truly needed
Using `$project: "$$ROOT"` adds unnecessary nesting, increases document size, and can prevent index optimization in later stages.
// 🚫 Inefficient – Wrapping the full document again
db.orders.aggregate([
{ $match: { status: "shipped" } },
{ $project: { data: "$$ROOT" } } // Redundant wrapping
]);
/*
This inflates each document by nesting the full original document
under the "data" field — increasing payload size and memory usage.
It also breaks potential optimizations that rely on predictable document shapes.
*/
// ✅ Optimized - Project only the required fields
db.orders.aggregate([
{ $match: { status: "shipped" } },
{ $project: { customer_id: 1, total: 1 } }
]);
/*
Keeps output lean, reduces memory footprint,
and helps downstream stages (and clients) consume less data.
*/
💡 Use `$project: "$$ROOT"` only if you need to rename or nest the entire document for a very specific use case — otherwise, prefer explicit field selection.
10 — Avoid `$unwind` when possible
`$unwind` expands arrays into multiple documents, which can dramatically increase the size of intermediate results — especially on large collections or wide arrays.
// 🚫 Inefficient – Unwinding before matching
db.products.aggregate([
{ $unwind: "$tags" },
{ $match: { tags: "electronics" } }
]);
/*
This unnecessarily multiplies the number of documents
before filtering — wasting compute and memory.
*/
// ✅ Optimized – Match array values directly
db.products.aggregate([
{ $match: { tags: "electronics" } }
]);
/*
MongoDB supports direct matching on array fields,
avoiding the need to flatten documents with $unwind.
*/
💡 Use `$unwind` only when you need to group, sort, or project individual elements within an array. For simple filtering, a direct `$match` is more efficient.
11 — Minimize `$unwind` on large arrays
Unwinding large arrays multiplies the document count by the array length, leading to high memory usage, slower queries, and increased I/O.
// 🚫 Inefficient – Unwinding large arrays early
// If each order has ~50 products, 10,000 orders become 500,000 documents after $unwind
db.orders.aggregate([
{ $unwind: "$products" },
{ $group: { _id: "$products.category", count: { $sum: 1 } } }
]);
/*
This forces MongoDB to create and process hundreds of
thousands of intermediate documents — even if most fields aren't needed.
*/
// ✅ Optimized – Use $project with $map before $unwind
// Reduces document size before the unwind, and in many cases,
// makes $unwind unnecessary altogether.
db.orders.aggregate([
{
$project: {
categories: {
$map: {
input: "$products",
as: "prod",
in: "$$prod.category"
}
}
}
},
{ $unwind: "$categories" },
{ $group: { _id: "$categories", count: { $sum: 1 } } }
]);
/*
Only the necessary data is passed forward. If no aggregation is needed,
consider skipping $unwind entirely and use $filter or $reduce.
*/
💡 Always consider array-aware expressions like `$map`, `$filter`, or `$reduce` before reaching for `$unwind`. Flattening should be a last resort when per-element processing is unavoidable.
12 — Avoid unnecessary `$lookup`
Unnecessary joins can slow down your pipeline, especially if the foreign collection is large or lacks indexes.
// 🚫 Inefficient – Joining the entire customers collection
// Fetches full documents even if only the name is needed.
// Can also degrade performance if the foreign join field lacks an index.
db.orders.aggregate([
{
$lookup: {
from: "customers",
localField: "customer_id",
foreignField: "_id",
as: "customer"
}
},
{ $unwind: "$customer" }
]);
// ✅ Optimized – Use pipeline form with targeted projection
// Reduces memory usage, supports index use, and avoids fetching excess fields.
db.orders.aggregate([
{
$lookup: {
from: "customers",
let: { customerId: "$customer_id" },
pipeline: [
{
$match: {
$expr: { $eq: ["$_id", "$$customerId"] }
}
},
{
$project: { _id: 0, name: 1, email: 1 }
}
],
as: "customer"
}
},
{ $unwind: "$customer" }
]);
💡 Always project only the fields you need in `$lookup`. Ensure join fields are indexed for performance.
13 — Avoid complex nested pipelines in `$lookup`
Shallow pipelines scale better and can be cached, logged, and paginated independently.
// 🚫 Inefficient – Nested $lookup inside another $lookup
// Difficult to index, test, and debug. Slows down on large datasets.
db.orders.aggregate([
{
$lookup: {
from: "invoices",
let: { orderId: "$_id" },
pipeline: [
{
$match: {
$expr: { $eq: ["$order_id", "$$orderId"] }
}
},
{
$lookup: {
from: "payments",
let: { invoiceId: "$_id" },
pipeline: [
{
$match: {
$expr: { $eq: ["$invoice_id", "$$invoiceId"] }
}
},
{ $project: { amount: 1, method: 1 } }
],
as: "paymentInfo"
}
}
],
as: "invoiceDetails"
}
}
]);
// ✅ Optimized - Flatten or defer joins
// Option 1: Split into multiple queries and join in application logic
// Option 2: Denormalize stable data at write time
// First: fetch orders + invoice IDs
db.orders.aggregate([
{
$lookup: {
from: "invoices",
localField: "_id",
foreignField: "order_id",
as: "invoices"
}
}
]);
// Then: join invoices ↔ payments in a separate query if needed
// Or embed payment summary directly into invoices at write-time
// if the structure is stable.
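As a sketch of that second step (assuming the invoice `_id`s from the first query are collected into an application-side `invoiceIds` array, a hypothetical variable), the follow-up join stays flat and easy to index:
// Sketch only: `invoiceIds` comes from the application code that ran the first query
db.invoices.aggregate([
  { $match: { _id: { $in: invoiceIds } } },
  {
    $lookup: {
      from: "payments",
      localField: "_id",
      foreignField: "invoice_id",
      as: "paymentInfo"
    }
  },
  { $project: { order_id: 1, "paymentInfo.amount": 1, "paymentInfo.method": 1 } }
]);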
14 — Use `$facet` with care
`$facet` runs multiple sub-pipelines over the same set of input documents in a single stage — great for dashboards — but it comes at a memory cost.
// Use Case: Dashboard-style aggregation using $facet
db.orders.aggregate([
{
$facet: {
recent: [
{ $sort: { created_at: -1 } },
{ $limit: 5 }
],
byStatus: [
{ $group: { _id: "$status", count: { $sum: 1 } } }
]
}
}
]);
💡 Use `$facet` for scenarios like dashboards or summaries where you need multiple perspectives in one query.
⚠️ Avoid it in nested or deeply recursive aggregations — it can consume significant memory and be hard to profile.
15 — Prefer stateless pipelines
Aggregation stages that depend on document ordering or global state (like `$sort`, `$group`, `$first`) increase memory usage and limit parallelism.
// 🚫 Inefficient – Requires sorting and stateful grouping
db.events.aggregate([
{ $sort: { timestamp: 1 } },
{
$group: {
_id: "$user_id",
firstEvent: { $first: "$$ROOT" }
}
}
]);
// ✅ Optimized – Stateless transformation per document
db.events.aggregate([
{
$project: {
user_id: 1,
action_type: 1,
isSignup: { $eq: ["$action_type", "signup"] }
}
}
]);
💡 Stateless transformations can be parallelized, require less memory, and scale better across sharded clusters.
16 — Avoid deeply-nested or over-engineered pipelines
Keep pipelines shallow — fewer stages often mean faster execution.
Pipelines with more than ~10 stages or too much nesting can get hard to maintain, debug, and optimize (especially painful to trace bottlenecks during tuning).
💡 Break down logic where possible. Use materialized views via `$merge` for repeated work.
17 — Avoid running expensive aggregations frequently in real-time
Running heavy aggregations on every dashboard/API request leads to performance spikes, especially on collections with millions of documents.
For example, real-time dashboard queries aggregate millions of records on every page load. These cause CPU and memory usage spikes on the database server.
💡 For frequent expensive aggregations:
- Cache results at the service or application level
- Use `$merge` to store pre-aggregated results in a report collection
- Pre-aggregate results using background jobs or triggers
18 — Cache expensive aggregation results
If your aggregation takes seconds or more:
- Use `$merge` to store the result in a new collection
- Cache results at the API/service layer (e.g., Redis or in-memory; see the sketch below)
- Schedule batch jobs (e.g., every 10 minutes)
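For the API/service-layer option, here is a minimal in-memory sketch; `cachedAggregation` and `runAggregation` are illustrative names, not part of any driver API:
// Minimal TTL cache sketch for expensive aggregation results (illustrative only)
const cache = new Map(); // key -> { value, expiresAt }

async function cachedAggregation(key, ttlMs, runAggregation) {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value;   // serve the cached result
  const value = await runAggregation();                      // e.g., collection.aggregate([...]).toArray()
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}

// Usage: cachedAggregation("sales-summary", 10 * 60 * 1000, () => salesCollection.aggregate([...]).toArray());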
19 — Use `$merge` for materialized views
When running heavy aggregations repeatedly, consider materializing the results by writing them to a dedicated collection using `$merge`. MongoDB doesn’t have built-in materialized views, but `$merge` provides similar functionality.
// Simple materialized view
db.sales.aggregate([
{ $match: { created_at: { $gte: ISODate("2024-01-01") } } },
{ $group: { _id: "$product_id", total: { $sum: "$amount" } } },
{ $merge: { into: "product_sales_summary", whenMatched: "replace" } }
]);
// Advanced example with incremental updates
db.transactions.aggregate([
{
$group: {
_id: {
date: { $dateToString: { format: "%Y-%m-%d", date: "$created_at" } },
product_id: "$product_id"
},
total_amount: { $sum: "$amount" },
txn_count: { $sum: 1 }
}
},
{
$merge: {
into: "daily_sales_summary",
whenMatched: "merge",
whenNotMatched: "insert"
}
}
]);
💡 Select an appropriate refreshing strategy when creating materialized views.
- Overwrite completely: Use `$merge` with `whenMatched: "replace"` if you want to overwrite summaries daily.
- Incremental updates: Use `whenMatched: "merge"` to update only changed keys.
- Idempotent merges: Ensure `_id` is correctly set in your `$group` to allow safe upserts.
💡 Automate refreshing materialized views with cron jobs, serverless functions, or Change Streams to trigger updates on data change (for near real-time).
💡 Use TTL indexes to expire temporary views if needed:
db.daily_sales_summary.createIndex(
{ created_at: 1 },
{ expireAfterSeconds: 86400 } // 1 day
);
20 — Use change streams or triggers for real-time views
If your aggregation use case needs real-time updates, consider MongoDB Change Streams to incrementally update a summary or analytics collection rather than recalculating with each request.
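A minimal mongosh-style sketch, assuming new documents land in the `transactions` collection and roll up into the `daily_sales_summary` collection from the previous section (production code would use a driver with resume tokens and proper error handling):
// Watch for new transactions and incrementally update the summary
const changeStream = db.transactions.watch([
  { $match: { operationType: "insert" } }
]);

while (changeStream.hasNext()) {
  const change = changeStream.next();
  const txn = change.fullDocument;
  db.daily_sales_summary.updateOne(
    { _id: { date: txn.created_at.toISOString().slice(0, 10), product_id: txn.product_id } },
    { $inc: { total_amount: txn.amount, txn_count: 1 } },
    { upsert: true }
  );
}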
21 — Pre-aggregate periodically using time buckets
For time-series or historical data (e.g., transactions, logs), avoid aggregating raw data on-demand. Instead, pre-compute summaries at desired intervals and store them separately.
db.transactions.aggregate([
{ $match: { created_at: { $gte: startOfDay, $lt: endOfDay } } },
{ $group: { _id: "$user_id", daily_total: { $sum: "$amount" } } },
{ $merge: "daily_summaries" }
]);
22 — Use `$bucket` or `$bucketAuto` for histogram-like grouping
When grouping by numeric ranges (e.g., age bands, transaction sizes), `$bucket` simplifies and optimizes your aggregation compared to manual `$match` + `$group` logic.
db.collection.aggregate([
  {
    $bucket: {
      groupBy: "$transaction_amount",
      boundaries: [0, 100, 500, 1000, 5000],
      default: "Other",
      output: { count: { $sum: 1 } }
    }
  }
]);
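If you don’t know sensible boundaries up front, `$bucketAuto` derives roughly evenly populated ranges for you. A small sketch on the same (assumed) `transaction_amount` field:
db.collection.aggregate([
  {
    $bucketAuto: {
      groupBy: "$transaction_amount",   // assumed numeric field
      buckets: 5,                       // MongoDB picks the boundaries
      output: { count: { $sum: 1 } }
    }
  }
]);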
23 — Favor denormalization in high-read scenarios
MongoDB is optimized for denormalized data. If you frequently join static metadata in aggregations, consider embedding that data directly to reduce `$lookup` overhead and improve cache efficiency.
{
order_id: "...",
customer_id: "...",
customer_snapshot: {
name: "...",
tier: "gold"
}
}
In distributed systems and multi-tenant apps, this reduces `$lookup` reliance and increases cacheability.
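One way to keep such a snapshot current is to copy the stable fields at write time. A sketch, assuming the order-creation code has a `customerId` variable in scope (hypothetical):
// Sketch: embed a customer snapshot when the order is created
const customer = db.customers.findOne({ _id: customerId }, { name: 1, tier: 1 });
db.orders.insertOne({
  customer_id: customerId,
  total: 120,                                                      // example value
  customer_snapshot: { name: customer.name, tier: customer.tier }, // denormalized copy
  created_at: new Date()
});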
24 — Watch for cursor timeouts in long-running pipelines
MongoDB cursors time out after 10 minutes of inactivity by default. For long or paginated aggregations, set a batch size and monitor for `CursorNotFound` errors.
db.collection.aggregate([...], { cursor: { batchSize: 1000 } })
25 — Run aggregations on read replicas where possible
If you have a replica set or sharded cluster, offload heavy read workloads from your primary node to secondaries. This pattern is very common for building analytics dashboards or reporting APIs on top of secondary reads.
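A mongosh sketch: route this connection’s reads to a secondary before running the heavy pipeline (drivers accept the equivalent `readPreference=secondaryPreferred` connection-string option):
// Prefer secondaries for reads on this connection, then run the heavy aggregation
db.getMongo().setReadPref("secondaryPreferred");
db.orders.aggregate([
  { $match: { status: "shipped" } },
  { $group: { _id: "$customer_id", total: { $sum: "$total" } } }
]);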
26 — Shard collections strategically in sharded clusters
If you’re on MongoDB sharded clusters or Atlas with sharding:
- Always place `$match` early, filtering on the shard key where possible (see the example below)
- Filters that omit the shard key lead to scatter-gather queries, severely impacting performance
💡 Designing your shard keys to match common query filters is critical for scalability.
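For example, if `tenant_id` were the shard key (an assumption for illustration), filtering on it first lets the router target a single shard instead of broadcasting the query:
// Targeted query: the shard key in the first $match avoids scatter-gather
db.orders.aggregate([
  { $match: { tenant_id: "acme", status: "shipped" } }, // tenant_id = assumed shard key
  { $sort: { created_at: -1 } },
  { $limit: 100 }
]);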
27 — Avoid pipelines in hot paths unless necessary
Latency-sensitive workflows (e.g., login, checkout) should avoid expensive aggregation pipelines. Move these operations to pre-aggregated views or async jobs triggered via messaging queues.
28 — Be aware of pipeline memory limits
MongoDB limits memory usage for each aggregation stage:
- 100 MB in memory by default
- Use `allowDiskUse` cautiously
- Watch for `$group` + `$sort` stages spilling frequently
29 — Log and monitor aggregation metrics
Track and analyze aggregation performance metrics including:
- Latency per pipeline
- Failure rates
- Execution plan changes (due to plan cache updates or schema/index changes)
Integrate monitoring with:
- APM tools like Datadog, New Relic
- MongoDB Atlas Performance Advisor
- Custom slow query logging in your app (see the sketch below)
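For the custom slow-query logging item, MongoDB’s built-in profiler is a simple starting point. A sketch with illustrative thresholds:
// Log operations slower than 100 ms to system.profile (level 1 = slow ops only)
db.setProfilingLevel(1, { slowms: 100 });

// Later: inspect slow aggregations captured by the profiler
db.system.profile.find({ "command.aggregate": { $exists: true }, millis: { $gt: 100 } })
  .sort({ ts: -1 })
  .limit(10);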
Conclusion
Optimizing MongoDB aggregation pipelines is key to building fast, scalable applications. By applying proper indexing, efficient pipeline design, caching, and monitoring, you can avoid common bottlenecks and ensure consistent performance — even at scale. Regular review and tuning will keep your data processing smooth as your workload grows. Keep refining, keep profiling — and your aggregations will scale with confidence.
Stay tuned for the next programming tip. Until then, happy hacking!