How to increase Mongo aggregation $lookup performance

Alfred Yang
finnovate.io
Published in
3 min readSep 18, 2022
Photo by Marten Newhall on Unsplash

Mongo aggregation is incredibly useful for running complex data queries. However carelessly written aggregations can lead to poor application performance, especially when $lookups are liberally used to join multiple collections. We find that poorly optimized $lookups are usually the cause of slow aggregations.

In this article, I will discuss ways to speed to Mongo aggregation when $lookups are involved.

Do we even need $lookup in the first place?

When defining the “schema” of a Mongo collection, I think it’s important to be mindful of the type of queries we will be running. Ideally, we want to structure our collections in such a way so that the most common types of queries can be executed without $lookups, or even aggregations in general.

For example, if we often look up cars by license plate, it may make sense to have a license plate object defined directly in the cars schema. To look up all cars with Californian plates, we can simply do:

db.cars.find({ licensePlate.state: 'California' });

Now, imagine if LicensePlate is another collection with ref to an individual car, then to perform the same search we have to do:

db.cars.aggregate([
{ $lookup: {
from: 'licensePlates',
localField: '_id',
foreignField: 'car',
as: 'licensePlate'
}},
{ $match: {
'licensePlate.state': 'California'
}}
]);

This is a much more expensive operation as for every car in the collection, we have to lookup and join its corresponding license plate.

Use $look up as late in the pipeline as possible

Imagine you have a collection with millions of cars and you need to fetch cars with their license plates attached. Also, your query needs to be paginated as it is impractical to store a massive amount of data in memory.

db.cars.aggregate([
{ $lookup: {
from: 'licensePlates',
localField: '_id',
foreignField: 'car',
as: 'licensePlate'
}},
{ $skip: 0 },
{ $limit: 100 } // returns first 100 cars in collection
]);

This is a huge waste of computation power, we you are first looking up millions of license plates and ditching all cars but the first 100. Instead, do:

db.cars.aggregate([
{ $skip: 0 },
{ $limit: 100 }, // returns first 100 cars in collection
{ $lookup: {
from: 'licensePlates',
localField: '_id',
foreignField: 'car',
as: 'licensePlate'
}}
]);

That way you are only looking up plates for 100 cars, instead of millions. The resulting array of cars is exactly the same.

When writing an aggregation, it’s important to be mindful of the properties in foreign collections we need filter and sort by. For foreign objects with no filterable and sortable properties, they can likely be $lookup’ed at the end of the aggregation pipeline when the input data set is much smaller.

Always $lookup using indexed fields

It’s not always possible to defer $lookups in an aggregation as you may be sorting and/or filtering based on a foreign field. In the above example, if we need to filter cars with plates from a specific state, then we cannot perform the $lookup after pagination as we would only be filtering the first 100 cars.

We have to do:

db.cars.aggregate([
{ $lookup: {
from: 'licensePlates',
localField: '_id',
foreignField: 'car',
as: 'licensePlate'
}},
{ $match: {
'licensePlate.state': 'California'
}},
{ $skip: 0 },
{ $limit: 100 } // returns first 100 cars with Cali plates
]);

In such case, we MUST index the licensePlate.car property:

db.licensePlates.createIndex( { car: 1 } );

This way, $lookup can very quickly find the license plate for each car because the licensePlate.car property is now sorted in Mongo. Without an indexed “car” property, $lookup would need to do a full collection scan to find the license plate for each car, which needless to say, is dramatically slower.

Note that the ‘_id’ property is automatically indexed in Mongo, so there is no need to index an ‘_id’ ever. If the foreign field in a $lookup is an ‘_id’ then you are good to go.

--

--

Alfred Yang
finnovate.io

Alfred is the founder of https://finnovate.io, a company that focuses on helping organizations build unique digital experiences on web, mobile and blockchain.