MongoDB Basic Document Grouping
How to group entire MongoDB documents based on a certain field
This post serves as a note-to-self; it details a technical problem I had and which took me forever to overcome, or so it felt. Frustration level was high and Twitter is my witness.
Once solved, I couldn’t let the solution be forgotten and so I decided to carve it in stone, AKA write it in a blog post.
The Query I Needed
As part of my work in Guardicore Labs, I needed to query a Mongo database. I wanted to do the following:
- fetch documents that match certain criteria;
- group them together based on another field.
I didn’t want to sum numbers, calculate an average or point out some maximum value. I wanted to group, and to group only.
As my case is related to the inner-workings of the Guardicore’s product, I shall use an analogous example.
Say we have a songs collection:
{'_id': 1, 'name': 'Your Song', 'singer': 'Elton John', 'score': 3}
{'_id': 2, 'name': 'Circle of Life', 'singer': 'Elton John',
'score': 3}
{'_id': 3, 'name': 'Sir Duke', 'singer': 'Stevie Wonder', 'score':
2}
{'_id': 4, 'name': 'Daniel', 'singer': 'Elton John', 'score': 2}
{'_id': 5, 'name': 'As', 'singer': 'Stevie Wonder', 'score': 3}
and a singers collection:
{'_id': 1, 'name': 'Elton John'}
{'_id': 2, 'name': 'Stevie Wonder'}
I wanted to fetch songs with score 3 and group them by singers. The final result should be:
[{'_id': 'Elton John',
'songs_scored_3': [{'_id': 1, 'name': 'Your Song', 'singer':
'Elton John', 'score': 3}, {'_id': 2, 'name': 'Circle of Life',
'singer': 'Elton John', 'score': 3}],
{'_id': 'Stevie Wonder',
'songs_scored_3': [{'_id': 5, 'name': 'As', 'singer':
'Stevie Wonder', 'score': 3}]
}]
and the query I had to run:
songs.aggregate([
"$match": {"score": 3},
"$group": {"_id": "$singer",
"songs_scored_3": {"$addToSet": "$$ROOT"}}
])
The Query Explained
Grouping in Mongo is done as part of an “aggregation pipeline” — a series of operations performed on the data, one stage after the other.
The query above does the following:
- It defines an aggregation pipeline by calling
aggregate
with an array of stages (in our case, two of them). - The first stage is
$match
which filters documents according to the provided conditions (in our case,score == 3
). - The second stage is
$group
. By specifying the_id
we tell Mongo which field(s) to group on. Then, we can (optionally) define the rest of the fields for our results. In the query above, we tell Mongo: “set a new field named songs_scored_3. As for the value, create a set of all the relevant documents in the pipeline”. In other words, every song-document being processed should be added to the set of its corresponding singer’s group.
Note:
$$ROOT
is used to reference the object which is currently being processed in the aggregation stage, as mentioned in the documentation.
💡 Use $push
instead of $addToSet
if you’re interested in preserving repeating documents.
That’s basically it.
References
Here is the StackOverflow question which saved me a lot of time by introducing $$ROOT
:
It would have also helped me a lot if I had stumbled upon this question on how to push all of the document’s fields while grouping. Well, I guess one’s Google searches are always smarter in hindsight.