MongoDB Basic Document Grouping

How to group entire MongoDB documents based on a certain field

Ophir Harpaz
3 min readMar 28, 2019

This post serves as a note-to-self; it details a technical problem I had and which took me forever to overcome, or so it felt. Frustration level was high and Twitter is my witness.

Once solved, I couldn’t let the solution be forgotten and so I decided to carve it in stone, AKA write it in a blog post.

The Query I Needed

As part of my work in Guardicore Labs, I needed to query a Mongo database. I wanted to do the following:

  1. fetch documents that match certain criteria;
  2. group them together based on another field.

I didn’t want to sum numbers, calculate an average or point out some maximum value. I wanted to group, and to group only.

As my case is related to the inner-workings of the Guardicore’s product, I shall use an analogous example.

Say we have a songs collection:

{'_id': 1, 'name': 'Your Song', 'singer': 'Elton John', 'score': 3}
{'_id': 2, 'name': 'Circle of Life', 'singer': 'Elton John',
'score': 3}
{'_id': 3, 'name': 'Sir Duke', 'singer': 'Stevie Wonder', 'score':
2}
{'_id': 4, 'name': 'Daniel', 'singer': 'Elton John', 'score': 2}
{'_id': 5, 'name': 'As', 'singer': 'Stevie Wonder', 'score': 3}

and a singers collection:

{'_id': 1, 'name': 'Elton John'}
{'_id': 2, 'name': 'Stevie Wonder'}
a singers collection

I wanted to fetch songs with score 3 and group them by singers. The final result should be:

[{'_id': 'Elton John', 
'songs_scored_3': [{'_id': 1, 'name': 'Your Song', 'singer':
'Elton John', 'score': 3}, {'_id': 2, 'name': 'Circle of Life',
'singer': 'Elton John', 'score': 3}],
{'_id': 'Stevie Wonder',
'songs_scored_3': [{'_id': 5, 'name': 'As', 'singer':
'Stevie Wonder', 'score': 3}]
}]

and the query I had to run:

songs.aggregate([
"$match": {"score": 3},
"$group": {"_id": "$singer",
"songs_scored_3": {"$addToSet": "$$ROOT"}}
])

The Query Explained

Grouping in Mongo is done as part of an “aggregation pipeline” — a series of operations performed on the data, one stage after the other.

The query above does the following:

  1. It defines an aggregation pipeline by calling aggregate with an array of stages (in our case, two of them).
  2. The first stage is $match which filters documents according to the provided conditions (in our case, score == 3).
  3. The second stage is $group. By specifying the_id we tell Mongo which field(s) to group on. Then, we can (optionally) define the rest of the fields for our results. In the query above, we tell Mongo: “set a new field named songs_scored_3. As for the value, create a set of all the relevant documents in the pipeline”. In other words, every song-document being processed should be added to the set of its corresponding singer’s group.

Note: $$ROOT is used to reference the object which is currently being processed in the aggregation stage, as mentioned in the documentation.

💡 Use $push instead of $addToSet if you’re interested in preserving repeating documents.

That’s basically it.

References

Here is the StackOverflow question which saved me a lot of time by introducing $$ROOT:

It would have also helped me a lot if I had stumbled upon this question on how to push all of the document’s fields while grouping. Well, I guess one’s Google searches are always smarter in hindsight.

--

--

Ophir Harpaz

@ophirharpaz on Twitter. Security researcher at Guardicore. Reverse engineering enthusiast. Author of https://begin.re.