MongoDB Joins Across Collections (with Ruby)
MongoDB is sweet, flexible and (when used correctly) enables insane productivity. Sadly, it does not support DB-side joins for standard calls. If you want to model your data in a traditional ‘relational’ way (which I posit is the best way, even with MongoDB) it can be irksome/error-prone to query joins on the app-side.
Some people consider this a downside of Mongo; I hope to show we can abstract that away with minimal DRY logic.
Normalized Models are Good; Joining is Good.
Imagine you have a users collection as well as a posts collection and a messages collection. Each one is the equivalent of a table in SQL-land, holding only its domain data and a foreign key: each user has an _id, and each post and each message has a user_id field. How can we now cleanly get a user along with all of their posts and messages?
In SQL, this would be a ‘join’ query. Using ActiveRecord or whatnot, this might be expressed through the ORM, resulting in a single DB-side call. With MongoDB’s standard find queries this is not possible, so we must make the ‘join’ on the app-side.
Since ‘joining’ across collections is a common pattern, we can and should DRY it with a common method. We will do this in Ruby-land, and discuss the implications later.
Below is a complete example which relies only on the native Mongo gem (version 2.1.2 in this case). You should be able to copy-paste this directly into irb and see it work. For instructional purposes, the code below includes the complete setup; to use it on your own MongoDB collections, you naturally only need the function defined at the bottom.
We simply find the user by their _id, derive the foreign key from the parent collection name by dropping its last character and appending _id (so users gives user_id), and then find the relevant posts by user_id, and likewise the messages. We then return the user, the posts and the messages.
This results in one DB call per collection (three here: users, posts and messages), as the ‘join’ is made app-side. This is a performance hit. However, I posit that in many apps and for most access patterns the cost is negligible. Mongo is fast, humans are slow: optimize accordingly.
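The steps just described can be sketched roughly as follows. This is a minimal illustration, not the original listing: the method name join and the FakeCollection class are made up for this example. FakeCollection is an in-memory stand-in that only imitates find with an equality filter, so the snippet runs in irb without a MongoDB server.

```ruby
# Hypothetical in-memory stand-in for a Mongo collection (not part of the Mongo gem).
class FakeCollection
  def initialize(docs)
    @docs = docs
  end

  # Imitates collection.find(filter): returns documents matching every key/value pair.
  def find(filter = {})
    @docs.select { |doc| filter.all? { |key, value| doc[key] == value } }
  end
end

# Fetch one parent document plus its children from each child collection.
# `db` is anything that returns a collection from db[name].
def join(db, parent, id, *children)
  foreign_key = :"#{parent.to_s[0..-2]}_id" # :users -> :user_id
  parent_doc  = db[parent].find(_id: id).first
  child_docs  = children.map { |name| db[name].find(foreign_key => id).to_a }
  [parent_doc, *child_docs]
end

# Sample data, assumed purely for illustration.
db = {
  users:    FakeCollection.new([{ _id: 1, name: "Ada" }, { _id: 2, name: "Bob" }]),
  posts:    FakeCollection.new([{ _id: 10, user_id: 1, title: "Hello" },
                                { _id: 11, user_id: 2, title: "Other" },
                                { _id: 12, user_id: 1, title: "Again" }]),
  messages: FakeCollection.new([{ _id: 20, user_id: 1, body: "Hi" }])
}

user, posts, messages = join(db, :users, 1, :posts, :messages)
```

With the Mongo gem, a Mongo::Client should be usable in place of db here, since client[:users] returns a collection whose find accepts the same kind of filter hash; that substitution is an assumption of this sketch and is not exercised by it.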
This generic access pattern can be used for ‘joining’ across any Mongo collections. Obviously here we assume the naming of collections and keys follow the users->user_id convention, and one might be advised to allow for some further configuration such as limits, custom foreign keys, criteria/projections, wiring to your ODM of choice, etc. You should also index the keys on which you query (in this case, user_id in posts and messages).
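As one possible shape for the extra configuration mentioned above, here is a sketch that accepts a custom foreign key and a limit per child collection. Everything named here (join_with_options, StubCollection, StubView, the author_id key) is hypothetical; the stubs imitate just enough of find and limit to let the snippet run without a server.

```ruby
# Hypothetical in-memory stand-ins (not part of the Mongo gem).
class StubView
  include Enumerable

  def initialize(docs)
    @docs = docs
  end

  # Imitates a chainable view.limit(n).
  def limit(count)
    StubView.new(@docs.first(count))
  end

  def each(&block)
    @docs.each(&block)
  end
end

class StubCollection
  def initialize(docs)
    @docs = docs
  end

  def find(filter = {})
    StubView.new(@docs.select { |doc| filter.all? { |k, v| doc[k] == v } })
  end
end

# `children` maps each child collection name to an options hash:
#   foreign_key: overrides the users -> user_id convention
#   limit:       caps the number of child documents returned
def join_with_options(db, parent, id, children)
  default_key = :"#{parent.to_s[0..-2]}_id"
  result = { parent => db[parent].find(_id: id).first }
  children.each do |name, opts|
    view = db[name].find(opts.fetch(:foreign_key, default_key) => id)
    view = view.limit(opts[:limit]) if opts[:limit]
    result[name] = view.to_a
  end
  result
end

# Sample data, assumed purely for illustration.
db = {
  users:    StubCollection.new([{ _id: 1, name: "Ada" }]),
  posts:    StubCollection.new([{ _id: 10, user_id: 1 }, { _id: 11, user_id: 1 }]),
  messages: StubCollection.new([{ _id: 20, author_id: 1 }])
}

result = join_with_options(db, :users, 1,
                           { posts: { limit: 1 }, messages: { foreign_key: :author_id } })
```

As for the indexes, with the Mongo gem I would expect something like client[:posts].indexes.create_one(user_id: 1) to do the job, but check this against the driver version you are running.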
This is just minimal vanilla Ruby over the native MongoDB driver: it can be easily extended or modified to fit your exact needs. However, for the general case of joining across a foreign key, this pattern is very useful and DRY.
- As of Mongo 3.2, the $lookup operator actually allows using the aggregation pipeline to perform something akin to a DB-side join. This has pros and cons. In short, I advocate using app-side joins as above rather than relying on Mongo’s aggregation framework.
- The above is of course an alternative to holding denormalized models, which I find in practice to be error-prone; cache invalidation is famously difficult.
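For comparison, the $lookup alternative from the first note could look roughly like this. The helper name lookup_pipeline is made up for this sketch; it only builds the pipeline as plain Ruby data, with one $lookup stage per child collection, and does not talk to a database.

```ruby
# Build an aggregation pipeline that matches one parent document and
# attaches each child collection's matching documents via $lookup.
def lookup_pipeline(id, children, foreign_key)
  match = { "$match" => { "_id" => id } }
  lookups = children.map do |name|
    { "$lookup" => {
      "from"         => name.to_s,        # child collection to pull from
      "localField"   => "_id",            # field on the parent document
      "foreignField" => foreign_key.to_s, # field on the child documents
      "as"           => name.to_s         # array field added to the result
    } }
  end
  [match, *lookups]
end

pipeline = lookup_pipeline(1, [:posts, :messages], :user_id)
```

With the Mongo gem, such a pipeline would presumably be run with something like client[:users].aggregate(pipeline).first on a server that supports $lookup. Note the trade-off: this is a single round trip, but the foreign-key logic moves into the pipeline itself instead of living in a small Ruby method.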
MongoDB is awesome. By abstracting away its weaknesses, we can make it even more awesome. Understanding (and DRYing) client-side joins is an important step in this direction.