CHOOSING BETWEEN EMBEDDED DOCUMENTS AND REFERENCED DOCUMENTS IN MONGODB

Joshua Ajagbe
3 min read · Jul 1, 2022


Working with MongoDB and other NoSQL databases has become increasingly common in the industry. Modelling relationships between pieces of data is almost inevitable, hence the need to decide which approach scales better or performs faster.

In that situation, two options come to mind: embedded documents and referenced documents. I have noticed over time that it's often a tussle when it's time to decide whether to use embedded documents or referenced documents in a MongoDB database. MongoDB allows you to use schemas to define the fields and data types within your collections. Although schemas are optional, they are highly recommended because they make the format of your documents easy to understand. Schemas also help keep data quality high by ensuring that required fields are present and that all fields conform to their respective types.
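To make the point about required fields and types concrete, here is a minimal sketch. The dictionary follows the shape of MongoDB's `$jsonSchema` validator, but the collection, the field names and the small `violations` helper are assumptions made up for illustration. The helper is only a stand-in for the check the server would perform, and it covers just the two rules mentioned above.

```python
# Hypothetical schema for a "users" collection, written in the shape of
# MongoDB's $jsonSchema validator. All field names here are assumptions.
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string"},
            "age": {"bsonType": "int"},
        },
    }
}

# Illustrative mapping from the BSON type names used above to Python types.
PYTHON_TYPES = {"string": str, "int": int}


def violations(doc, validator):
    """Return a list of problems: missing required fields and wrong types."""
    schema = validator["$jsonSchema"]
    problems = [
        f"missing field: {field}"
        for field in schema["required"]
        if field not in doc
    ]
    for field, rule in schema["properties"].items():
        expected = PYTHON_TYPES[rule["bsonType"]]
        if field in doc and not isinstance(doc[field], expected):
            problems.append(f"wrong type: {field}")
    return problems


good_user = {"name": "Ada", "email": "ada@example.com", "age": 30}
bad_user = {"name": "Ada", "age": "thirty"}

print(violations(good_user, user_validator))
print(violations(bad_user, user_validator))
```

In a real database the validator document would be attached to the collection and the server would reject documents that break it, so a helper like this would not be needed.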

First, we need to understand what each one is, as well as the differences between these two data modelling strategies.

Embedded documents are documents stored inside a parent document as its children, which means they live in the same collection as the parent. This is convenient because any time you query the parent document, the child documents are returned along with it. For instance, if you have a user document with an embedded list of books read, the query returns every book in full. By storing the books as embedded documents, we only have to maintain a single collection, which can be seen as an advantage.
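As a rough illustration of the user and books example, the sketch below models a collection as a plain Python dictionary keyed by `_id`. The field names (`name`, `books_read`, `title`, `author`) and the `find_user` helper are assumptions for this example, not a real driver API.

```python
# Hypothetical "users" collection, modelled as a dict keyed by _id.
# The books are embedded directly inside the user document.
users = {
    1: {
        "_id": 1,
        "name": "Ada",
        "books_read": [
            {"title": "Dune", "author": "Frank Herbert"},
            {"title": "Emma", "author": "Jane Austen"},
        ],
    }
}


def find_user(user_id):
    """A single lookup returns the parent and every embedded child."""
    return users[user_id]


user = find_user(1)
titles = [book["title"] for book in user["books_read"]]
print(titles)
```

One lookup is enough here because the books travel with the user document; there is no second collection to visit.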

Unlike embedded documents, referenced documents are stored in a separate collection from their parent documents. The id of each child document is stored in the parent document as an ObjectId. Hence, it is possible to retrieve the parent document without fetching any of its referenced documents; only the ids of the referenced documents are returned.

When using the referenced documents approach, at least two collections are required. If we want to query the user alongside the books read, two queries have to be made, one against each collection: first to retrieve the user from the users collection, and then to retrieve the books from the books collection.
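The two-query pattern can be sketched the same way, with two plain dictionaries standing in for the two collections. The names (`users`, `books`, `books_read`, `find_user`, `find_books`) are assumptions for illustration, and integer ids are used instead of real ObjectIds to keep the example self-contained. The `queries` list simply records which collection each lookup touched.

```python
# Two hypothetical collections, each modelled as a dict keyed by _id.
# The user document stores only the ids of the books, not the books.
users = {1: {"_id": 1, "name": "Ada", "books_read": [101, 102]}}
books = {
    101: {"_id": 101, "title": "Dune", "author": "Frank Herbert"},
    102: {"_id": 102, "title": "Emma", "author": "Jane Austen"},
}

queries = []  # log of which collection each lookup touched


def find_user(user_id):
    """First query: fetch the parent, which carries only child ids."""
    queries.append("users")
    return users[user_id]


def find_books(book_ids):
    """Second query: fetch the children by id.

    Against a real database this would be a single query that matches
    _id against the list of ids (for example with the $in operator).
    """
    queries.append("books")
    return [books[book_id] for book_id in book_ids]


user = find_user(1)
titles = [book["title"] for book in find_books(user["books_read"])]
print(titles)
print(queries)
```

If only the user's name is needed, the second call can be skipped entirely, which is where the referenced approach pays off.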

For speed, it is generally best to use embedded documents when the parent document and its related documents are read or written at the same time. Furthermore, we should prioritise based on whether the collection is read-intensive or write-intensive. If you're writing a lot of data without needing to read the entire parent document, use references. In general, the decision on whether to embed or reference will vary with your situation. Take note that embedded documents work best when you intend to query both the parent and child documents at once. Referenced documents, on the other hand, are optimal when only the parent document is required per API call; retrieving the child documents alongside the parent would then cost two database queries per API call.

As an engineer, it's up to you to decide which approach suits your requirements at any given time. I hope this article helps you make the best decision for your organization.

Joshua Ajagbe

An experienced backend software engineer, with love for teaching and solving problems.