Can I Use MongoDB With Relational Data?

Joel Lord
Published in MongoDB
5 min read · Aug 16, 2024

Let’s start by making something clear. Data is almost always inherently relational. What’s the point of having data if it doesn’t relate to anything else? So, the quick answer to this question is “yes.”

That’s it. That’s the article.

You may want more, though. The longer answer is that MongoDB can store relational data, and it does so in a slightly different, sometimes more effective way than traditional relational databases.

The traditional way

To illustrate this, let’s take a simplified example. Imagine an application that manages billing for an organization. It has customers and invoices. If, like me, you were trained with SQL databases, your mind probably already started thinking about splitting those into tables. A schema for this application would look something like this.

A database schema that has four tables to represent a customer invoice

Immediately, we can see the relationship between customers and invoices and between the invoices and the invoice items. To query an invoice, you would do something like:

SELECT
    invoices.invoice_date AS invoice_date,
    invoices.total AS total,
    customers.name AS customer_name,
    items.description AS line_description,
    items.price AS line_price
FROM invoices
    INNER JOIN invoice_lines ON invoices._id = invoice_lines.invoice_id
    INNER JOIN items ON invoice_lines.item_id = items._id
    INNER JOIN customers ON invoices.customer_id = customers._id
ORDER BY invoices._id;

And you would retrieve your invoice data.

| invoice_date | total | customer_name | line_description | line_price |
|--------------|-------|---------------|------------------|------------|
| 2024-07-23 | 7.4 | Customer | Item #1 | 4 |
| 2024-07-23 | 7.4 | Customer | Item #2 | 3.4 |

That works well. Normalizing the data this way saves disk space because nothing is duplicated. However, the query is not the easiest to write, and while the output is okay, our application still has to reshape the flat rows into an invoice with its lines before it can use them.
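To give an idea of that reshaping work, here is a minimal sketch in plain JavaScript. It assumes the rows arrive as objects keyed by the column aliases from the query, plus an `invoice_id` column that the query does not actually select (added here only so rows can be grouped). The helper name `groupInvoiceRows` is hypothetical.

```javascript
// Hypothetical helper: turns the flat rows produced by the JOIN into one
// nested object per invoice. Assumes each row also carries an `invoice_id`
// column (not selected in the article's query) to group on.
function groupInvoiceRows(rows) {
  const invoices = new Map();
  for (const row of rows) {
    if (!invoices.has(row.invoice_id)) {
      invoices.set(row.invoice_id, {
        invoice_date: row.invoice_date,
        total: row.total,
        customer_name: row.customer_name,
        lines: [],
      });
    }
    invoices.get(row.invoice_id).lines.push({
      description: row.line_description,
      price: row.line_price,
    });
  }
  return [...invoices.values()];
}

// The two rows from the result table, with an assumed invoice_id of 1.
const rows = [
  { invoice_id: 1, invoice_date: "2024-07-23", total: 7.4,
    customer_name: "Customer", line_description: "Item #1", line_price: 4 },
  { invoice_id: 1, invoice_date: "2024-07-23", total: 7.4,
    customer_name: "Customer", line_description: "Item #2", line_price: 3.4 },
];

const grouped = groupInvoiceRows(rows);
console.log(JSON.stringify(grouped, null, 2));
```

Two flat rows become a single invoice object holding an array of two lines, which is the shape most applications want to work with.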

A different approach

Let’s examine how we can model this data for MongoDB (or any other document database). Our data has two main relationships.

  • Invoices and Items: For each invoice, we have multiple items. Those belong together. In most cases, you won’t query lines independently of invoices.
  • Customers and Invoices: Each invoice has a customer, but those customers are independent of the invoices. You might query by invoice or by customer.

In MongoDB, relationships can be created by referencing (just like in SQL) or embedding data.

Let’s look at our data model through that lens.

Since we want the ability to query the customers outside of the invoices, it makes sense to keep those two in separate collections (tables). However, our invoice lines belong to the invoices. They won’t be queried independently. Therefore, those are great candidates for embedding. As you query your invoices, you will immediately get your invoice lines.

This is what your model would look like.

A schema of the invoice data model represented in MongoDB, where the invoice lines are embedded

Our invoice lines are now embedded in each invoice document. To retrieve the invoices along with their customers, you can use MongoDB’s $lookup aggregation stage.

db.invoices.aggregate([
  {
    $lookup: {
      from: "customers",
      localField: "customer_id",
      foreignField: "_id",
      as: "customer"
    }
  }
]);

And you would get the following data.

{
  _id: ObjectId('669fb31628e1502e02744b08'),
  customer_id: 1,
  date: ISODate('2024-07-23T13:41:42.781Z'),
  total: 7.4,
  invoice_lines: [
    {
      description: 'Item #1',
      price: 4
    },
    {
      description: 'Item #2',
      price: 3.4
    }
  ],
  customer: [
    {
      _id: 1,
      name: 'Customer'
    }
  ]
}
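The join semantics of the $lookup stage can be sketched in plain JavaScript. This is only an illustration of a basic equality match, not MongoDB's implementation: every input document is kept, and the matching documents land in an array under the `as` field, which is empty when nothing matches. The `lookup` function and the second sample invoice are assumptions made for the example.

```javascript
// Illustrative in-memory version of a basic equality $lookup.
// Not MongoDB's implementation, only a sketch of the semantics.
function lookup(docs, foreignDocs, { localField, foreignField, as }) {
  return docs.map((doc) => ({
    ...doc,
    // Matches are always returned as an array, even for a single match.
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

// Sample data: the second invoice is an assumption added for illustration.
const customers = [{ _id: 1, name: "Customer" }];
const invoices = [
  { _id: "a", customer_id: 1, total: 7.4 },
  { _id: "b", customer_id: 2, total: 12 }, // no matching customer
];

const joined = lookup(invoices, customers, {
  localField: "customer_id",
  foreignField: "_id",
  as: "customer",
});

console.log(joined[0].customer); // [ { _id: 1, name: 'Customer' } ]
console.log(joined[1].customer); // []
```

Because unmatched documents are kept with an empty array, this behaves like a left outer join rather than an inner join.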

Your data is now ready to be used in your application. As you can see, the data stored in MongoDB is relational. We can still create relationships between entities by using $lookup (which behaves like a LEFT OUTER JOIN), and we can also embed entities, such as invoice lines. Items that belong together can be stored together, so they are retrieved in one read without being joined across collections. For data that is usually read together, this typically improves read performance.

Taking it a step further

But wait, there’s more! Using various data modeling patterns, you can make this data schema even more efficient. You could, for example, use the extended reference pattern here. This would change your schema to embed some bits and pieces of the customer object directly into your invoice while keeping your more complex customer object separate from the invoices.

A schema representing the extended reference pattern, where the customer data is part of the invoice.

Your customers collection is still present and holds the complete customer records, so you can query them when needed. Reading an invoice, though, now takes a single find operation that returns all the data you need.

db.invoices.findOne({_id: 1});
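As a rough sketch of how an invoice could be built with the extended reference pattern, the snippet below copies only a few customer fields into the invoice. The helper name `toExtendedReference` and the field names (`address`, `email`, `phone`) are assumptions for illustration; the article's schema image defines the actual fields.

```javascript
// Hypothetical helper: copies only the customer fields an invoice needs.
// The choice of fields (name, address) is an assumption for illustration.
function toExtendedReference(customer) {
  const { _id, name, address } = customer;
  return { _id, name, address };
}

// Full customer record, as it would live in the customers collection.
const customer = {
  _id: 1,
  name: "Customer",
  address: "123 Example Street",
  email: "customer@example.com", // stays only in the customers collection
  phone: "555-0100",             // stays only in the customers collection
};

// Invoice document with the customer subset embedded.
const invoice = {
  _id: 1,
  customer: toExtendedReference(customer),
  date: new Date("2024-07-23T13:41:42.781Z"),
  total: 7.4,
  invoice_lines: [
    { description: "Item #1", price: 4 },
    { description: "Item #2", price: 3.4 },
  ],
};

console.log(invoice.customer);
```

The embedded `customer._id` keeps the link back to the full record, so the complete customer can still be fetched when it is needed.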

What about data duplication?

Yes, we have data duplication. Is it an issue, though? In some systems, this would be the desired outcome. Think of this invoice. If the customer moves, we want to keep the original address, where the customer was billed, not the address from the customer object. In this case, that data duplication is helping us. You can still query the customer data using a $lookup and the `invoices.customer._id` field, but you don’t need to retrieve the entire customer object every time you read an invoice from your database.

This isn’t always the case, though, and sometimes you’ll need to do two updates to ensure that data is consistent. This is why you need to consider your workloads when building your data model with MongoDB.

Once you’ve identified your entities, you should consider which operations will happen most frequently. In this case, you can imagine that this application would be read-intensive. For every 1,000 views of an invoice, we might only get one update to the customer that must also be applied to the invoice. By duplicating the data in this case, we trade an occasional extra write for cheaper reads, since those reads no longer need a JOIN/$lookup operation.
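The two updates described in this section could be prepared along these lines. This is a sketch under assumptions: the helper name `buildCustomerRenameOps` is hypothetical, and the `name` and `customer._id` fields are assumed to follow the extended reference layout. The filter and update documents use MongoDB's `$set` operator and dot notation for the embedded field.

```javascript
// Hypothetical helper: builds the filter/update pairs needed to rename a
// customer in both places. Field names are assumed, not taken from a
// confirmed schema.
function buildCustomerRenameOps(customerId, newName) {
  return {
    // Update the source of truth in the customers collection.
    customers: {
      filter: { _id: customerId },
      update: { $set: { name: newName } },
    },
    // Update the duplicated copy embedded in every matching invoice.
    invoices: {
      filter: { "customer._id": customerId },
      update: { $set: { "customer.name": newName } },
    },
  };
}

const ops = buildCustomerRenameOps(1, "New Customer Name");
console.log(JSON.stringify(ops, null, 2));

// In mongosh, the operations could then be applied like this:
//   db.customers.updateOne(ops.customers.filter, ops.customers.update);
//   db.invoices.updateMany(ops.invoices.filter, ops.invoices.update);
```

If both writes must succeed or fail together, a multi-document transaction is one option. For fields such as a billing address that should stay as they were at invoicing time, you would skip the second update.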

Key takeaways

Yes! MongoDB _can_ be used for relational data. It is, in fact, well suited to handling large quantities of relational data when that data is modeled around how your application uses it. By keeping data that is read together stored together, you get a performance gain and a better developer experience. Your objects can be retrieved from your database in a single operation, and they are already shaped in a way your application can use.


If you want to learn more about data modeling with MongoDB, I strongly recommend the book MongoDB Data Modeling and Schema Design. It will guide you through creating the best possible data model for your MongoDB database. MongoDB University also has a great learning path about data modeling. Finally, for deeper tutorials about MongoDB, check out the Developer Center, which is packed with technical resources to help you get started with MongoDB.
