Why we don’t denormalize anymore

In this blog post we’ll take a closer look at the structuring of data inside the database.

We look into the transition from relational databases to NoSQL databases (MongoDB in this case) and especially the term “denormalization” which is discussed as the holy grail when talking about dealing with relational data in combination with NoSQL databases.

The start of web development

As many other developers I have worked with relational database a lot. Especially when PHP and MySQL were the tools of choice when creating a new web app back in the early 2000s.

But this changed with the introduction of NoSQL databases such as MongoDB or CouchDB. Developers were not forced to think in relational structures anymore. One could simply store all the data in one large document and retrieve / modify it as he wants. No more table schemas or data which needs to be updated. Just save the data in one document and you’re done. Great!

But how do you deal with the data relationship you already know from your relational databases?

Denormalization as the answer

This question was heavily discussed by different developers (here’s an example).

There is no correct answer to this question but one approach is the denormalization of data:

// Example 1: DENORMALIZED<br />
var user = {<br />
_id: '1',<br />
username: 'jdoe',<br />
firstName: 'John',<br />
lastName: 'Doe'<br />
}</p>
<p>var post = {<br />
_id: '1',<br />
title: 'Lorem ipsum',<br />
body: 'Lorem ipsum dolor sit met'<br />
author: {<br />
firstName: 'John'<br />
lastName: 'Doe'<br />
}<br />
}<br />
// Example 2: NORMALIZED<br />
var user = {<br />
_id: '1',<br />
username: 'jdoe',<br />
firstName: 'John',<br />
lastName: 'Doe'<br />
}</p>
<p>var post = {<br />
_id: '1',<br />
title: 'Lorem ipsum',<br />
body: 'Lorem ipsum dolor sit met'<br />
authorId: '1'<br />
}<br />

In these two examples you’ll see denormalization and normalization in direct comparison.

The second (normalized) example shows a way how you can normalization your data and store it the way you already know from your other relational databases. When fetching the post you can get the author by looking at the users collection and find him by the posts authorId.

The second (denormalized) example doesn’t need a second lookup in the users collection. You can just fetch the post and have the author embedded in the document.

This is great as it will reduce the load on the database and makes it easier to render the data because you don’t have to think about and deal with joins!

Up- and downsides of denormalization

Let’s talk about some up- and downsides of denormalization.

Upsides

With denormalization you don’t need any kind of joins to get all the data you need. You just fetch the document and you’re done. This reduces the load on the database and the lookup time for your data.
 Denormalization is the “community approved standard” to deal with related data in NoSQL systems.

Downsides

How do you deal with the updating of data? You need to update each and every relevant document where you have stored your denormalized data. Additionally your document structure get’s confusing very fast (“documentception”).
 What if you want to display other data than the one you’ve stored alongside the document (e.g. you want to display the users username rather than the firstName and lastName of the example above)?

Why we don’t do denormalization anymore

We’ve started with the denormalization approach as it was embraced and recommended by many developers who have dealt with NoSQL databases in the past.
 In the early days there were no easy ways to join the data in Meteor so it was a great idea to do denormalization.

One of our biggest projects so far still uses denormalization (for some minor parts).
 But it’s a mess.
 Every time something changes we need to update all related documents so that everything is in sync.

Recently we rewrote a core part of this application.
 But this time we switched from denormalization to normalization.
 We used the publish composite package in this rewrite and abandoned the denormalization approach because of the above mentioned downsides.
 There are some discussion out there about the performance of this package. But for us it serves pretty well. Even with high load on our servers.

The best of both worlds

Max Savin mentioned on crater that you could do both (embed the the referenced documentId in the main document). This gives you the easiness of denormalization with the flexibility of normalization.

Here’s an example with the post and user collections:

// Example 3: NORMALIZED AND DENORMALIZED<br />
var user = {<br />
_id: '1',<br />
username: 'jdoe',<br />
firstName: 'John',<br />
lastName: 'Doe'<br />
}</p>
<p>var post = {<br />
_id: '1',<br />
title: 'Lorem ipsum',<br />
body: 'Lorem ipsum dolor sit met'<br />
authorId: '1'<br />
author: {<br />
firstName: 'John',<br />
lastName: 'Doe'<br />
}<br />
}<br />

Conclusion

Denormalization seems like a great way to deal with relational data in NoSQL databases. It’s easy for the developer to use this pattern but it fires back when you need to update the denormalized data or want to fetch different data.

We’ve switched to the publish composite package and use it nearly every time we deal with relational data. We don’t do denormalization anymore as it’s too hard to keep the data in sync.

If you want to stick to the denormalized approach you should check out this package by Max Nowack which will help you keeping the denormalized data in sync.

Of course denormalization is not the silver bullet. There are some scenarios where denormalization is great! You have to evaluate whether a normalized or a denormalized structure is better for your application.

One clap, two clap, three clap, forty?

By clapping more or less, you can signal to us which stories really stand out.