Partoo migrates from MongoDB to PostgreSQL

Louis-Amaury Chaïb
Published in Partoo
6 min read · Sep 29, 2021

Why are we talking about it right now?

First, because we’re close to the end of this big project, and we’d like to take a step back and review all the challenges we faced and all the effort we put in.

Then also because we’d like to share the technical difficulties, the solutions we came up with to tackle them, and the lessons we learned along the way.

This article is the first of a series in which we’ll dig into the details. Here, we’ll present the rationale behind this migration and the high-level strategy of the whole project.

Rationale behind this project

Relational complexity

This is a story that repeats over and over. Some years ago, I myself read half a dozen similar posts explaining the same kind of migration from MongoDB back to relational databases.

Most of the time, the main reason is that when an application is growing, there are more and more objects and relationships between them.

Simply put, Partoo falls into this case as well.

A very common example is the refinement of Access Control (determining which users have access to which objects in the database). At some point you end up with very granular rules: “this object can be accessed by this list of users” and “this user has access to this list of objects”. This is what we call a “many-to-many” relationship.

I’m not saying that it’s impossible to handle many-to-many relationships in MongoDB, but since they are not built in, you most often end up with something that is too slow, inconsistent, and/or a nightmare to maintain.
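To make the contrast concrete, here is a minimal sketch of the access-control pattern described above as a relational junction table, using an in-memory SQLite database. Table and column names (`users`, `businesses`, `user_business_access`) are illustrative, not Partoo’s actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE businesses (id INTEGER PRIMARY KEY, name TEXT);
    -- The junction table makes the many-to-many relationship "built-in":
    -- each row grants one user access to one business.
    CREATE TABLE user_business_access (
        user_id INTEGER NOT NULL REFERENCES users(id),
        business_id INTEGER NOT NULL REFERENCES businesses(id),
        PRIMARY KEY (user_id, business_id)
    );
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO businesses VALUES (?, ?)", [(1, "store A"), (2, "store B")])
conn.executemany(
    "INSERT INTO user_business_access VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 2)],
)

# "Which businesses can alice access?" becomes a single indexed join,
# instead of application-side bookkeeping of two lists of ids.
rows = conn.execute("""
    SELECT b.name FROM businesses b
    JOIN user_business_access a ON a.business_id = b.id
    WHERE a.user_id = 1
    ORDER BY b.name
""").fetchall()
alice_access = [r[0] for r in rows]
print(alice_access)  # → ['store A', 'store B']
```

The database enforces consistency (the composite primary key prevents duplicate grants, the foreign keys prevent dangling ids), which is exactly what has to be re-implemented by hand on top of MongoDB.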

Operational complexity

For performance reasons, we don’t use a managed MongoDB service on AWS, and we have to operate our own servers and backups.

This requires extra effort, especially when it comes to data center migrations: we migrated our application to Europe last spring, and we had to handle the migration of the data ourselves.

This is extra effort we would have liked to spare ourselves.

Technical debt

Let’s face it, technical debt is everywhere, and Partoo is no exception.

When a recruiter tells you that there’s no technical debt in the company, run away! Not only will there be some, but they are probably blind to it and not actively working on paying it off. I know it’s a personal affinity, and not every developer is as eager as I am about technical debt redemption, but knowing upfront that my mission would be to tackle such projects was one of the key drivers for me to join Partoo.

So what form does this technical debt take in this context?

Here, we’re talking about a fork of mongoengine, the main Python object-document mapper (ODM) for MongoDB.

A fork means that you take the code at a certain point in time, in order to modify it to cover needs the original does not yet meet.

And you take ownership of this new code repository: from that point on, it’s up to you to backport all changes from the mainstream repository to continue benefiting from new features, bug fixes, security upgrades and, in our case, compatibility with new versions of MongoDB.

A common mistake when forking a library such as mongoengine, and one we made, is that the line between your application code and the library’s code gets blurred. You can easily convince yourself that some components (in our case, some data types) belong in the library because they are technically tied to it. However, this increases the chances of conflicts between the fork and the mainstream repository, and hence the cost of pulling upstream changes over and over. That’s how a repository ends up effectively immutable and becomes a burden for the application.

Lesson learned, and worth spreading: never fork an Open Source project to cover your needs without the intent to submit your changes and merge them back into the mainstream repository! Take the time to discuss with the project maintainers and explain your needs. If they are valid, i.e. the use case could exist for other projects and there is not yet a way to fulfill it, they might accept your contribution directly, and you’ll avoid all kinds of hidden costs.

Strategy to migrate from MongoDB to relational database

Let’s not go into too much detail right now, as it deserves a dedicated post. I’m just going to lay out some very high-level directives that apply at the whole-project level:

  1. Start putting in place the infrastructure, framework and processes so that all new feature development can be done directly against the new database server. This is obviously crucial: otherwise, feature development would keep piling up new collections that would later have to be migrated, doubling the effort.
  2. Understand the whole data model. I will come back to this in another post, but understanding how the collections interact with each other is crucial. I wrote a script, plugged into our CI, that generates a UML diagram showing the links between the collections in MongoDB and the tables in PostgreSQL, so that even as new feature development changes the model, we keep a comprehensive overview of what still needs to be done and the risks involved.
  3. Quickly migrate all small collections with limited impact. Ideally, most of them can be migrated within a single deployment because they do not change much, do not have a large footprint in the code base, and often do not have many relationships.
    For these small entities, the migration is generally simple: create the new data model in SQL, stop the write operations on MongoDB, transfer the data, then switch read and write operations to SQL.
  4. From this point on, the collections remaining to migrate are the ones central and critical to the application: they have a large footprint in the code base, are subject to model changes, hold a significant amount of data, and therefore cannot be migrated in one-off operations. They require specific strategies that will get a dedicated article. Overall, it consists in designing the new data model; writing to both databases (dual writes, with the new database in shadow); transferring the existing data; validating and monitoring data synchronization; switching read operations little by little to SQL as the primary; then switching write operations to SQL as the primary while keeping MongoDB in shadow for monitoring; and, when everything is migrated and validated, shutting down the shadow writes.
  5. Alongside step 4, you can now transform all the relations you had to maintain on the application side into built-in foreign keys and constraints. This will lighten the code base and simplify the understanding of the data model as a whole.
  6. Last but not least, clean up all code and dependencies on the old MongoDB server and shut it down. You deserve a fresh beer 🍺! Enjoy!
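The one-off migration of a small collection (step 3) can be sketched as follows. This is a simplified illustration: the `mongo_docs` list stands in for a MongoDB collection (in reality you would read it with a MongoDB client), SQLite stands in for PostgreSQL, and all names and fields are hypothetical.

```python
import sqlite3

# Stand-in for the small MongoDB collection to migrate.
mongo_docs = [
    {"_id": "a1", "name": "France", "code": "FR"},
    {"_id": "b2", "name": "Spain", "code": "ES"},
]

conn = sqlite3.connect(":memory:")

# 1. Create the new data model in SQL.
conn.execute("CREATE TABLE countries (id TEXT PRIMARY KEY, name TEXT, code TEXT)")

# 2. Stop write operations on MongoDB (a deployment / feature-flag step,
#    nothing to show in code).

# 3. Transfer the data.
conn.executemany(
    "INSERT INTO countries (id, name, code) VALUES (?, ?, ?)",
    [(d["_id"], d["name"], d["code"]) for d in mongo_docs],
)

# 4. Switch read and write operations to SQL.
rows = conn.execute("SELECT code FROM countries ORDER BY code").fetchall()
migrated_codes = [r[0] for r in rows]
print(migrated_codes)  # → ['ES', 'FR']
```

Because the collection is small and rarely written to, all four steps can fit inside a single deployment window.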
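The dual-write “shadow” pattern at the heart of step 4 can be sketched like this. The `Repository` and `DualWriter` classes are hypothetical stand-ins for the MongoDB and PostgreSQL access layers (plain in-memory dicts here, so the sketch runs); the key idea is that every write goes to both backends, while reads are served by the primary and compared against the shadow so divergences can be monitored before switching over.

```python
class Repository:
    """Trivial in-memory store standing in for a real database client."""
    def __init__(self):
        self.rows = {}
    def write(self, key, value):
        self.rows[key] = value
    def read(self, key):
        return self.rows.get(key)

class DualWriter:
    """Writes to both backends; reads from the primary and compares the
    shadow's answer so data divergence can be monitored."""
    def __init__(self, primary, shadow):
        self.primary = primary
        self.shadow = shadow
        self.mismatches = 0
    def write(self, key, value):
        self.primary.write(key, value)
        self.shadow.write(key, value)  # shadow write; failures would be logged
    def read(self, key):
        value = self.primary.read(key)
        if self.shadow.read(key) != value:
            self.mismatches += 1  # in production: log/alert, never fail the read
        return value

mongo, postgres = Repository(), Repository()
store = DualWriter(primary=mongo, shadow=postgres)
store.write("user:1", {"name": "alice"})
assert store.read("user:1") == {"name": "alice"}
print(store.mismatches)  # → 0
```

Switching the migration forward then amounts to swapping `primary` and `shadow` (SQL becomes primary, MongoDB stays in shadow for monitoring), and finally dropping the shadow writes once the mismatch count stays at zero.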

Should we regret previous choices?

Here’s a legitimate question some could ask. Taking a step back and looking at the state of the application now, the hard work it took to migrate from MongoDB to PostgreSQL, and all the feedback we have gathered since, we could wonder whether we made the right decision starting off with MongoDB.

Because all start-ups begin with an MVP, it’s clear that prototyping an application with MongoDB is much faster than using less flexible technologies such as relational databases.

Partoo was founded in 2014. At that time, MongoDB was trendy, and little feedback from other companies was available that would have helped anticipate the growing complexity leading to this migration.

Also, using trendy technologies is a key driver for recruitment in a competitive market.

Well, the answer is: “no regret, MongoDB was the right choice for a fast start, migrating to PostgreSQL is the right approach to stabilise the application for a more sustainable growth”.
