Migrate your MongoDB to PostgreSQL.
TL;DR: Skip down to the ‘How to Migrate’ section for instructions.
Hello. Today I will be talking about a major pain in the a** for developers and DevOps engineers: migrating from a non-relational database (MongoDB) to a relational one (PostgreSQL).
The question is… how do you migrate a set of data that has no rules applied and is formatted as JSON documents into a very opinionated database engine? On one hand, you have the guy who, when you ask him anything, will say, “Yeah mate, that’s fine. Go ahead.” And on the other end of the spectrum, we have the guy who, when you write down ‘A’, replies, “That should be a lowercase a.”
MongoDB is your buddy. PostgreSQL (which I will refer to as Postgres from now on, by the way) is your boss. Now, as much as it’s nice to get that freedom to do as you please, it may not always be a good thing. There need to be some rules applied and some conventions to follow; otherwise, in a few months’ time, you won’t remember what is what, and your database will be a mess. In that sense, Postgres is a blessing that keeps all your problems away. Just remember a few basic rules: snake_case everything, keep names in the singular form instead of plural, and avoid redundant SQL such as unnecessary quoting, unnecessary parentheses, and schema-qualifying table names when you only have a single schema.
One of my teammates suggested we use this guide while creating our PostgreSQL database. And it was a blessing. (https://www.sqlstyle.guide/)
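To make those conventions concrete, here is a minimal, hypothetical bit of DDL (run through psycopg2 in Python, since that is what the later sketches use as well). The table and column names are made up for illustration; they just follow the snake_case, singular-name, no-redundant-quoting rules mentioned above.

```python
import psycopg2  # assumed installed: pip install psycopg2-binary

# Hypothetical connection details; replace with your own.
conn = psycopg2.connect("dbname=app user=app_user password=secret host=localhost")

# Conventions illustrated:
#  - snake_case identifiers, singular table name (user_account, not UserAccounts)
#  - no unnecessary double quotes around identifiers
#  - no schema prefix, since this database only has the default 'public' schema
ddl = """
CREATE TABLE user_account (
    id          bigserial PRIMARY KEY,
    first_name  text NOT NULL,
    last_name   text NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now()
);
"""

with conn, conn.cursor() as cur:
    cur.execute(ddl)
conn.close()
```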
Now let’s get down to your options. You can:
A. Spend your own and your company’s precious time as a software developer writing a silly script to copy all that data from MongoDB to Postgres.
Or you can:
B. Use a ready-made tool that does what you are trying to code better, faster and more efficiently.
I think the way I phrased option A says enough about my opinion of it, but I will take a moment to look at it anyway. There are definitely ways to write a script that goes through your MongoDB collections, maps that data into tables (one per collection), turns recurring JSON keys into the columns of those tables, and then slots the JSON values into their respective keys (a.k.a. columns). A rough sketch of that approach follows below.
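For the curious, this is roughly what such a script could look like, assuming pymongo and psycopg2 are available. The database names and credentials are placeholders, the typing is a naive “everything is text”, and it glosses over every edge case discussed next — it is a sketch of the idea, not something I would recommend running as-is.

```python
import json
import psycopg2
from pymongo import MongoClient

# Hypothetical connection strings; replace with your own.
mongo = MongoClient("mongodb://localhost:27017")["my_app"]
pg = psycopg2.connect("dbname=my_app user=app_user password=secret host=localhost")

# Assumes collection and key names are already valid SQL identifiers.
with pg, pg.cursor() as cur:
    for collection_name in mongo.list_collection_names():
        docs = list(mongo[collection_name].find())
        if not docs:
            continue

        # Naive schema inference: the union of all top-level keys becomes the columns.
        columns = sorted({key for doc in docs for key in doc if key != "_id"})
        if not columns:
            continue

        # Everything is stored as text here, which dodges (badly) the typing
        # questions raised below.
        cur.execute(
            f'CREATE TABLE IF NOT EXISTS {collection_name} '
            f'({", ".join(f"{col} text" for col in columns)})'
        )

        placeholders = ", ".join(["%s"] * len(columns))
        insert = (
            f'INSERT INTO {collection_name} ({", ".join(columns)}) '
            f'VALUES ({placeholders})'
        )
        for doc in docs:
            # Missing keys become NULL; nested values are dumped as JSON strings.
            row = [
                json.dumps(doc[col]) if isinstance(doc.get(col), (dict, list))
                else (str(doc[col]) if doc.get(col) is not None else None)
                for col in columns
            ]
            cur.execute(insert, row)

pg.close()
```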
But take a harder look at that solution. What are you going to do once you reach unreliable data? (It’s not a question of IF, but WHEN.) What do you do when you reach a collection where one document has an array of 6 objects and, in the next document, that array only has 5? Do you ignore the 6th object in the first array? Do you ignore the second array entirely? Do you insert a null for the 6th object of the second array, based on the extra key in the first one? Do you create an extra table with all the keys that you couldn’t attribute to anything?
Moreover, what if in one array an object’s value is a boolean and the next array has the same object but with a string value? How do you determine the column type for Postgres?
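Just to show how quickly the rules pile up, here is one possible (and entirely invented) heuristic for picking a column type when documents disagree: look at every value seen for a key and only commit to a specific SQL type when they all agree, otherwise fall back to text.

```python
# One possible heuristic for choosing a Postgres column type per key:
# collect the Python types seen across all documents and only commit to a
# specific SQL type when every non-null value agrees; otherwise fall back
# to text (or jsonb for consistently nested structures).
PG_TYPE_FOR = {bool: "boolean", int: "bigint", float: "double precision", str: "text"}

def infer_column_type(values):
    seen = {type(v) for v in values if v is not None}
    if not seen:
        return "text"                      # only nulls seen: no information
    if seen <= {dict, list}:
        return "jsonb"                     # consistently nested: keep as JSON
    if len(seen) == 1:
        return PG_TYPE_FOR.get(seen.pop(), "text")
    if seen == {int, float}:
        return "double precision"          # mixed numerics widen to float
    return "text"                          # conflicting types: give up, store text

# Example: the boolean-vs-string conflict from above degrades to plain text.
print(infer_column_type([True, False, "yes"]))   # -> text
print(infer_column_type([1, 2.5, None]))         # -> double precision
```

And that is just one key; multiply it by every field in every collection and you can see where the one to two weeks go.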
See how complex and difficult this has become within two paragraphs of text? Are you going to spend one to two weeks of your time creating all that code with all those sets of rules, if statements, type comparisons and whatnot?
You can. I’m just saying you shouldn’t.
Now, if the previous section just scared you away from migrating database engines, then good. You don’t have to do the migration yourself. You can get someone else to do it for you :D
With no endorsement or personal gain whatsoever, I want to let the world know about this marvelous tool called Stitch. What does it do? It specializes in data pipelines and data processing. In layman’s terms, it migrates data, which is exactly what we are after.
And for those wondering, Stitch doesn’t only work with MongoDB and PostgreSQL. If you are having trouble with other engines such as Aurora, MySQL or MariaDB, you name it… those guys deal with it. And they can access those databases from a lot of different sources, such as AWS RDS, Google Cloud Platform and IBM Db2. You can check the full list of sources here. (Hint: check ‘destinations’ as well.)
How to Migrate:
Now, migrating is extremely simple.
- First off, sign up for an account. If you know you only need a one-time migration, and keeping data updated in two databases at once is not a worry of yours, then I suggest you go with their 14-day trial. It’s a very good option for this use case. (They have a free version, but it does not support MongoDB.)
- Once you have the account set up, the next step is to add a Source. That is where the data will be coming from. So, in the source, set up your MongoDB database: specify the IP of your database, type in your username and password, and, very important, check whether your database requires IP whitelisting. If it does, Stitch lists the 4 IPs it needs whitelisted in the setup’s description.
- The next step is to add a Destination. That is where Stitch will migrate the source data to. The setup works pretty much the same way as the source setup: specify the IP of your database, your username and password, and remember… check whether your database requires IP whitelisting. (If you want to sanity-check those credentials first, see the sketch right after this list.)
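As a side note, before handing credentials to any third-party pipeline, I like to confirm they actually work. A throwaway check like the one below (hypothetical hosts and credentials, using pymongo and psycopg2) will surface a bad password or a missing whitelist entry before you start debugging inside Stitch.

```python
import psycopg2
from pymongo import MongoClient
from pymongo.errors import PyMongoError

# Hypothetical hosts and credentials; substitute your real Source/Destination details.
MONGO_URI = "mongodb://stitch_user:secret@mongo.example.com:27017/my_app"
PG_DSN = "host=pg.example.com dbname=my_app user=stitch_user password=secret"

# Source check: a short server selection timeout makes firewall/whitelist
# problems show up quickly instead of hanging.
try:
    MongoClient(MONGO_URI, serverSelectionTimeoutMS=5000).admin.command("ping")
    print("MongoDB source reachable")
except PyMongoError as exc:
    print(f"MongoDB source check failed: {exc}")

# Destination check.
try:
    conn = psycopg2.connect(PG_DSN, connect_timeout=5)
    with conn, conn.cursor() as cur:
        cur.execute("SELECT 1")
    conn.close()
    print("Postgres destination reachable")
except psycopg2.OperationalError as exc:
    print(f"Postgres destination check failed: {exc}")
```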
Once that is done, the process is pretty much automatic. Based on the “Replication Frequency” you set up for your Source, Stitch will sync the databases. So if you put in a frequency of 5 minutes, every 5 minutes Stitch will grab data from the Source and send it to the Destination.
Have a look around their dashboard and see what features you can discover. If you click on the source, you will see a dashboard with menu items called ‘Summary’, ‘Collections to Replicate’, ‘Extraction Logs’, ‘Loading Reports’ and ‘Settings’.
To select which collections you want migrated, just go to ‘Collections to Replicate’ and select what you need from your MongoDB to be transferred over to Postgres. If my memory doesn’t fail me, ‘Collections to Replicate’ starts off with nothing selected, so it won’t replicate any data to the Destination until you specify what you want.
Now you are all set, and you can get to migrating whatever you need to. Much easier to spend 5 minutes on an online tool than doing it all by hand for over a week, isn’t it?
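One last hedged sketch: after the first sync completes, it is worth spot-checking that document counts roughly match row counts. The connection details and collection names below are placeholders, and keep in mind that Stitch may add its own metadata columns, so the tables can look a little wider than your collections.

```python
import psycopg2
from pymongo import MongoClient

# Placeholder connection details and collection names.
mongo = MongoClient("mongodb://localhost:27017")["my_app"]
pg = psycopg2.connect("dbname=my_app user=app_user password=secret host=localhost")

for name in ["customer", "purchase", "invoice"]:     # collections you chose to replicate
    mongo_count = mongo[name].count_documents({})
    with pg.cursor() as cur:
        cur.execute(f"SELECT count(*) FROM {name}")  # assumes table names match collections
        pg_count = cur.fetchone()[0]
    status = "OK" if mongo_count == pg_count else "MISMATCH"
    print(f"{name}: mongo={mongo_count} postgres={pg_count} [{status}]")

pg.close()
```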
Let me know about your experiences with database migration in the comments, and what you think about my approach.