How to migrate PostgreSQL to MongoDB

Lars Kumbier
3 min read · Sep 29, 2017


In one of my clients’ projects, we are moving some parts of a monolithic PostgreSQL database into microservices, which use MongoDB for their persistence. The whole migration can be done with native tools (psql and mongoimport, to be precise), which makes it very fast.

The Plan

MongoDB is a document-oriented database, but it still has some insight into the types of its values. These can take on different basic types like strings, integers, or booleans, and since MongoDB 3.4, the mongoimport tool can import a CSV with specific markers in the column names that set the type of each field; a column named ‘failed_login_count.int32()’, for example, is imported as a 32-bit integer field called ‘failed_login_count’.

So the plan is to create a Postgres query whose output is saved as a tab-separated values (TSV) file, which is in turn imported using mongoimport.

Exporting the data from Postgres…

This example uses a single table, but a more complex result set, where several tables are joined, is possible as well.
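Here is an example script, as a minimal sketch that assumes a hypothetical users table with the columns userid, failed_login_count, active, answer_nice_blog, and answer_would_recommend, exported to /tmp/users.tsv:

```sql
COPY (
    SELECT
        userid AS "userid.auto()",
        'some-static-string' AS "__classname__.auto()",
        failed_login_count AS "failed_login_count.int32()",
        CASE WHEN active THEN 'TRUE' ELSE 'FALSE' END AS "active.boolean()",
        answer_nice_blog AS "answers.niceBlog.auto()",
        answer_would_recommend AS "answers.wouldRecommend.auto()"
    FROM users) TO '/tmp/users.tsv'
    -- CSV output with a header line, using TAB as the delimiter
    WITH (FORMAT csv, HEADER true, DELIMITER E'\t');
    -- table name, column names, and path are illustrative placeholders
```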

Let’s follow the script. The outermost COPY (…) TO file in lines 1 and 9 copies the output of the query into a file; make sure that Postgres has access to that location and that the executing user has the appropriate rights to use COPY. The options in line 11 request CSV output with a header line, but use a TAB character as the delimiter, resulting in the wanted tab-separated values.

The next layer is a simple SELECT … FROM on a table; the magic happens inside the column list.

Let’s look at line 3. I simply select the field “userid”, but rename it to a special field name in the format “fieldname.type()”. The field name will be the object key later imported into MongoDB, while the auto() type instructs mongoimport to simply guess the type; usually this results in a string being detected.

Line 4 adds the static value ‘some-static-string’ under the key ‘__classname__’, again with auto type detection. The next entry, the ‘failed_login_count’ field, contains an integer, so we inform mongoimport about that via the ‘int32()’ suffix.

Now it gets interesting. By default, PostgreSQL queries return ‘t’ for a true boolean field and ‘f’ for a false one, which mongoimport cannot parse as booleans. So we use a conditional expression that outputs ‘TRUE’ or ‘FALSE’ instead, with the field suffix ‘boolean()’ to make the type known to mongoimport.

But what about nested structures in MongoDB? Since MongoDB is table-less and document-based, we can have a JSON key whose value is an embedded document, which lets us denormalize a normalized relational structure like the answers a user has given to some questions, a 1:n relation. This is done in lines 7 and 8: nested objects are created by nesting the field key, as in ‘answers.niceBlog.auto()’, where the last part is the already familiar type declaration.

…and importing via mongoimport

After running our export script, we have a TSV file with a header line and one line of tab-separated values per entry.
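With purely illustrative values, it could look like this:

```
userid.auto()	__classname__.auto()	failed_login_count.int32()	active.boolean()	answers.niceBlog.auto()	answers.wouldRecommend.auto()
a1b2c3	some-static-string	3	TRUE	yes	no
d4e5f6	some-static-string	0	FALSE	no	maybe
```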

And now for the import, which, thanks to our preparations, is a breeze.
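A sketch of the call, assuming a local MongoDB instance; the database name users-service is a placeholder:

```sh
# import the TSV; database and collection names are placeholders
mongoimport --db users-service --collection users \
    --type tsv --file /tmp/users.tsv \
    --headerline --columnsHaveTypes
```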

The important options are --headerline and --columnsHaveTypes, which ensure the correct casting of the imported types. And running ‘db.users.findOne({})’ reveals one of the imported objects.
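With the illustrative values from above, the result could look like this:

```
> db.users.findOne({})
{
    "_id" : ObjectId("..."),
    "userid" : "a1b2c3",
    "__classname__" : "some-static-string",
    "failed_login_count" : 3,
    "active" : true,
    "answers" : {
        "niceBlog" : "yes",
        "wouldRecommend" : "no"
    }
}
```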

Performance

I’ve created a small test case with 250,000 entries and ran the MongoDB server instance inside a Docker container on my target platform. Timing the import, the whole job finished in under 6 seconds.
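As a sketch of that setup, with the image tag and the placeholder names from above being assumptions, the server can be started in Docker and the import timed with the shell’s time builtin:

```sh
# run a throwaway MongoDB 3.4 server in Docker (assumed image tag)
docker run -d --name mongo-migration -p 27017:27017 mongo:3.4

# time the import against it
time mongoimport --db users-service --collection users \
    --type tsv --file /tmp/users.tsv \
    --headerline --columnsHaveTypes
```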

Conclusion

Migrating data from PostgreSQL, or any other SQL server, is easy enough and very performant with native command-line tools. If you need more control over the data format, you will run into the limits of this approach soon enough, and a migration script based on Python or Node.js might be better suited than a big one-liner.

But for simple data formats, the presented solution is easy enough.
