Batch inserts in PostgreSQL

Valentin Mouret
3 min read · Jul 6, 2022


PostgreSQL is a great database. It has many features and it can be used in an interactive and safe way. Interactivity is great to go fast and build higher-quality software, while safety limits the mistakes one can make and reduces stress. The combination of the two is especially powerful!

However, PostgreSQL is rarely used on its own. We write scripts that barbarically execute SQL queries, say to do batch inserts. Wouldn’t it be great if we could leverage all the Postgres features to do this work?

A concrete example

Let’s imagine you are migrating a users collection from Firestore to a PostgreSQL database. It contains all the information about your users, but here I only show the email attribute.

You would need to do two things:

  • create the structure to host the data
  • migrate the data over

Let’s assume our Firestore looks like this:

A couple of records from our theoretical Firestore database.
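For illustration, the records might look like this (the second key and email are made up):

    c4ca42...  {"email": "foo@bar.com"}
    b6d767...  {"email": "bar@baz.com"}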

The key (c4ca42...) is the ID of the document, and {"email": "foo@bar.com"} is the document itself in Firestore.

Create the structure

The first thing to do is to create the schema that will hold your data. Sometimes this can be hard, but here our schema is straightforward:

Create a user table.
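Something like this, assuming we only keep the document ID and the email (I name the table users, since user is a reserved word in PostgreSQL):

    create table users (
        id    text primary key,  -- the Firestore document ID
        email text not null
    );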

Now, you want to insert a few test records to figure out how to work with the rest of your data:

Insert a couple of user records.
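For instance, with placeholder values:

    insert into users (id, email)
    values
        ('test-1', 'foo@bar.com'),
        ('test-2', 'bar@baz.com');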

But the war is not over. Your data still lives in Firestore and you need to migrate it over.

Inserting all the records

Let’s say you managed to export your data as the following CSV:

The first column is the key of the document in Firestore; the second is the full document as JSON.
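For illustration, the rows would look something like this (same made-up values as above, with the JSON quoted for CSV):

    c4ca42...,"{""email"": ""foo@bar.com""}"
    b6d767...,"{""email"": ""bar@baz.com""}"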

You have at least two ways to insert them into your new table:

  • record per record
  • in batch

Record per record

Using a script in the language of your choice, you would have something like this:
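For example, in Python with psycopg2 (the connection string, file path, and table name are placeholders):

    import csv
    import json

    import psycopg2

    # Placeholder connection string; point it at your own database.
    connection = psycopg2.connect("postgresql://localhost/mydb")

    with connection, connection.cursor() as cursor:
        with open("users.csv") as f:
            # Each row is (firestore_key, json_document); no header row assumed.
            for key, document in csv.reader(f):
                email = json.loads(document)["email"]
                cursor.execute(
                    "insert into users (id, email) values (%s, %s)",
                    (key, email),
                )

    connection.close()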

I did not try this, but you get the idea.

Note: PostgreSQL can hold and process JSON.

This works if you have little data, but if you have a lot of records, it will take forever.

Also, it involves another programming language that you have to set up, offers little interactivity, and does not leverage the full power of SQL.
What if you notice halfway through that something is wrong?

By batch

Behold the beauty of Postgres and UNIX composability.

We will assume that the file is available at ~/Desktop/users.csv.

Open a connection to a development database and start playing!
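Something along these lines, as a sketch (the staging table, column names, and exact \copy options are my assumptions):

    begin;

    -- A staging table for the raw CSV.
    create table users_import (
        key      text,
        document jsonb
    );

    -- \copy reads the file from your machine, not from the server.
    -- (You may need to spell out the full path instead of ~.)
    \copy users_import from '~/Desktop/users.csv' with (format csv)

    -- Have a look at what landed before going further.
    select * from users_import limit 10;

    savepoint staged;

    -- Extract the email from each JSON document into the real table.
    insert into users (id, email)
    select key, document ->> 'email'
    from users_import;

    -- If something looks wrong: rollback to savepoint staged;
    select count(*) from users;

    commit;

    -- Once you are happy: drop table users_import;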

Note: This is meant to be run interactively. Open a psql session or a GUI and run those commands one by one.

Now, you should be pretty confident that this worked, so you are ready to run it in production. You can keep the savepoints if you want to run the migration interactively; however, I am going to show how to run it with the CLI.

Most likely, you are using a managed database in production, and you cannot «just» put a CSV file there. Thankfully, Postgres follows the UNIX philosophy and can read from standard input!
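A sketch, assuming the users_import staging table from above exists on the production database (the insert ... select step then runs the same way):

    cat ~/Desktop/users.csv \
      | psql -c "\copy users_import from stdin with (format csv)"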

Note: You should pass your connection parameters to the psql command.

What did we learn?

  • PostgreSQL is great, especially to experiment with data in a safe way (transactions, savepoints).
  • We can leverage its full power to batch insert records in an expressive and efficient way.
  • UNIX composability wins again.

