Insert Optimisations in Golang

Amogh Agarwal
4 min read · May 30, 2017

PostgreSQL has been around for some 20 years now. It is a stable relational database (RDBMS) with mature client libraries available in every major programming language. We use Golang at our company and came across a use case where we had to support bulk inserts.

The Problem Statement

We have a standard worker-based architecture where a group of workers continuously polls a queue for jobs. They receive jobs in batches of ten messages. You might have guessed which queue I am talking about :p. We then process these messages and insert them into a single table in the Postgres DB. The DB insertion is the interesting part here, so we set out to experiment with the options for inserting multiple records into the DB using any library in Golang. We had two concerns:

a) Insertions should be fast, with minimal DB overhead and a minimal number of DB calls.

b) Since all workers run in separate goroutines, two workers can hit the DB at the same time and, if not handled properly, cause long lock waits or data inconsistency issues. We want to avoid this situation.

The traffic we are dealing with is around 10,000 jobs per minute in the queue, and it is expected to double very soon. We considered the following approaches:

  1. Using a transaction and inserting the records one by one with a prepared statement, via the Gorm library.
  2. Bulk/batch insert using any library that provides support for it.

Approach 1

The Gorm library in Golang does not support bulk inserts, so as a workaround we used its Exec method in a loop inside a transaction. A sketch of the snippet follows.
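Assuming Gorm v1 and an illustrative MessageDetailRecord struct (the table and column names here are placeholders, not our real schema), the pattern looks roughly like this:

```go
import (
	"github.com/jinzhu/gorm"
)

// MessageDetailRecord is an illustrative stand-in for our actual model.
type MessageDetailRecord struct {
	AccountID    int
	SubAccountID int
}

// insertOneByOne fires one INSERT per record inside a single transaction,
// so a batch of 100 records costs 100 round trips to Postgres.
func insertOneByOne(db *gorm.DB, records []MessageDetailRecord) error {
	tx := db.Begin()
	if tx.Error != nil {
		return tx.Error
	}
	for _, r := range records {
		if err := tx.Exec(
			"INSERT INTO messagedetailrecord (accountid, subaccountid) VALUES (?, ?)",
			r.AccountID, r.SubAccountID,
		).Error; err != nil {
			tx.Rollback()
			return err
		}
	}
	return tx.Commit().Error
}
```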

For three consecutive runs, we noted the execution times of the DB query. Note: Postgres 9.6.3 hosted on localhost on an 8 GB RAM, 128 GB SSD MacBook Air running OS X 10.11.

1) 218 ms   2) 181 ms   3) 149 ms

We see that the query times drop on subsequent runs due to data caching by the database, which is expected. But the number of DB calls was 100 in this case, one INSERT per record. Not a good sign.

Approach 2

We used the pq library available for Postgres, and its approach turned out to be very interesting: instead of a bulk INSERT statement, the pq library uses Postgres's COPY command. A sketch of the snippet follows.
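Assuming a *sql.DB opened with the pq driver and the same illustrative MessageDetailRecord as above, the pq bulk-import pattern looks like this:

```go
import (
	"database/sql"

	"github.com/lib/pq"
)

// copyRecords bulk-loads a batch of records with pq.CopyIn, which issues a
// single COPY "messagedetailrecord" ("accountid", "subaccountid") FROM STDIN
// and streams all rows over it.
func copyRecords(db *sql.DB, records []MessageDetailRecord) error {
	txn, err := db.Begin()
	if err != nil {
		return err
	}
	stmt, err := txn.Prepare(pq.CopyIn("messagedetailrecord", "accountid", "subaccountid"))
	if err != nil {
		txn.Rollback()
		return err
	}
	for _, r := range records {
		// Each Exec buffers one row of COPY data; no per-row INSERT is sent.
		if _, err := stmt.Exec(r.AccountID, r.SubAccountID); err != nil {
			txn.Rollback()
			return err
		}
	}
	// An Exec with no arguments flushes the buffered rows and ends the COPY.
	if _, err := stmt.Exec(); err != nil {
		txn.Rollback()
		return err
	}
	if err := stmt.Close(); err != nil {
		txn.Rollback()
		return err
	}
	return txn.Commit()
}
```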

Postgres's debug logs can be enabled to see the queries being fired:

COPY "messagedetailrecord" ("accountid", "subaccountid") FROM STDIN

This shows that the library streams the input rows over STDIN to the COPY command. The Postgres documentation describes COPY's client-side counterpart, \copy, as:

" \copy invokes COPY FROM STDIN or COPY TO STDOUT, and then fetches/stores the data in a file accessible to the psql client. Thus, file accessibility and access rights depend on the client rather than the server when \copy is used."

Another thing worth noticing is that we are using prepared statements, which make execution faster. The official documentation says:

"A PREPARED statement is a server-side object that can be used to optimise performance. When the PREPARE statement is executed, the specified statement is parsed, analysed, and rewritten. When an EXECUTE command is subsequently issued, the prepared statement is planned and executed. This division of labor avoids repetitive parse analysis work, while allowing the execution plan to depend on the specific parameter values supplied."

The query was run three times in an environment similar to the previous one, and the timings indicate a significant improvement: roughly 2–3x on the first run, with subsequent executions getting even better thanks to caching and the prepared-statement optimisation.

1) 91 ms  2) 40 ms  3) 32 ms

Approach 3 (a possible alternative)

Since Gorm had no support for bulk inserts, we also wanted to test multi-row inserts in Golang and tried the sqlx library, which provides a set of extensions on Go's standard database/sql package. However, we weren't able to get it running or find a reliable reference online, so if you know of a good implementation of bulk inserts using sqlx, please let me know in the comments. We did, however, test the execution time of inserting 100 records in one shot using a multi-row INSERT statement; a sketch of it follows the timings below. The results are more or less similar to the previous results with the COPY statement.

1) 87 ms  2) 51 ms  3) 33 ms
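For reference, here is a sketch of the kind of multi-row INSERT we timed, built by hand on top of database/sql (the helper and its placeholder-building logic are illustrative, not a confirmed sqlx API):

```go
import (
	"database/sql"
	"fmt"
	"strings"
)

// multiRowInsert builds one INSERT ... VALUES ($1, $2), ($3, $4), ... statement
// for the whole batch, so 100 records cost a single DB call.
func multiRowInsert(db *sql.DB, records []MessageDetailRecord) error {
	if len(records) == 0 {
		return nil
	}
	placeholders := make([]string, 0, len(records))
	args := make([]interface{}, 0, len(records)*2)
	for i, r := range records {
		placeholders = append(placeholders, fmt.Sprintf("($%d, $%d)", i*2+1, i*2+2))
		args = append(args, r.AccountID, r.SubAccountID)
	}
	query := "INSERT INTO messagedetailrecord (accountid, subaccountid) VALUES " +
		strings.Join(placeholders, ", ")
	_, err := db.Exec(query, args...)
	return err
}
```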

This was a very good exercise that helped us understand database fundamentals and how to apply optimisations to essential components. We ultimately went ahead with approach 2, using the COPY command, as it solved both our concerns. It has been working seamlessly so far; however, I would love to hear if there are better options we left unexplored.
