PostgreSQL Query Optimisation with Efficient Use of UPSERT Queries

At Tokopedia, we follow a microservice architecture, and almost every service has its own database. The data in each service grows rapidly over time. Therefore, having the right database, the right schema, and proper use of queries is paramount to match Tokopedia’s scale.

This article shares insights from some of the improvements that our engineering team has made while working on the PostgreSQL database at Tokopedia. Let’s unleash some PostgreSQL Qi!!

Before we begin, let me describe a bit about the data set we worked on.

At Tokopedia, we have data related to our ads, which we call statistics data. Businesses or individuals trying to market products or services often create ads to drive their promotion efforts. An ad’s performance data consists of three common terms:

Impressions - Whenever an ad becomes visible to a user, we call it an impression. An ad’s impressions are the number of times that ad was actually shown to end users.
Clicks - The number of times users click on an ad.
Conversions - A conversion is a click that results in a desired action.

We have some APIs that publish statistics data in real time; later, this data is dumped into our PostgreSQL database asynchronously by consumers. Since we have an enormous number of ads and a large amount of data to process in real time, we run multiple threads/processes to store this data. We trigger other processes later to disable those ads whose daily credit/balance/budget is exhausted.
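To make the setup concrete, here is a minimal, hypothetical sketch of that consumer side: several worker goroutines drain a stream of statistics events, with each event standing in for one write to PostgreSQL. The `StatEvent` shape and `runConsumers` helper are illustrative only, not our actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// StatEvent is a hypothetical shape for a real-time statistics message.
type StatEvent struct {
	AdID                             int64
	Clicks, Impressions, Conversions int64
}

// runConsumers starts `workers` goroutines that drain the event channel,
// standing in for the consumer processes that dump each event into PostgreSQL.
// It returns how many events were processed.
func runConsumers(workers int, events []StatEvent) int64 {
	ch := make(chan StatEvent, len(events))
	var (
		wg        sync.WaitGroup
		mu        sync.Mutex
		processed int64
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ev := range ch {
				_ = ev // placeholder for the actual DB write
				mu.Lock()
				processed++
				mu.Unlock()
			}
		}()
	}
	for _, ev := range events {
		ch <- ev
	}
	close(ch)
	wg.Wait()
	return processed
}

func main() {
	events := make([]StatEvent, 20)
	for i := range events {
		events[i] = StatEvent{AdID: 123, Clicks: 1}
	}
	fmt.Println(runConsumers(4, events)) // 20: every event handled exactly once
}
```

The interesting question, discussed next, is what each of those workers does when it reaches the database.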

Here, I will discuss the optimization done on the consumer, which is responsible for dumping statistics data into our PostgreSQL database.

Problem Statement

Our system was taking longer than expected to disable ads. As a result, marketing ads were disabled late, which led to revenue loss.

What went wrong?
We used a select followed by an insert/update query (choosing insert or update depending on whether the select retrieved a row), all inside an explicit transaction.

Code Snippet for consumer process-

// begin an explicit transaction
tx, err := dbMaster.Begin()
if err != nil {
    return err
}
// set isolation level
_, err = tx.Exec(`set transaction isolation level repeatable read`)
// select query to find existing data for this ad
var oldClicks, oldImpressions, oldConversions int64
query1 := `select click, impression, conversion from stats_table where ad_id = 123`
// execute query1
err = tx.QueryRow(query1).Scan(&oldClicks, &oldImpressions, &oldConversions)
recordExists := err != sql.ErrNoRows
var query2 string
if recordExists {
    // use update query: add the new counts to the existing row
    query2 = `update stats_table set
        click = click + newClicks,
        impression = impression + newImpressions,
        conversion = conversion + newConversions
        where ad_id = 123`
} else {
    // use insert query: create the first row for this ad
    query2 = `insert into stats_table (click, impression, conversion, ad_id)
        values (newClicks, newImpressions, newConversions, 123)`
}
// execute query2
_, err = tx.Exec(query2)
if err != nil {
    tx.Rollback() // if error, roll back the transaction
} else {
    tx.Commit() // if no error, commit the transaction
}

(Here newClicks, newImpressions, and newConversions stand in for values taken from the consumed message; in real code they would be bound as query parameters.)
The purpose of the above implementation

  • Since the statistics data dump is handled by many threads/processes to boost the dumping rate, the whole sequence was wrapped in an explicit transaction block to make the writes atomic and non-conflicting with other threads/processes.
  • The select query was used to decide whether we needed to update an existing row in our stats table or create a new one for a particular ad.

Problems with above implementation

  1. As the statistics data volume is very high, two processes may try to update statistics for the same ad simultaneously. When that happens, one transaction is aborted because the row lock is held by the other process, and unless the aborted write is retried, its data is lost.
  2. Five queries are fired at the DB for a single data write (BEGIN transaction, SET TRANSACTION ISOLATION LEVEL, a SELECT to decide between insert and update, the INSERT/UPDATE itself, and COMMIT/ROLLBACK), resulting in higher latency.
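The lost-update risk in point 1 can be shown in miniature with a plain Go map. This is a deliberately simplified analogy: under repeatable read, PostgreSQL aborts one of the conflicting transactions rather than silently clobbering a write, but the aborted write is still lost unless it is retried.

```go
package main

import "fmt"

// lostUpdate replays the select-then-update pattern: both "consumers" read
// the old click count before either writes, so the second write clobbers
// the first.
func lostUpdate() int64 {
	stats := map[int64]int64{123: 0}
	oldA := stats[123]    // consumer A selects 0
	oldB := stats[123]    // consumer B selects 0 as well
	stats[123] = oldA + 5 // A updates the row to 0 + 5
	stats[123] = oldB + 3 // B updates the row to 0 + 3, wiping out A's 5 clicks
	return stats[123]
}

// atomicUpdate performs the read-modify-write as one step, which is what a
// single server-side increment (as in an UPSERT) gives us: nothing is clobbered.
func atomicUpdate() int64 {
	stats := map[int64]int64{123: 0}
	stats[123] += 5 // consumer A
	stats[123] += 3 // consumer B
	return stats[123]
}

func main() {
	fmt.Println(lostUpdate())   // 3: five clicks were lost
	fmt.Println(atomicUpdate()) // 8: both updates survive
}
```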

UPSERT to the rescue

Instead of a select followed by an insert/update, we use a single UPSERT (INSERT ... ON CONFLICT ... DO UPDATE) query.

Code Snippet for new implementation-

//new upsert query
query := `insert into stats_table (click, impression, conversion, ad_id)
    values (newClicks, newImpressions, newConversions, 123)
    on conflict (ad_id) do update set
        click = stats_table.click + newClicks,
        impression = stats_table.impression + newImpressions,
        conversion = stats_table.conversion + newConversions`
//execute upsert query
_, err = dbMaster.Exec(query)

Please note that ad_id in stats_table has a unique constraint; the ON CONFLICT (ad_id) clause relies on it to detect an existing row.

Benefits of the new approach

  1. Queries reduced from 5 to 1, therefore lower latency.
  2. No need for an explicit transaction; the single statement runs in an implicit transaction managed at the PostgreSQL level, which guarantees the write does not conflict with other threads/processes because PostgreSQL holds an exclusive row lock for the duration of the statement.
  3. With no explicit multi-statement transaction, there is no window for conflicts, eliminating the possibility of data loss.


Results

  1. DB server CPU usage decreased by 3%.
  2. PostgreSQL locks reduced by 75%.
  3. Statistics data accuracy improved by eliminating explicit transactions; the conflicting-transaction count dropped to 0.
  4. Time taken to dump the data into the PostgreSQL table improved by 60%.

[Figure: PostgreSQL lock reduction]
[Figure: DB server CPU usage decrease]

We have shared some of the common problems faced while optimizing queries in a database and how to solve them. If you have any suggestions or questions about optimizing a query, please reach out by commenting below.

Happy SQLing. :)




Story from people who build Tokopedia

Pushkar Singh

Software Engineer- Tokopedia
