How this low-hanging fruit improved our DB throughput by over 200x and prevented outages in production

Lior Amouyal · Published in saas-infra · 7 min read · Jan 7, 2023

Intro to Database Performance Analysis for Software Developers

An elephant never forgets (to properly use table indexes)

Team Lead: Lior, your next task is to improve the stability of our database and fix any performance issues.
Me: No problem, I’ll see what I can do.
Also me (5 minutes later, realizing I have very little hands-on DB experience): !?$#$?#?%%#@!#@&^%$##.

My team is responsible for maintaining the infrastructure for a software as a service (SaaS) product that includes a web portal with a Node.js based backend and multiple PostgreSQL databases. We were experiencing frequent outages and unacceptable levels of latency, so I was tasked with finding ways to improve the performance of our databases.

Database performance is a complex topic and in this article I will focus on one specific technique: using table indexes to optimize queries.

Improving database performance is a valuable goal in and of itself, but it’s ultimately just a means to an even greater end: improving the performance and responsiveness of our applications. When done correctly, this can provide a significant performance boost with minimal effort.

A couple of disclaimers:

  • Although the techniques and ideas discussed are specific to PostgreSQL, they can often be applied to other types of databases, both SQL and NoSQL. A quick search online should provide guidance on adapting these techniques for other databases.
  • This article was written for software developers who have little or no prior hands-on database experience, just like me a few months ago :-)

Some databases background

This section is a brief, simplified introduction to how databases store and retrieve data, providing just enough background for the rest of this article.

Sequential vs. Index scans

Relational databases store data in tables and when we perform a query, the database searches these tables to retrieve the requested data.

One method for searching is to read the entire table and scan it from start to end; this is called a sequential scan (or full table scan). In many cases, this can be very inefficient.

A more efficient approach is to create indexes on certain columns of the table. When the database executes a query, it can search just the index, which is optimized for fast lookups and is much smaller than the table itself. Once the requested values are found, the database fetches from disk only the specific parts of the table it needs, rather than reading the entire table. This is called an index scan.

A couple of important notes about indexes:

  1. Postgres automatically creates an index on a table’s primary key, but this may not be enough for optimal performance. Depending on the query that is executed, indexes on other columns may also be necessary, so you may need to add them manually.
  2. Indexes can be created on one or more columns, depending on the queries you will be running. For example, if a query includes a condition such as WHERE tenantId = 'some_tenant_id', you may find that an index on the tenantId column is sufficient.
    However, if the query has a more complex condition such as:
    WHERE tenantId = 'some_tenant_id' AND status = 'some_status',
    you may benefit more from a multi-column index on both tenantId and status (see the examples right after this list).
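
For example, assuming a Notes table like the one that appears later in this article (index names are illustrative):

-- Single-column index, for queries that filter on tenantId only:
CREATE INDEX notes_tenant_id_idx ON "Notes" ("tenantId");

-- Multi-column index, for queries that filter on both tenantId and status:
CREATE INDEX notes_tenant_id_status_idx ON "Notes" ("tenantId", "status");

As a rule of thumb, a multi-column index can also serve queries that filter only on its leading column (tenantId here), but not queries that filter only on status.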

Database Views

Database views allow you to give a name to a predefined query that you can then use like a regular table. Postgres has many built-in views, but the one I find most useful for analyzing queries and optimizing them is pg_stat_statements.
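
A minimal sketch of a user-defined view (the table and view names are illustrative):

-- Give a name to a predefined query...
CREATE VIEW active_notes AS
SELECT * FROM "Notes" WHERE "status" != 'dismissed';

-- ...and then use it like a regular table:
SELECT count(*) FROM active_notes;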

The pg_stat_statements view returns statistics for all SQL queries that have run on the database since the last reset of statistics.

Note that the pg_stat_statements view is not enabled by default in Postgres. You may need to have a chat with your DevOps/DB admin team to enable it.
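
If you do manage the database yourself, enabling it looks roughly like this (the module must be preloaded at server start, so a restart is required; managed services usually expose an equivalent setting):

-- In postgresql.conf (requires a server restart):
shared_preload_libraries = 'pg_stat_statements'

-- Then, in each database you want to inspect:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;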

Query Analysis

Postgres tries to optimize queries to minimize their “cost” and creates a plan for how to execute the query. This planning step is usually quick and occurs before the query is run.

You can use the EXPLAIN and EXPLAIN ANALYZE commands to view the query plan and see if it makes sense (e.g. by running EXPLAIN ANALYZE <query_statement>).

Note that EXPLAIN is a read-only DB operation but EXPLAIN ANALYZE isn’t! Be very careful not to use EXPLAIN ANALYZE on an UPDATE or DELETE statement, as it will actually execute the query and modify the database.
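
For example (using the Notes table that appears later in this article):

-- Shows the plan only; nothing is executed:
EXPLAIN SELECT * FROM "Notes" WHERE "tenantId" = 'some_tenant_id';

-- Actually executes the query and reports real row counts and timings:
EXPLAIN ANALYZE SELECT * FROM "Notes" WHERE "tenantId" = 'some_tenant_id';

-- If you must analyze a data-modifying statement, wrap it in a
-- transaction and roll it back so no changes are persisted:
BEGIN;
EXPLAIN ANALYZE DELETE FROM "Notes" WHERE "status" = 'dismissed';
ROLLBACK;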

Applying the Theory: Improving Database Performance

Now that we have learned about sequential scans versus index scans, database views, and how to analyze queries, we can use this knowledge to improve the performance of our database.

Identifying Poorly Performing Queries

We can use the pg_stat_statements view in a query to see a list of the top 20 queries that have consumed the most compute time on the database:
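
A query along these lines does the job (column names are from PostgreSQL 13 and later; on older versions, total_exec_time and mean_exec_time are called total_time and mean_time):

SELECT left(query, 60) AS short_query,
       calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       round(mean_exec_time::numeric, 2) AS mean_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;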

The following screenshot shows the top four most time-consuming queries that we identified in our databases:

The most time-consuming queries in our DB

It is worth noting that these queries were executed far more frequently than the other queries in the list, about 2.3 million times each. The execution time for each query was also relatively high at 16–28 milliseconds, which, as the next section shows, is significant for a SELECT query.

Analyze the Poorly Performing Queries

Now that we have identified the queries that take the longest to execute, let’s see if we can improve their performance.

We ran EXPLAIN ANALYZE on each of the previously mentioned queries to view their execution plans. Here is an example of the results for one of the queries:

Query plan for a poorly performing query

This query took 28 milliseconds to execute because the Notes and Actions tables were being scanned sequentially, indicating that they were missing suitable indexes.

To fix this issue, we added an index on the noteId column in the Actions table (noteId is a foreign key in the Actions table) and a multi-column index on the tenantId and status columns in the Notes table.
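
The index definitions looked something like this (index names are illustrative; the CONCURRENTLY option lets Postgres build an index without blocking writes on a live table):

CREATE INDEX CONCURRENTLY actions_note_id_idx
ON "Actions" ("noteId");

CREATE INDEX CONCURRENTLY notes_tenant_id_status_idx
ON "Notes" ("tenantId", "status");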

Here is the query plan for the same query after adding these indexes:

Same query with missing indexes added

The execution time for this query dropped from 28 to 0.07 milliseconds! Hooray!

The plan also shows that the new indexes are being used instead of sequential scans of the entire tables and that the query cost has significantly decreased.

Bonus Tip

Upon closer analysis of the four previously mentioned queries, I noticed that they were essentially the same, with the only difference being the order of the conditions in their WHERE clause.
Two of the queries had the following WHERE clause:
WHERE "tenantId" = 'some_tenant_id' AND "status" != 'dismissed'.
The other two had the conditions in the opposite order:
WHERE "status" != 'dismissed' AND "tenantId" = 'some_tenant_id'.

Postgres may choose different execution plans depending on the order of the conditions in the WHERE clause, so it treated these as distinct queries. As mentioned earlier, I added a multi-column index on the (tenantId, status) columns in the Notes table, but this only improved the performance of the first two queries. The other two queries, which had the opposite order of conditions in the WHERE clause, were unable to benefit from this index.

One possible solution would be to create another multi-column index on (status, tenantId), but this would incur additional storage costs. In addition, every index takes time to update whenever rows are inserted, updated, or deleted, and we are already paying this cost for the (tenantId, status) index.

Therefore, it is more efficient to find another solution, such as modifying the query or, in our case, changing the code that generates it, since we are using an ORM (Object-Relational Mapping) library.

By rearranging the order of the WHERE clause conditions in our Node.js service code, all four queries became identical in the eyes of Postgres and were able to utilize the newly created index. The fix lived in the module that interfaces with the ORM: build the filter conditions in a fixed order, so that the generated SQL is always the same.
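
In terms of the SQL that reaches Postgres, the effect was roughly the following (the SELECT list is simplified to * for brevity):

-- Before: two textual variants of the same logical query
SELECT * FROM "Notes" WHERE "status" != 'dismissed' AND "tenantId" = 'some_tenant_id';
SELECT * FROM "Notes" WHERE "tenantId" = 'some_tenant_id' AND "status" != 'dismissed';

-- After: a single canonical form that matches the (tenantId, status) index
SELECT * FROM "Notes" WHERE "tenantId" = 'some_tenant_id' AND "status" != 'dismissed';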

Overall Performance Improvement

After implementing the changes described above, the combined execution time of all four queries dropped from 78 milliseconds to 0.29 milliseconds, a roughly 270-fold improvement in performance!

Since each of these queries runs about 4 times per second, the database now spends only 1.16 milliseconds per second executing them, instead of 312 milliseconds.

Summary

Optimizing database performance is a complex process with many different considerations, and in this article we have only touched on indexing best practices.

However, even by performing a few simple steps such as analyzing queries and adding missing indexes, it is possible to significantly improve the throughput of a database and its associated applications.

If you have any performance analysis and improvement tips of your own, feel free to share them in the comments section.

Good luck!

Even young Borat is excited about our results

The images in this article were made with tools from @midjourney; you can sign up for their private beta here: http://bit.ly/3J2NNVs.
