How to improve database performance?

Nishant Munjal
5 min readMay 30, 2022

--

Over the past few years in my software engineering, this is the single question that I have faced or heard the most from other developers.

Improving database performance can have multiple solutions ranging from improving the architecture of the database like read replicas, sharding or using warehouse, application-level changes like optimized SQL queries, connection pooling, and configuration of the database, plugins, etc. I will try to cover all these aspects in some detail below.

Application Level Optimisation

  • Indexing

This is the first and easiest thing you can do if your queries are getting slow and you are running your queries on the same set of where conditions most of the time.

When you create an index on a column (or multiple columns), the database internally creates a data structure to speed up a search with faster lookups.

But indexing does come with a downside, first, we need more storage/memory to save these indexes and the other is that insertion for new records will now become slower because the database will have to keep the indexes up to date with new data getting inserted.

  • Materialized Views

There are a lot of times when we want to show pre-computed results for a query. For all those use cases, we can create materialized views in our databases.

Materialized views are very useful when we can work with pre-computed results for queries with high ram or CPU requirements and don’t want to impact the DB performance when the results are required.

On the downside, there can be inconsistent results because materialized views wouldn’t be updated in real-time.

  • Connection pooling at the application level

Database connections are the more time taking process compared to executing a query once you have done a connection. So, it’s very expensive to connect to a database, run a query and then disconnect every time we have to do an operation on a database.

Also, we cannot work with a single connection because that would block other transactions in the system.

So, the ideal scenario is to use multiple connections and reuse them rather than create a new connection every time. This is called connection pooling

  • Application-level caching

Every business use case will have certain data points that will be read very often from the database. The idea with caching is to save all these frequent data points in RAM for faster reads. Also, this will free up resources for other database transactions.

There are some existing services which you can use like Redis. But, you have to make sure that you don’t store every datapoint because storage in Redis can be expensive.

On the downside, updation of data becomes more tedious because now you will have to update data in both caches as well as your database.

  • Message Queue

There are certain use cases in every product where you do not have to send the response immediately back to the user. For example, reporting a post as spam on Instagram or liking a post on Linkedin.

In all such use cases, we can keep our messages in a queue and revert the user with a response that their request has been accepted.

All these requests can be later processed at their own time and reduce concurrent transactions in the database. Queues will make it easier to handle peaks in traffic by acting as a buffer to store requests. We can use something like Redis or kafka to store all these requests in memory.

System-level Optimisations

  • Vertical Scaling

Vertical Scaling simply means adding more RAM, CPU, storage, or network bandwidth. This is the easiest and least time taking process if you or the client are in rush to improve the database performance.

The downside with vertical scaling is beyond a point, you cannot vertically scale your database. Also, as you vertically scale, the cost rises exponentially. The other downside with a large capacity database is that now you have a single point of failure which is also difficult to migrate.

  • Read Replicas

Most of the applications or business use cases that we see are read-heavy and don’t require consistency.

The idea with reading replicas is to create replications of our master database which are separate databases hosted on a different server. The main database is reserved for writes and the replications are used for reads.

The replicas can be synced asynchronously with the main database whenever there is a write operation.

The downside with read replicas is with the asynchronous flow of data, the data can get inconsistent at some points.

  • Connection Pooling at the proxy level

We learned about connection pooling above but in scenarios when we are horizontally scaling our application through docker container or lambda functions, we create a lot of connection pools which becomes infeasible for the database to handle

For example — When running the default Postgres docker image, the value we get for the variable max_connections is 100. This can be reconfigured but adding more connections requires more RAM.

The solution is to create a proxy between connection pools and the database. Proxies basically combine a lot of transactions into a single connection pool at their own level to reduce the number of connections.

  • Using a different database

Every database is built keeping specific use cases in mind. For example, SQL database is best suited for use cases that require ACID properties or requires relational joins. NoSQL databases are best suited if the data is unstructured.

Similarly, to optimize the read performance while still using SQL properties, we can use something like Redshift which stores data in columnar format and uses MPP (massively parallel processing) architecture to optimize the performance.

Apart from that, some databases store data in memory instead of disk. Accessing data from memory is much faster than accessing it from disk so data retrievals in these databases are significantly faster. Redis can be used for use cases like this.

There might still be more ways to optimize or scale your database but I have tried to cover most of what I have learned or read about in the past few years. Feel free to share any other new approaches or solutions in the comments.

--

--