How to optimise the database performance and scale it? Things to be done before shading the database.

Purnendu Kar
Nerd For Tech
Published in
5 min readNov 5, 2021

There are several ways to optimise the database to improve performance and make it more scalable. One of the most popular concepts you might have heard of is database sharding, which basically scale-out the database. But database shading is the last option you might be looking for to optimize the database. So let’s see what are other options we have to optimize the database.

Photo by Jan Antonin Kolar on Unsplash

Optimize Query Statement

Photo by Caspar Camille Rubin on Unsplash

This is the first step that you could take to optimize the database. If we write a query that uses more processing power of the database then it will impact the overall performance. Now the question comes how to optimize the query?

  • Select only those fields that are required: Try to avoid * in select queries. If the query is for an unwanted field then it will allocate more memory in RAM to process, which impact overall performance.
  • Limit the data output: Avoid writing a query to fetch all data at once. Suppose you have a table with a huge amount and try to fetch all data at once then the CPU cores will block by a single process for a long time. You can use limit, offset to fetch a limited amount of data. Basically, we are paginating the data.
  • Avoid unnecessary join operation: Join operation combines two table data based on a particular condition. So make sure database architecture is designed in such a way minimum join operation need to be performed and join operation is performed when it’s necessary.

Indexing Columns

Photo by Maksym Kaharlytskyi on Unsplash

Index in DBMS is a data structure that is used to locate and access the data in a database table quickly.

Just imagine you have a huge amount of data and want to search a data then you need to search the whole heap of data of it. What if you could arrange the data into categories, based on the category you know on which category you need to search the data. You go to that category and search on that limited amount of data.

This is how indexing work, the column that we used for searching can be indexed, which will be considered as category arrangement in the above example. Indexing is used to optimize the performance of a database by minimizing the number of disk accesses required when a query is processed.

But there is not always optimize the performance, it also has its limitation. If the data is impartially distributed or a huge amount of data is allocated for a particular value, then there won’t be much change in performance.

Master-Slave Architecture

Photo by KOBU Agency on Unsplash

If all the approach is applied in the database then we can implement the master-slave architecture. So what is master-slave architecture?

Master-Slave Architecture is an approach in which we create replicas of databases. On will be master and other replicas are the slave of the master. Master handles the write operation and slaves handle the read operation.

Basically, we have just distributed the workload based on tasks. For a write operation, we will refer to the master node and for a read operation, we will refer to slave nodes. Slaves follow their master and keep in sync with it if any changes are made in the master.

Usually, there is more read operation performed compared to write operation, so we can create multiple slaves to improve the performance by distributing the task.

Multiple Masters

Photo by Amy Hirschi on Unsplash

This approach is an extension of master-slave architecture. Suppose now your write operation workload has increased very much, in that case, we can create multiple master nodes. These master nodes may be created based on some categories like geo-specific. And both master nodes will be synced with each other.

Let’s take Instagram as an example, many people upload their content all over the world. So to handle that load there could be a master node near their geographical location where the content will be updated first. Then that will get synced with other master nodes.

In this way, there will be low latency to update content or to access it from anywhere in the world.

Archive Data / Data Partition

Photo by Giammarco on Unsplash

Once all the above approaches is been applied and the data size of the table has become too large and takes time to query. Then we can go for the following approach:

  • Archive the old data that is highly likely not required so frequently (Example notification data).
  • Partition of data, partitioning the data based on categories. It’s might sound similar to database sharding but it’s not. Sharding is about dividing the data into multiple databases based on some condition. And partitioning is about dividing the data into multiple tables based on some condition.

Conclusion

Database Sharding is a long way to go. It’s highly likely that we might never need to implement database sharding. Database sharding is the last option to pick up to improve database performance. It’s rare that you might need to shard the database. You can read more about database sharding in my database sharding blog.

--

--