The Faster You Get, The More You Save with Indexing

A short guide to best practices around indexing tables in databases

Antonius Christiyanto
Ralali Tech Stories
5 min read · Aug 26, 2021


You may remember looking up words in a language dictionary. By following the marks printed on the edge of the pages, you jump straight to the section whose entries begin with the letter or word you want. That, in a nutshell, is how an index works.

The bigger the data, the longer the query takes to execute.

Imagine a database that has to go through every row of a table to find what we are looking for. If each table holds hundreds of thousands or even millions of rows, that search takes longer and longer. Because the rows are not stored in any particular order with respect to the column being searched, finding one or more rows this way is inefficient: every query execution costs more time and more resources.

Without an index, the response time of a query grows roughly in proportion to the amount of data: the more rows you have, the longer each query takes. We don't want every query to pay that cost. An index is one way to make query execution faster and more efficient.
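To make that growth concrete, here is a minimal Python sketch of a full table scan, assuming rows are plain dicts kept in insertion order (the `make_rows` and `full_scan` helpers and the sample data are hypothetical). Since names are not unique and the rows are not sorted by name, the scan cannot stop early and has to examine every row.

```python
def make_rows(n):
    """Hypothetical sample table: n rows stored in insertion (id) order."""
    return [{"id": i, "name": f"user{i % 1000}"} for i in range(n)]


def full_scan(rows, name):
    """Check every row, since nothing says where matching rows are stored."""
    matches = []
    examined = 0
    for row in rows:
        examined += 1
        if row["name"] == name:
            matches.append(row["id"])
    return matches, examined


for n in (1_000, 10_000, 100_000):
    matches, examined = full_scan(make_rows(n), "user42")
    print(f"{n} rows: {examined} examined, {len(matches)} matches")
```

Each run examines exactly n rows, so ten times more data means ten times more work.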

Database Index

An index is a data structure that holds a collection of keys together with references to the corresponding rows of a table. When you run a query, the database can look up the key in the index and follow the reference straight to where the row is stored, instead of reading the whole table. In a large database, this lets you retrieve the information you need quickly, without going through every row.
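Here is a minimal Python sketch of that idea, assuming the index is simply a sorted list of (key, row position) pairs. Real databases usually store indexes as B-trees rather than a flat sorted list, and `build_index` / `index_lookup` are hypothetical helpers, but the principle is the same: a binary search over sorted keys needs about log2(N) steps, at most 17 for 100,000 rows, instead of examining every row.

```python
def build_index(rows, column):
    """Sorted list of (key, row_position) pairs: keys plus references to rows."""
    return sorted((row[column], pos) for pos, row in enumerate(rows))


def index_lookup(index, key):
    """Binary-search the sorted index, then follow the references.

    Returns (row_positions, probes), where probes counts binary-search steps.
    """
    lo, hi, probes = 0, len(index), 0
    while lo < hi:  # find the first entry whose key is >= the search key
        mid = (lo + hi) // 2
        probes += 1
        if index[mid][0] < key:
            lo = mid + 1
        else:
            hi = mid
    positions = []
    while lo < len(index) and index[lo][0] == key:  # collect every matching entry
        positions.append(index[lo][1])
        lo += 1
    return positions, probes


rows = [{"id": i, "email": f"user{i}@example.com"} for i in range(100_000)]
email_index = build_index(rows, "email")
positions, probes = index_lookup(email_index, "user4242@example.com")
print([rows[p] for p in positions], probes)
```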

When to Use and When Not to Use Indexes

Indexes are highly recommended if you want faster read performance from your database. They are most useful on columns that frequently appear in WHERE clauses or JOIN conditions, including nullable columns such as deleted_at that are often used as filters. With a suitable index in place, such queries no longer take a long time to execute. The larger the data in your database, the more likely it is that you will need indexes.
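As a rough illustration, here is a sketch using Python's built-in `sqlite3` module. SQLite is an assumption (the article does not say which database it used; MySQL offers `EXPLAIN` for the same purpose), and the `users`/`orders` schema and index names are hypothetical. `EXPLAIN QUERY PLAN` shows whether SQLite intends to use the index on the WHERE column and the index on the JOIN column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, email TEXT, deleted_at TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    -- Index the column used in the WHERE clause and the one used in the JOIN
    CREATE INDEX idx_users_email ON users (email);
    CREATE INDEX idx_orders_user_id ON orders (user_id);
""")

query = """
    SELECT o.id, o.total
    FROM users AS u
    JOIN orders AS o ON o.user_id = u.id
    WHERE u.email = ?
"""
# Each plan row is (id, parent, notused, detail); keep only the detail text
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("a@example.com",))]
for step in plan:
    print(step)
```

The detail lines should mention `idx_users_email` for the WHERE filter and `idx_orders_user_id` for the join; the exact wording of the plan varies between SQLite versions.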

However, indexes have a cost on tables that are updated all the time. Whenever data is written, the database must update not only the table but also every index on it, so a table that receives constant writes pays this extra work on every insert, update, or delete, and the more indexes it has, the slower its writes become. Indexes therefore pay off most where data is read far more often than it is written, for example in a data warehouse that regularly receives the latest data and is then queried heavily.
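The write cost can be sketched with a hypothetical toy table in Python. This is an illustration under simplifying assumptions, not how any real storage engine works: the indexes here are plain dicts (hash indexes, whereas most databases use B-trees), and `structure_writes` just counts how many structures each write has to touch.

```python
class ToyTable:
    """Hypothetical in-memory table: a list of rows plus secondary indexes.

    Each index is a dict mapping a column value to the positions of the rows
    holding that value.
    """

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {col: {} for col in indexed_columns}
        self.structure_writes = 0  # structures touched by all writes so far

    def insert(self, row):
        self.rows.append(row)  # write the row itself
        self.structure_writes += 1
        pos = len(self.rows) - 1
        for col, index in self.indexes.items():  # ...and keep every index in sync
            index.setdefault(row[col], []).append(pos)
            self.structure_writes += 1

    def update(self, pos, column, new_value):
        old_value = self.rows[pos][column]
        self.rows[pos][column] = new_value
        self.structure_writes += 1
        if column in self.indexes:  # an indexed value must move to its new key
            index = self.indexes[column]
            index[old_value].remove(pos)
            if not index[old_value]:
                del index[old_value]
            index.setdefault(new_value, []).append(pos)
            self.structure_writes += 1


plain = ToyTable([])
indexed = ToyTable(["email", "name", "deleted_at"])
for i in range(1_000):
    row = {"email": f"user{i}@example.com", "name": f"user{i}", "deleted_at": None}
    plain.insert(dict(row))
    indexed.insert(dict(row))

print(plain.structure_writes, indexed.structure_writes)  # 1000 4000
```

With three indexes, each insert touches four structures instead of one, and updating an indexed column costs an extra index write as well.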

Index Performance

Below are some examples of query performance on tables with and without an index on the queried column.

I created four tables with the same schema and the same data, 4,121,938 records each; the only difference between them is their indexes.

Indexes

Table users - Indexes: id (Primary Key)

Table users_index_email - Indexes: id (Primary Key), email

Table users_index_email_deleted_at - Indexes: id (Primary Key), email, deleted_at

Table users_index_name - Indexes: id (Primary Key), name
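For readers who want to reproduce the setup on a small scale, here is a sketch of the four tables in SQLite via Python's `sqlite3`. The column types, the index names, and the choice of SQLite are assumptions (the article only names the columns), and the email/deleted_at index is assumed to be one composite index. In SQLite an `INTEGER PRIMARY KEY` is the table's rowid, so it does not show up as a separate index.

```python
import sqlite3

# Hypothetical column types; the article only names id, name, email, deleted_at
COLUMNS = "id INTEGER PRIMARY KEY, name TEXT, email TEXT, deleted_at TEXT"
TABLES = ("users", "users_index_email", "users_index_email_deleted_at", "users_index_name")

conn = sqlite3.connect(":memory:")
for table in TABLES:
    conn.execute(f"CREATE TABLE {table} ({COLUMNS})")

conn.executescript("""
    CREATE INDEX idx_email ON users_index_email (email);
    -- Assumed to be a single composite index on (email, deleted_at)
    CREATE INDEX idx_email_deleted_at ON users_index_email_deleted_at (email, deleted_at);
    CREATE INDEX idx_name ON users_index_name (name);
""")


def index_names(table):
    # PRAGMA index_list rows start with (seq, name, ...)
    return [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]


for table in TABLES:
    print(table, index_names(table))
```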

Experiment

Experiment #1

In this experiment, I ran the same query against every table, filtering on the name column in the WHERE clause. The fastest query was on users_index_name, the only table with an index on name. The users table has no index other than the primary key, and the other tables have indexes on different columns, so none of those indexes helps a query that filters on name.
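The timings themselves depend on the author's setup, but the mechanism can be checked with a small SQLite sketch (SQLite, the schema types, and the `idx_name` index name are assumptions). `EXPLAIN QUERY PLAN` reports a full scan for `users` and an index search for `users_index_name`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, deleted_at TEXT);
    CREATE TABLE users_index_name (id INTEGER PRIMARY KEY, name TEXT, email TEXT, deleted_at TEXT);
    CREATE INDEX idx_name ON users_index_name (name);
""")


def plan(sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[3] for row in rows)


for table in ("users", "users_index_name"):
    print(table, "->", plan(f"SELECT * FROM {table} WHERE name = ?", ("Alice",)))
```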

Experiment #2

In the second experiment, I ran the same query against every table, this time filtering on the email column. The fastest queries were on users_index_email and users_index_email_deleted_at, both of which have an index that includes email. Even though the index on users_index_email_deleted_at combines email and deleted_at, it still speeds up a query that filters on email alone, because email is part of that index (with email as its first column, the database can search the index by email without a value for deleted_at).
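The "leading column" behaviour can be sketched in SQLite, assuming the combined index is a single composite index on (email, deleted_at) (an assumption, as are the schema types and index name). A filter on email alone can search the index; a filter on deleted_at alone cannot, because deleted_at is not the first column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_index_email_deleted_at (
        id INTEGER PRIMARY KEY, name TEXT, email TEXT, deleted_at TEXT
    );
    -- Assumption: one composite index with email as the leading column
    CREATE INDEX idx_email_deleted_at
        ON users_index_email_deleted_at (email, deleted_at);
""")


def plan(sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


by_email = plan("SELECT * FROM users_index_email_deleted_at WHERE email = ?",
                ("user42@example.com",))
by_deleted_at = plan("SELECT * FROM users_index_email_deleted_at WHERE deleted_at IS NULL")
print(by_email)       # leading column alone: the composite index is searched
print(by_deleted_at)  # second column alone: SQLite scans the table instead
```

With statistics gathered by `ANALYZE`, SQLite can sometimes use a skip-scan on the second column, so the second plan may differ on a real, analyzed database.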

Experiment #3

In the last experiment, I ran the same query against every table, this time filtering on both email and deleted_at in the WHERE clause. The fastest query was on users_index_email_deleted_at, whose index covers both the email and the deleted_at columns. The combined index on the two columns significantly reduced query time: it returned results faster than a single-column index on email alone.
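A small SQLite sketch (assumed schema and index names, as before) shows the difference in what each index can do for this query. With the single-column index, SQLite searches by email and then checks deleted_at on each row it fetches; with the composite index, both conditions can be applied inside the index lookup itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_index_email (id INTEGER PRIMARY KEY, name TEXT, email TEXT, deleted_at TEXT);
    CREATE INDEX idx_email ON users_index_email (email);
    CREATE TABLE users_index_email_deleted_at (id INTEGER PRIMARY KEY, name TEXT, email TEXT, deleted_at TEXT);
    CREATE INDEX idx_email_deleted_at ON users_index_email_deleted_at (email, deleted_at);
""")


def plan(sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


query = "SELECT * FROM {} WHERE email = ? AND deleted_at IS NULL"
for table in ("users_index_email", "users_index_email_deleted_at"):
    print(table, "->", plan(query.format(table), ("user42@example.com",)))
```

How much the composite index helps in practice depends on the data: the fewer rows share the same email, the less work is left for the deleted_at check.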

Conclusion

Indexes help you optimize queries on your database, especially when the database is large. Whether you use a single-column index or a composite index, and on which columns, directly affects the response time of each query you execute. Choose indexes that match the columns your queries actually filter and join on, and you will get results faster and more efficiently.


Software Engineer at Ralali.com. Turned all the syntax into something awesome :)