Analyzing Postgres Queries with Database Indexing

A practical example for understanding how indexes make databases read faster

Yash Marathe
4 min read · Mar 26, 2023

Table of Contents

  1. Introduction
  2. What are database indexes?
  3. Setup for testing read operations
  4. Analyzing Queries
  5. Observations
  6. Limitations
  7. Conclusion
  8. References

1. Introduction

Recently, I started exploring ways to improve the performance of databases. A lot of sources mention that adding a simple index can drastically reduce the execution time of read operations. Intrigued by this, I decided to experiment with and analyze Postgres read queries with and without indexes to understand the concept through a practical approach.

2. What are database indexes?

According to Wikipedia, “A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure.” In other words, a database index is a smaller auxiliary structure that maps each indexed value to the rows that contain it. This minimizes the number of disk reads required to process a query, which speeds up read operations.
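
To build some intuition, here is a toy sketch in plain Python. It is not how Postgres implements B-tree indexes, but it illustrates the core idea: a sorted auxiliary structure lets you locate matching rows with a binary search instead of scanning every row.

import bisect
import random

# A "table": 500,000 rows of (age, income), addressed by row number
rows = [(random.randint(22, 58), random.randint(100000, 1000000))
        for _ in range(500000)]

# Without an index: check every row (the equivalent of a sequential scan)
seq_scan = [i for i, (age, _) in enumerate(rows) if age == 34]

# A toy "index" on age: sorted (age, row_number) pairs
index = sorted((age, i) for i, (age, _) in enumerate(rows))

# With the index: binary-search the sorted structure instead of scanning
lo = bisect.bisect_left(index, (34, -1))
hi = bisect.bisect_right(index, (34, float("inf")))
index_scan = [row_number for _, row_number in index[lo:hi]]

assert sorted(seq_scan) == sorted(index_scan)

Postgres uses a B-tree stored on disk rather than a flat sorted list, but the effect is the same: matching rows are found without reading the whole table.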

For an in-depth understanding of the concept, you can check out the links specified in the Reference section of this blog.

3. Setup for testing read operations

The complete code can be found in this Notebook

3.1. First, we will set up a Postgres connection with Python.

# Import statements for setup
import psycopg2
import random
import string
from tqdm import tqdm

# Database information
DATABASE_NAME = "users_test1"
DATABASE_USERNAME = "postgres_user"
DATABASE_PASSWORD = "password"
DATABASE_HOST = "localhost"
DATABASE_PORT = "5432"

# NOTE: Please ensure the DATABASE_NAME database (users_test1 in this case)
# already exists in the Postgres server

# Establishing the connection
conn = psycopg2.connect(
    database=DATABASE_NAME,
    user=DATABASE_USERNAME,
    password=DATABASE_PASSWORD,
    host=DATABASE_HOST,
    port=DATABASE_PORT,
)

# Creating a cursor object using the cursor() method
cursor = conn.cursor()
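
Before moving on, it is worth confirming that the connection actually works. A quick sanity check, assuming the conn and cursor objects created above:

# Sanity check: the server should report its version string
cursor.execute("SELECT version();")
print(cursor.fetchone()[0])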

3.2. Let’s create a sample table that can be used for querying.

# Dropping the EMPLOYEE table if it already exists
cursor.execute("DROP TABLE IF EXISTS EMPLOYEE")

# Creating the table as per requirement
sql = '''CREATE TABLE EMPLOYEE(
    FIRST_NAME CHAR(20) NOT NULL,
    LAST_NAME CHAR(20),
    AGE INT,
    SEX CHAR(1),
    INCOME FLOAT
)'''

cursor.execute(sql)
conn.commit()

print("Table created successfully")

3.3. Now, let’s insert 500,000 records into this table. A large number of records makes it much easier to see the difference between queries with and without an index.

# Preparing SQL queries to INSERT a record into the database
for i in tqdm(range(0, 500000)):
    sql = f"""
    INSERT INTO EMPLOYEE(
        FIRST_NAME,
        LAST_NAME,
        AGE,
        SEX,
        INCOME
    ) VALUES (
        '{''.join(random.choices(string.ascii_uppercase + string.ascii_lowercase, k=5))}',
        '{''.join(random.choices(string.ascii_uppercase + string.ascii_lowercase, k=5))}',
        {random.randint(22, 58)},
        '{random.choice(['F', 'M'])}',
        {random.randint(100000, 1000000)}
    )
    """
    cursor.execute(sql)

# Commit the changes to the database
conn.commit()

print("Records inserted")

We now have a table populated with enough data to analyze read operations.

4. Analyzing Queries

4.1. First, let’s fetch the employees who are 34 years old, without an index.

I used pgAdmin to better understand the query plan. The query can also be run from Python, but the raw output format is harder to read when analyzing the query plan.

# Querying without index
sql = """
EXPLAIN ANALYZE SELECT * FROM EMPLOYEE WHERE age = 34;
"""
cursor.execute(sql)
cursor.fetchall()
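
If you do run it from Python instead of pgAdmin, printing each row of the result makes the plan much easier to read. A small sketch, reusing the same cursor:

# Each row returned by EXPLAIN ANALYZE is one line of the query plan
cursor.execute("EXPLAIN ANALYZE SELECT * FROM EMPLOYEE WHERE age = 34;")
for (plan_line,) in cursor.fetchall():
    print(plan_line)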

The execution time for this query is 27.015 ms.

4.2. Now, let’s add an index on the age column and check whether the execution time is reduced.

# Create an index on age column
sql = """
CREATE INDEX IDX_EMPLOYEE_AGE
ON EMPLOYEE(age);
"""
cursor.execute(sql)
conn.commit()
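
You can confirm that the index was created by querying the pg_indexes system view (an optional check, reusing the same cursor):

# List all indexes defined on the employee table
cursor.execute("""
    SELECT indexname, indexdef
    FROM pg_indexes
    WHERE tablename = 'employee';
""")
for indexname, indexdef in cursor.fetchall():
    print(indexname, '->', indexdef)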

4.3. Now, let’s fetch the employees who are 34 years old again, this time with the index in place.

# Querying with index
sql = """
EXPLAIN ANALYZE SELECT * FROM EMPLOYEE WHERE age = 34;
"""
cursor.execute(sql)
cursor.fetchall()

The query plan shows that the query was executed via a Bitmap Index Scan on the index we created earlier.

With this index-backed plan, the execution time for the query drops to 5.301 ms.

5. Observations

The execution time for the read operation without an index on the age column is 27.015 ms. After adding an index, the execution time is reduced to 5.301 ms.

The execution time decreased roughly fivefold with a single, simple index!
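
If you want to convince yourself that the speedup comes from the index rather than from caching, one option is to tell the planner to avoid index-based scans for the current session and re-run the same query. This is a sketch using the standard planner settings enable_indexscan and enable_bitmapscan; it does not rule out caching effects completely, but it lets you compare both plans on the same warmed-up data.

# Discourage index usage so the planner falls back to a sequential scan
cursor.execute("SET enable_indexscan = off;")
cursor.execute("SET enable_bitmapscan = off;")
cursor.execute("EXPLAIN ANALYZE SELECT * FROM EMPLOYEE WHERE age = 34;")
for (plan_line,) in cursor.fetchall():
    print(plan_line)

# Restore the defaults for the rest of the session
cursor.execute("RESET enable_indexscan;")
cursor.execute("RESET enable_bitmapscan;")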

6. Limitations

Even though the execution time decreased significantly, there are certain limitations to using database indexes in production, listed below.

  • Indexes consume additional disk space, since each index is stored as a separate on-disk structure (the size check after this list shows how to measure it).
  • Every INSERT, UPDATE, and DELETE must also maintain the index, so write operations can become slower.
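
For example, you can check how much disk space the table and the index occupy with the built-in size functions (a quick check reusing the same cursor; idx_employee_age is the index created in section 4.2):

# Compare the on-disk size of the table and its index
cursor.execute("""
    SELECT pg_size_pretty(pg_relation_size('employee')) AS table_size,
           pg_size_pretty(pg_relation_size('idx_employee_age')) AS index_size;
""")
print(cursor.fetchone())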

7. Conclusion

Database indexes are a powerful tool for speeding up read queries with filters. As observed from the queries above, a carefully chosen index can boost read operations and significantly improve an application’s performance. However, we need to be careful when choosing indexes, as an improperly chosen index can degrade overall performance.

8. References

👏 Your 3 claps mean the world to me! If you found value in this article, a simple tap on those clapping hands would make my day.

🚀 Please consider following for more tech-related content.

🌟 If you found this blog helpful and would like to stay updated with more content, feel free to connect with me on LinkedIn.
