Making use of indexes in Postgres

One step forward towards becoming a database ninja. Learn how to use indexes in PostgreSQL.

Neelesh Dwivedi
Webtips
4 min read · Aug 10, 2020


Photo by Nam Anh on Unsplash

This is the first article in a series that I am going to publish on this topic.

The importance of this topic cannot be stressed enough: bad indexing is the main source of bad database performance. Of course, it is important to tune memory and vacuum parameters. However, it is all in vain if indexes are not used properly. There is simply no replacement for a missing index.

In order to achieve good performance we need proper indexing, and to get started with indexes we need to understand how things work. For all of this we will need some test data.

We will create a table and fill it with 50,000 rows using the generate_series function as follows:

test_indexing=# create table test_indexing(id serial, name text);

Here we create a table test_indexing with an id column of type serial and a name column of type text. We will insert 25,000 rows with the name bob into test_indexing, and then do the same with alice, i.e., a total of 50,000 rows.

test_indexing=# insert into test_indexing(name) select 'bob' from generate_series(1,25000);
test_indexing=# insert into test_indexing(name) select 'alice' from generate_series(1,25000);

Now, to check performance, we will refresh the optimizer's statistics with the analyze command and switch on query timing in psql:

test_indexing=# analyze;
test_indexing=# \timing

Now we will select a record by searching on the id field and check how much time it takes.

test_indexing=# select * from test_indexing where id = 2;
id | name
----+------
2 | bob
(1 row)
Time: 12.183 ms

Here we see that the query takes almost 12 milliseconds. We will execute this query two or three times to make sure there are no big differences caused by caching. Let's find out why we got such bad performance. For this we will use the explain command: when a query is not performing well, explain will help you understand the real problem.

Here is how it works:

test_indexing=# explain analyze select * from test_indexing where id = 2;

Explain analyze will also make sure that the query is actually executed. The output of the above command will be:

Seq Scan on test_indexing  (cost=0.00..871.00 rows=1 width=9) (actual time=0.027..11.798 rows=1 loops=1)
Filter: (id = 2)
Rows Removed by Filter: 49999
Planning time: 0.084 ms
Execution time: 11.844 ms
(5 rows)
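As a side note, if you only want to see the plan the optimizer would choose, without actually running the (possibly expensive) query, you can use plain explain without the analyze option:

test_indexing=# explain select * from test_indexing where id = 2;

This prints only the estimated plan and costs, which makes it safe to use even on slow queries against large tables.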

The problem here is that Postgres is using a sequential scan to give us the result that we want. Scanning the entire table to just find a single row is usually not a good idea. Therefore it makes sense to create indexes.

test_indexing=# create index idx_id on test_indexing(id);

Let us execute the explain analyze query one more time, this time on our indexed table.

test_indexing=# explain analyze select * from test_indexing where id = 2;
Index Scan using idx_id on test_indexing  (cost=0.29..8.31 rows=1 width=9) (actual time=0.027..0.029 rows=1 loops=1)
Index Cond: (id = 2)
Planning time: 0.123 ms
Execution time: 0.065 ms
(4 rows)

Here we can see that Postgres is now using an index scan rather than a sequential scan, which is much faster. Let's also see how much time the query takes now.

test_indexing=# select * from test_indexing where id=2;
id | name
----+------
2 | bob
(1 row)
Time: 0.502 ms

Here we can see that the database performance has improved drastically. Indexes also allow concurrent operations: we can read and write to the same index at the same time, which helps improve throughput dramatically.
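Related to concurrency: on a table that is already serving live traffic, a plain create index blocks writes while the index is being built. Postgres offers create index concurrently, which builds the index without locking out writes, at the cost of a slower build, and it cannot run inside a transaction block. A sketch, using a hypothetical index name:

test_indexing=# create index concurrently idx_id_live on test_indexing(id);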

However, indexes are not free. Let’s check the size of the table vs the size of the index.

test_indexing=# \dt+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+---------------+-------+----------+---------+-------------
public | test_indexing | table | postgres | 2000 kB |
(1 row)
test_indexing=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+--------+-------+----------+---------------+---------+-------------
public | idx_id | index | postgres | test_indexing | 1112 kB |
(1 row)

Here the \dt+ command is used to fetch the size of the table and the \di+ command is used to fetch the size of the index. As we can see, the index takes quite a lot of space. Furthermore, writing to the table will be slower because the index has to be kept in sync all the time. In this case, the index is certainly not tiny compared to the table: its size is more than half the size of the table. Surprisingly, this is the case quite often.

We should always take note of how large the index is relative to the table when deciding whether it's a worthwhile addition. The disk and maintenance overhead of an index can be considerable, and it needs to be justified by a performance improvement.
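If you want these sizes from SQL rather than from psql meta-commands, for example in a monitoring script, you can use the built-in size functions:

test_indexing=# select pg_size_pretty(pg_relation_size('test_indexing')) as table_size,
                       pg_size_pretty(pg_relation_size('idx_id')) as index_size;

pg_relation_size returns the size in bytes and pg_size_pretty formats it in human-readable units, matching what \dt+ and \di+ display.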

In addition, if you insert into a table that has many indexes, keep in mind that every one of those indexes has to be updated on each insert, which seriously slows down writes.
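One way to spot indexes that slow down writes without earning their keep is the pg_stat_user_indexes statistics view: its idx_scan column counts how often each index has actually been used since the statistics were last reset.

test_indexing=# select indexrelname, idx_scan
                from pg_stat_user_indexes
                where relname = 'test_indexing';

An index whose idx_scan stays at zero over a representative workload is a candidate for dropping.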

Conclusion

Here we learned about indexes and how they help with query optimization. We also learned a little bit about the explain analyze command and what it does. Finally, we learned that indexes are expensive, so we need to be careful when using them.

This was the first of many articles that will make you a Postgres champ in no time. Please give a clap if you liked the article.
