Using indexes in an intelligent way!

Are indexes always good to use?

Published in

Webtips

3 min readAug 10, 2020

Using indexes in an intelligent way! — Photo by Max Duzij on Unsplash

This is the second article in the series of explaining indexing in PostgreSQL. If you missed the first article you can visit this link.

Making use of indexes in Postgres

One step forwards towards becoming a database ninja. Learn how to use indexes in postgresql

medium.com

Cardinality

Cardinality is an indicator that refers to the uniqueness of all values in a column.

Low cardinality means a lot of duplicated values in the column. For example, a column that stores gender values have low cardinality. In contrast, high cardinality means that there are many unique values in that column. For example the primary key column.

When we make a query on the low cardinality column it is likely that we will receive a large portion of the table. It will be more efficient to use a sequential scan rather than an index scan because we will be fetching a lot of columns anyway.

Until now in our database, we have stored only two values in the name field Alice and bob. Now we want to make a query on the name field but before that, we will need to index the name column.

test_indexing=# create index idx_name on test_indexing(name);
CREATE INDEX

Now we run explain analyze command on the select query to check what strategy Postgres is using behind the scenes.

test_indexing=# explain analyze select * from test_indexing where name = 'alice';Seq Scan on test_indexing  (cost=0.00..871.00 rows=25202 width=9) (actual time=6.398..15.959 rows=25000 loops=1)
   Filter: (name = 'alice'::text)
   Rows Removed by Filter: 25000
 Planning time: 4.347 ms
 Execution time: 18.752 ms
(5 rows)

As we can see here that Postgres is using a sequential scan here although the column is indexed. Here Postgres analyzed that it has to read most of the table anyway so it is better to just scan the table instead of scanning both the index and the table. This brings up the important question: How to choose what columns should be indexed?

Postgres will not use index because there is one. It will only use indexes when it makes sense.

Selectivity

Selectivity is the measure of the percentage of rows that would be retrieved by a query.

It all boils down to selectivity. Let us understand this with the help of an example. Only the names ‘Alice’ and ‘Bob’ exist in the table, so what will happen if we try to fetch ‘alice1’ which does not exist in the table.

test_indexing=# explain analyze select * from test_indexing where name = 'alice1';Index Scan using idx_name on test_indexing  (cost=0.29..7.31 rows=1 width=9) (actual time=0.032..0.033 rows=0 loops=1)
   Index Cond: (name = 'alice1'::text)
 Planning time: 0.132 ms
 Execution time: 0.068 ms
(4 rows)

Since the number of rows returned is less i.e., the selectivity of this query is less, Postgres decided to use the index scan here. Please observe that alice1 does not exist in the table and the query plan perfectly anticipated that. There is no row in the table which has the name alice1 but Postgres will never estimate 0 rows because it would make subsequent estimation a lot harder.

This behaviour that index scans will flip to sequential scans once they access enough of the table is surprising to many people. This whole area is one of the things to be worried about when executing your own queries that expect to use indexes.

Conclusion

In conclusion, execution plan depends upon the data inside the table. This is a very important insight which should be remembered all the time. If the underlying data distribution changes dramatically, execution plans will change as well and this is the reason for unpredictable run times.

This was the second of the many articles to come on the topic of indexing. Please keep reading and give a clap if you find this article helpful.