PostgreSQL

How to determine unused index in PostgreSQL?

And safely remove it without breaking your production system :)

Anas Anjaria
4 min readOct 3, 2023
How to determine unused index in PostgreSQL?
Photo by Wolfgang Hasselmann on Unsplash

Recently, our whole team was working on speeding up write operation in PostgreSQL. By write operations, I don’t mean INSERTs only but UPDATEs and DELETEs as well.

Everybody was trying different strategies to speed up things.

As we know that indexes help us in retriveing data from our database faster, but it comes with a cost — They slow down write operations. It is the old choice.

speeding up reads can slow down writes and vice versa.

So, I was curious whether there is a way to reduce the number of indexes in our production system. This might add some improvement to write operations.

While Googling, I came across this informative article (See Postgres Index stats and Query Optimization). It explains in a very systematic way to determine unused indexes.

The best part is that we can confidently and safely remove unused indexes without risking any disruption to our production system.

I am sharing the approach I used during that time in this post.

Why unusued indexes are problem?

Indexes come with associated costs that we must consider:

  1. They introduce performance overhead for write-heavy applications.
  2. They consume disk space, especially in the case of large tables.

Hence, the most undesirable type of index is the one that remains unused.

Another reason is minimalism. In software engineering, principle of minimalism works the best. If it’s not used, what’s the point of keeping them.

How to detect unused index in Postgres?

PostgreSQL has a view named pg_stat_user_indexescontaining index statistics.

One could retrieve index statistics of a table named foo using the following query.

SELECT relname , indexrelname , idx_scan , idx_tup_read , idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public' and relname = 'foo';

where

  1. relname: The name of the database table.
  2. indexrelname: The name of the user index for which these statistics are relevant.
  3. idx_scan, idx_tup_read & idx_tup_fetch: are different index statistics metrics helps us in determining whether index is being used or not.
If you're interesting in understanding these indivisual index metrics, 
I would highly recommend reading section "Some background info" of this
blog post [1].

[1] https://sgerogia.github.io/Postgres-Index-And-Queries/

An index is considered unused within our system if and only if:

  1. The metrics typically hold a value of 0 (although this is quite rare).
  2. More notably, these metrics maintain a constant value over time, possibly spanning a day or a week, contingent on your specific business use case. When all relevant use cases have been executed during this timeframe without any change in these metrics, it indicates that the index is indeed unused and can be safely removed from the production system. This observation aligns with our findings in the production environment as well.

Now obvious question would be how to analyze such a big time series data effortlessly 😃.

Answer is by collecting metrics and visualizing information via kibana.

Strategy to detect unused index in Postgres effortlessly

Step 1 — Collect index statistics as metrics

You can periodically collect index statistics and publish it to elasticsearch.

SELECT relname , indexrelname , idx_scan , idx_tup_read , idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public' and
relname in ('YOUR-TABLE-NAME');

If you don’t know how to do it, don’t worry. I have explained it my article here.

Keep collecting metrics for a time period until you're confident that
all the business usecases have been executed.

Step 2 — Analyzing statistics using kibana

If the index is unsued, it will remain at a constant value over time. And the derivative of a constant value is 0.

You can visualize it using kibana and create a graph as shown below.

How to determine unused index in PostgreSQL?
Unused index over time

How to create visualization in kibana?

Conclusion

In conclusion, identifying and removing unused indexes in PostgreSQL is easy.

By leveraging index statistics and monitoring their behavior over time, you can make informed decisions that lead to more efficient and optimized database systems.

Happy indexing.

Enjoyed this guide on optimizing PostgreSQL and want to stay updated with more technical insights and tips?

Don’t miss out on future articles and in-depth tutorials!

Subscribe now and be a part of my growing community.

https://medium.com/@anasanjaria/subscribe

--

--