Index Redundancy Analysis On Steroids

Revolutionising the Process of Identifying Redundant Indexes in SQL Server Databases

Prakul Jain
engineering-udaan
Published in
11 min readMay 7, 2023

--

redundant index
index that becomes useless and unnecessary due to the presence of another index or set of indexes.

Have you ever heard of the saying “too much of a good thing can be bad”? Well, that applies to SQL Server indexes as well. I mean, sure, you can create as many non-clustered indexes as you want, but why would you? It’s like trying to impress your crush by wearing ten layers of cologne — it’s not gonna work and it just makes things worse.

The image depicts seven animated Spidermans pointing at each other, symbolizing the confusion of multiple redundant indexes in a database, each blaming the other for the poor performance.
Index overload: Finding the real culprit behind your sluggish SQL database

Double Trouble

The Perils of Duplicate Indexes in SQL Server

In SQL Server, it’s possible to create duplicate indexes on the same object. While everything from the index keys to the properties can be identical, this practice comes with several drawbacks.

  1. Duplicate indexes take up extra storage space. And who likes to waste precious storage space? Not us! By performing an Index Redundancy Analysis (fancy words for analysing for redundant indexes), we were able to identify redundant indexes and save almost 10% of storage space. That’s like finding a hidden 400GB treasure chest in your 4TB database!
  2. Redundant indexes can slow down DML statements (inserting, updating, deleting data). Imagine having to update multiple copies of the same index every time you make a change. Talk about a time-waster! But fear not, deleting redundant indexes can improve performance and make your database feel like the Incredible Hulk. Plus, you’ll save money on the CPU cores you can reduce.
  3. Rebuilding or reorganising indexes can be a real pain with duplicate indexes. By getting rid of redundant indexes, you can speed up this process and get back to more important things (like binge-watching your favourite show).
  4. Having multiple indexes on the same object can make it more difficult for the query optimiser to do its job. And nobody wants a cranky optimiser! By simplifying your indexes, you can make the optimiser’s life easier and improve query performance.

3 Shades of Redundancy

The Index Dilemma in SQL Server

Indexes can be considered redundant even if they are not identical. In an index, columns are divided into two categories: Index Columns and Include Columns.

Don’t Play Jenga with Your Indexes: Why Order Matters!

In SQL Server, The order of index columns is critical. SQL Server can use an index for a range scan only if the leftmost column is specified, and then only if the next leftmost column is specified, and so on. It’s like following a recipe — you can’t skip steps or add ingredients in the wrong order.

On the other hand, the order of non-key columns (include columns) in the index doesn’t matter at all. It’s like building a sandwich — you can put the lettuce before the tomato or vice versa, and it won’t affect the taste or texture of the sandwich. Including non-key columns in your index can significantly improve query performance because the query optimizer can locate all the column values within the index, resulting in fewer disk I/O operations.

Classification of redundant indexes

There are three types of redundant indexes:

  • Duplicate Indexes: Two indexes have the exact same key columns in the same order (i.e. identical indexes) with the same include columns. This is because the order of include columns does not matter.
    Eg:
    CREATE INDEX idx1 ON MyTable (ColumnA, ColumnB) INCLUDE (ColumnC, ColumnD);
    CREATE INDEX idx2 ON MyTable (ColumnA, ColumnB) INCLUDE (ColumnD, ColumnC);
    Both indexes have the same key columns “ColumnA” and “ColumnB” in the same order and include the same non-key columns “ColumnC” and “ColumnD”, making them duplicate indexes.
  • Overlapping Indexes: One index has key columns that form a left ordered subset of the key columns of another index and has non-key column subset of non-key columns of another index are overlapping indexes. The key columns in the overlapping index must be left-ordered, meaning that they are listed in order of decreasing importance, with the most important column appearing first. This is because SQL Server can use an index for a range scan only if the leftmost column is specified, and then only if the next leftmost column is specified, and so on.
    Eg:
    CREATE INDEX idx1 ON MyTable (ColumnA, ColumnB) INCLUDE (ColumnX, ColumnY);
    CREATE INDEX idx2 ON MyTable (ColumnA, ColumnB, ColumnD) INCLUDE (ColumnX, ColumnY, ColumnZ);
    In this example, index “idx1” has key columns “ColumnA” and “ColumnB”, which is a left subset of the key columns “ColumnA”, “ColumnB” in index “idx2”. Therefore, “idx2” overlaps “idx1” and these two indexes are overlapping indexes.
  • Similar Indexes: Indexes that have identical key columns in the same order, but different included columns. To solve for similar indexes, one index should be updated to contain include columns of both redundant indexes.
    Eg:
    CREATE INDEX idx1 ON MyTable (ColumnA, ColumnB) INCLUDE (ColumnX, ColumnY);
    CREATE INDEX idx2 ON MyTable (ColumnA, ColumnB) INCLUDE (ColumnP, ColumnQ);
    Solution Index:
    CREATE INDEX idx1 ON MyTable (ColumnA, ColumnB) INCLUDE (ColumnX, ColumnY, ColumnP, ColumnQ);
    In this example, index “idx2” has key columns “ColumnA” and “ColumnB”, which is identical to “idx1”. The Include columns are however different, and the resulting index must contain include columns as distinct union of both index include columns.

Uncovering the Hidden Treasure of Redundant Indexes

The Hunt for Efficiency in SQL Server

As we delved deeper into the mysterious world of SQL Server databases, we discovered a strange phenomenon — unused indexes! These indexes were like the ghosts of databases past, haunting our storage and gobbling up our precious computing power with unnecessary index updates.

We noticed some serious issues with our big databases — our CPUs and Data IO were frequently spiking, which is never a good sign for a healthy database. The culprits were mainly pesky queries, but in a few cases, the queries seemed completely innocent and the indexes looked fine. After diving deeper, we discovered that the real problem was multiple unused indexes on primary columns. These redundant indexes were causing an excessive amount of updates and creating multiple query plans, thus wreaking havoc on our database performance.

So, we began our quest to delete these redundant indexes, one by one. But alas, the process of deleting redundant indexes turned out to be a real snooze-fest. As a database administrator, hunting down redundant indexes is like playing a real-life game of Where’s Waldo. You have to sift through a sea of SQL scripts, execute a script that lists all indexes and their indexed and included columns, and then comb through each individual index to identify the redundant indexes. The irony is not lost on us — we’re trying to boost database efficiency by doing something that’s incredibly inefficient. And just when you think you’ve identified the redundant indexes, you still have to delete them one by one, which can make you second-guess your career choice and dream of trading in your laptop for a beach chair and a piña colada.

And Oh boy, we were in for a treat — turns out our database problems were just getting started! We figured out that getting rid of useless indexes was pretty low on the list of priorities for our dear DBAs. It was like trying to get them to give up their beloved caffeine fix — near impossible! But we knew we had to make the process more efficient if we wanted to have any hope of fixing our database woes.

But fear not, for we remembered our trusty sidekick, snorql! With its help, we created a supercharged optimization metric — the Index Redundancy Metric. Thanks to this handy tool, we were able to streamline the process and get rid of those useless indexes in no time flat. This bad boy made DBAs so fast, they could make a cheetah look like a sloth.

DBAs at udaan using Index Redundancy Analysis Tool on snorql

What is snorql?

diagnosing databases like a boss!

Developed at udaan, snorql is an open-source and completely free-to-use framework aimed at diagnosing, resolving, and optimizing SQL metrics. snorql is pluggable and can be easily implemented with any database and is here to make your life easier. Get started with our easy-to-follow guide on “Getting started with snorql”.

Snorql Index Redundancy Analysis at database table level

But that’s not all — Snorql also identifies unused tables and unused indexes, giving you even more insight into potential space savings. With Snorql’s optimisation metrics, you can confidently make informed decisions about which indexes and tables to remove, reducing clutter and improving organisation in your database.

Snorql Repositories:
1. snorql (framework): https://github.com/udaan-com/snorql
2. snorql-frontend: https://github.com/udaan-com/snorql-frontend

In case you need further convincing, be sure to check out the fantastic article snorql — diagnosing databases like a boss

Snorql Index Redundancy Analysis for a particular table

Indexing Follies

Lesson Learned in the Quest for Database Efficiency

To tackle redundant indexes, we came up with an algorithm to classify our indexes. Sounds simple enough, right? Just compare and contrast, and voila — unused, duplicate, overlapping, and similar indexes will be neatly sorted. But hold your horses, my friend — this was no walk in the park! It took multiple iterations to perfect our metric, and we gained valuable insights along the way.

  1. Inviting the Read-Replica and Geo-Replica databases to the party:
    We realised that just because an index exists in the primary database, it doesn’t mean it’s being treated the same way in the Read-Replica and Geo-Replica instances. And to make things even more complicated, an index that’s being used heavily in the Read-Replica or Geo-Replica database might be marked as an unused index in the primary database. So, to get accurate statistics for our index classification algorithm, we had to gather and summarise both the Index Usage and Index Updates across all instances.
  2. Making sure Unique Indexes stay in the party:
    So, here’s the deal — unique indexes and UNIQUE constraints are like siblings who enforce uniqueness in the same way. When you create a UNIQUE constraint, SQL Server is like a magician who creates a unique index out of thin air. And because of this sorcery, you can’t just drop unique indexes directly from the database. So, to avoid any mishaps, we had to exclude unique indexes from our classification algorithm.
  3. How not to do UX:
    In the first version of our Index Redundancy Analysis, we tried grouping redundant indexes at the index level. It made it hard to understand the relationships between parent and child indexes without looking at the whole table. Plus, the algorithm we developed was as complicated as a game of 3D chess, and we quickly realised this was not the way to go.
Version 1 of Index Redundancy Analysis

4. Identifying Unused Tables
Thanks to index statistics, we were able to determine which tables in our database were going unused. If we found that all indexes in a particular table were going unused, we marked it as “unused” — simple as that!

5. How to Handle Similar Indexes
So, imagine you have two friends who look alike, but one of them has a cool hat and the other has a funky pair of glasses. They’re like similar indexes — duplicates or overlapping for indexed columns, but with different include columns. But here’s the catch — you can’t just swap them out like a pair of shoes without causing some serious performance issues. Deleting them is not the answer either — it’s like breaking up with one of your friends just because they both have the same name! The best approach is to update one of the indexes to have all the include columns from both, and then wave goodbye to the other one. This way, we can save up on similar indexes and keep the peace between our database friends.

Similar Indexes in SQL Server Database

Alright folks, it’s time to roll up our sleeves and get down and dirty! We’ve identified those pesky edge cases, and now it’s time to dive headfirst into the implementation phase.

So, who’s ready to get their hands dirty? Let’s do this thing!

Diving into the Index Redundancy Metric

Its time to focus on the Magician behind this, the Algorithm!

Note: Index Redundancy Analysis is currently available for SQL Server databases, and the implementation is specific to this database system.

Github Issue ✅ #79 New Metric — Index Redundancy Metric
Github PR ⛓ #84 [New metric] Index Redundancy Metric

I have broken down the algorithm into steps:

  1. Get detailed data of all indexes in a database using the below sql query

2. If a Read Replica/Geo Replica exists, then summarise the Index Updates and Index Usage statistics

3. Group indexes by table, and iterate on each table, Filter indexes whose name == NULL to filter Heap Indexes, and sort in descending order by the number of indexed columns.
We also maintain a list of indexes that are to be skipped. These include the indexes which are already classified or unique indexes.

4. Identify Unused Indexes. Here, if the usage is below 10 we consider it as an unused index. We have kept this small threshold because it might be possible that an index gets used while running ad-hoc queries.

5. Identifying Unique Indexes. Unique indexes are created intentionally on columns to maintain uniqueness, hence we skip classifying these in analysis. We do show unique indexes at table level for better analysis and visibility.

6. Post this, we iterate on each index, and analyse it to find redundant indexes

a. Classify Duplicate Indexes:
Classifying duplicate indexes is straight forward. The Indexed columns and Include Columns need to be the same and the indexes columns must be in the same order

b. Classify Overlapping Indexes:
The Child Index Indexed columns should be a Left based Subset of Parent Indexed columns in the same order, and the include columns should be the same.

c. Classify Similar Indexes:
The indexed columns should be identical, whereas the include columns can be different

What udaan got out of this?

More Savings, Less Stress, and Happy DBAs!

Turns out the Index Redundancy Analysis wasn’t just a treat for the DBAs, but for the whole organization. We’ve got some juicy benefits to brag about:

  1. Cha-Ching! We saved some bucks by downsizing our databases without any performance hiccups. Our peak database compute on average dropped by 8% and storage shrank by 10%. For instance, we slimmed down a 32 vCore database to a 24 vCore one, and bam, we saved 22% on costs! Who needs all those extra cores anyway?

2. Our apps are lightning fast now! Queries are zipping through with lightning speed, thanks to the Index Redundancy Analysis. Better performance means happier users, and happier users mean more cash in our pockets. It’s a win-win!

3. The DBAs are over the moon. We’ve even heard rumours of them smooching their monitors with the Index Redundancy Metric open on the screen. Hey, we’re not judging — if that’s what it takes to make them happy, so be it!

--

--

Prakul Jain
engineering-udaan

Software Engineer @ zomato | Trying not to be a Code Monkey