How to use Indexing Effectively

Published in

Nerd For Tech

6 min readFeb 15, 2021

What is Indexing?

Indexing is a way to optimize the performance of a database by minimizing the number of disk accesses required and steps when a query is processed. It is a data structure technique that is used to quickly locate and access the data in a database.

Indexing methods

Clustered Indexing

When more than two records are stored in the same file (page) these types of stores are known as cluster indexing. By using cluster indexing we can reduce the cost of searching reason being multiple records related to the same thing are stored in one place, it also gives the frequent joining of more than two tables(records) and reduces the io cost.

Non-clustered or Secondary Indexing

A non-clustered index just tells us where the data lies, i.e. it gives us a list of virtual pointers or references to the location where the data is actually stored. Data is not physically stored in the order of the index. Instead, data is present in leaf nodes. The actual data is not organized but we have an ordered reference to where the data points actually lie. We can have only dense ordering in the non-clustered index as sparse ordering is not possible because data is not physically organized accordingly.
It requires more time as compared to the clustered index because some amount of extra work is done to extract the data by further following the pointer. In the case of a clustered index, data is directly present in front of the index.

Multilevel Indexing

With the growth of the size of the database, we will need to divide the index and using a different approach. As the index is stored in the main memory, a single-level index might become too large a size to store with multiple disk accesses. The multilevel indexing segregates the main block into various smaller blocks so that the same can be stored in a single block. The outer blocks are divided into inner blocks which in turn are pointed to the data blocks. This can be easily stored in the main memory with fewer overheads.

How does Indexing work?

In reality, the database table does not reorder itself every time the query conditions change to optimize the query performance: that would be unrealistic. In actuality, what happens is the index causes the database to create a data structure. The data structure type is very likely a B-Tree. While the advantages of the B-Tree are numerous, the main advantage for our purposes is that it is sortable. When the data structure is sorted in order it makes our search more efficient for the obvious reasons we pointed out above.

When the index creates a data structure on a specific column it is important to note that no other column is stored in the data structure. Our data structure for the table above will only contain ids.

B+ trees are generally employed. The drawback of the B-tree used for indexing, however, is that it stores the data pointer, corresponding to a particular key value, along with that key value in the node of a B-tree. This technique, greatly reduces the number of entries that can be packed into a node of a B-tree, thereby contributing to the increase in the number of levels in the B-tree, hence increasing the search time of a record.

Indexing with Write statements?

The number of indexes on a table is the most dominant factor for insert performance. The more indexes a table has, the slower the execution becomes. The write statement is the only operation that cannot directly benefit from indexing because it has no where clause.

Adding a new row to a table involves several steps.

First, the database must find a place to store the row. For a regular heap table that has no particular row order, the database can take any table block that has enough free space. This is a very simple and quick process, mostly executed in the main memory. All the database has to do afterward is to add the new entry to the respective data block.

second, If there are indexes on the table, the database must make sure the new entry is also found via these indexes. For this reason, it has to add the new entry to every index on that table. The number of indexes is, therefore, a multiplier for the cost of a write statement.

Moreover, adding an entry to an index is much more expensive than inserting one into a heap structure because the database has to keep the index order and tree balance. That means the new entry cannot be written to any block it belongs to a specific leaf node. Although the database uses the index tree itself to find the correct leaf node, it still has to read a few index blocks for the tree traversal.

Once the correct leaf node has been identified, the database confirms that there is enough free space left in this node. If not, the database splits the leaf node and distributes the entries between the old and a new node. This process also affects the reference in the corresponding branch node as that must be duplicated as well. Needless to say, the branch node can run out of space as well so it might have to be split too. In the worst case, the database has to split all nodes up to the root node. This is the only case in which the tree gains an additional layer and grows in depth.

Best practice with using Indexing

Avoid using too many indexes

if you have an index on (A, B, C, D, E) and you restrict your query by values of B and D, that index is totally useless. It’s only useful if you query by all five columns frequently or by combinations like (A, B), (A, B, C), (A, B, C, D) frequently In any other case, it’s a waste.

Avoid using few indexes

If indexes are not properly created, SQL Server has to go through more records to retrieve the data requested by a query. Therefore, it uses more hardware resources (processor, memory, disk, and network), and obtaining the data lasts longer.

Avoid using an index with `%` statements

when you use where `%` it forces the database to make a sequential search on all the records to get the matched records so the indexes will be no used in the case

INDEX TO AVOID SORTING (GROUP BY, ORDER BY)

In addition to building indexes to optimize data access, indexes can be used to avoid sorting. The GROUP BY and ORDER BY clauses tend to invoke sorts, which can cause performance slowdowns. By indexing on the columns specified in these clauses, the relational optimizer can use an index to avoid a sort, and thereby potentially improve performance.

CREATE INDEXES FOR FOREIGN KEYS

Creating indexes for each foreign key can optimize the performance when accessing and enforcing referential constraints, Most database systems do not require such indexes, but they can improve performance.