Everything you Need to Know About Index in SQL Server

8 min readDec 12, 2023

https://www.sqlservercentral.com/articles/understanding-curd-operations-on-tables-with-b-tree-indexes-page-splits-and-fragmentation

This article aims to simplify the topic of indexes in SQL Server, explaining fundamental concepts and offering practical tips for maintenance.

My commitment is to provide an easy and quick read, simplifying the understanding of this important aspect in database and query management.

Topics

Overview of Indexes
Types of Indexes in SQL Server
- Clustered Indexes.
- Non-Clustered Indexes.
- Columnstore x Rowstore Index
Fragmentation of Indexes
- Internal x External
- Rebuild X Reorganize

Overview of Indexes

An index in SQL Server has the main objective of accelerating the data search process in queries, providing faster and more efficient retrieval.

When an index is created, a structure is created on disk to optimize the location of data in your table or view.

The index stores table keys (fields) in a structure known as B-Tree. This structure allows SQL Server to find the row or field associated with the index more efficiently. When you create an index, you are creating a kind of map on the data that makes it easier to navigate through the data, resulting in faster queries.

An example of B-Tree below:

Efficient queries consume less computational resources, such as CPU and memory.

In the Cloud, do you know what fewer operational resources mean? That’s right, money! — If your query requires a lot of operational resources in the Cloud, this can lead to high and unexpected charges, which is why maintaining and creating an index can help you.

But… Which index to create?

Types of Indexes in SQL Server

SQL Server has a wide variety of indexes available for creation, in this article I will cover:

Clustered
Non-Clustered
Columnstore x Rowstore

Clustered Index

Clustered indexes (CLUSTERED INDEX) are by insertion order, they determine the physical order of data in the table. It will create an ordered structure (DESC or ASC) to facilitate searches in the B-Tree. New data is inserted maintaining the sequence defined by the index, either in ascending (ASC) or descending (DESC) order.

Imagine a book and its pages, it’s the same concept. When you put DESC it turns into a manga… rs

Key Points:

Each table has only one clustered index
When we create a Primary Key on the table, a clustered index is created.
The columns in the clustered index are arranged in the order specified during index creation
You cannot use INCLUDE to add more fields

Syntax Example:

CREATE CLUSTERED INDEX cix_your_table_id
ON your_table(id);

Non-Clustered Index

Here the magic happens, non-clustered indexes (NONCLUSTERED INDEX) point directly to the information. Each row will have the non-clustered key value and a locator for the row.

A good example is the index of a book, in the index you have exactly where the information you want is located.

Key Points:

There can be multiple non-clustered indexes on a table.
A non-clustered index does not affect the physical order of data in the table.
Each non-clustered index contains a list of pointers to the corresponding rows of data.
You can add INCLUDE with other fields or several fields in the same index

Syntax Example:

CREATE NONCLUSTERED INDEX ncix_your_table_date 
ON your_table(date ASC);


CREATE NONCLUSTERED INDEX ncix_your_table_date 
ON your_table(date DESC);

Syntax Example with Multiple Columns:

CREATE NONCLUSTERED INDEX ncix_your_table_name_date
ON your_table(name, date);

If your query uses [name] and [date] in WHERE or ORDER BY, it is good for performance to add the fields.
If queries only need the [name] and [date] columns, this index can serve these queries without the need to access the main table.

Syntax Example with INCLUDE:

CREATE NONCLUSTERED INDEX ncix_your_table_name_include 
ON your_table(name)
INCLUDE (phone);

Instead of creating an index for each column, you can perform an INCLUDE if the fields are in the SELECT.
As the [name] and [phone] columns are in the index, queries can be served directly from the index, avoiding unnecessary access to the main table.
If frequent queries require combining information from the [name] column with the [phone] column, this index can optimize the performance of these queries.

Columnstore x Rowstore Index

Briefly, the ROWSTORE indexes are those mentioned above, ROWSTORE indexes are recommended for transactional systems that use a lot of CRUD operations. COLUMNSTORE is more suitable for large data loads, such as a DW as well.

When we use COLUMNSTORE indexes the data is compressed and stored in columns. That is, they are logically organized as a standard table of rows and columns, whereas physically, they are stored in a data format with a pointer to the column and not the row.

Its structure will point directly to the column itself, unlike ROWSTORE which points to the row.

Key Points:

Each table accepts only one COLUMNSTORE index
Each column is stored and managed independently.
Designed for analytical queries and aggregations on large volumes of data (DW, OLAP, etc…)
It is well recommended for tables and queries that use a lot of aggregation functions (SUM, AVG, MIN, MAX)
Columnstore performs data compression, similar values in a column are stored efficiently, reducing the required disk space.

Syntax Example CLUSTERED:

CREATE CLUSTERED COLUMNSTORE INDEX cix_your_table_id ON your_table
WITH (DROP_EXISTING = ON);

This example uses WITH (DROP_EXISTING = ON); With WITH I ask it to remove my CLUSTERED index and create this COLUMNSTORE in its place

Syntax Example NONCLUSTERED:

CREATE NONCLUSTERED COLUMNSTORE INDEX ncix_your_table_name_date
ON your_table(name, date);

You can use WITH (DROP_EXISTING = ON), if you find it necessary

To continue, this is the database model that I will use as an example throughout the article:

CREATE DATABASE [your_db]

USE [your_db]
GO

CREATE TABLE [your_table]
( 
 [id] [int] IDENTITY(1,1) NOT NULL,
 [date] [datetime] NOT NULL,
 [name] [varchar](50) NOT NULL,
 [phone] [nvarchar](14) NULL
)

CREATE CLUSTERED INDEX cix_your_table_id
ON your_table(id DESC);

CREATE NONCLUSTERED INDEX ncix_your_table_name_date
ON your_table(name DESC, date DESC);

Fragmentation of Indexes

Maintaining an index starts with understanding its fragmentation. When we create an index, its fragmentation starts at 0%, but as it is used, fragmentation increases, resulting in performance losses.

CRUD operations are responsible for fragmentation, with INSERT being the biggest villain.

To check the level of fragmentation, use the query below:

SELECT
    OBJECT_NAME(B.object_id) AS TableName,
    B.name AS IndexName,
    A.index_type_desc AS IndexType,
    A.avg_fragmentation_in_percent
FROM
    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED')    A
    INNER JOIN sys.indexes                            B    WITH(NOLOCK) ON B.object_id = A.object_id AND B.index_id = A.index_id
WHERE
    OBJECT_NAME(B.object_id) NOT LIKE '[_]%'
    AND A.index_type_desc != 'HEAP'
ORDER BY
    A.avg_fragmentation_in_percent DESC

This is my result (No operations so far):

Source: Author

How about we insert 100,000 rows?


DECLARE @RowCount INT = 100000;

DECLARE @Counter INT = 1;

WHILE @Counter <= @RowCount
BEGIN
 
    INSERT INTO [your_table] ([date], [name], [phone])
    VALUES 
    (
        DATEADD(DAY, -ABS(CHECKSUM(NEWID())) % 365, GETDATE()), -
        'Name' + CAST(@Counter AS VARCHAR(5)), 
        '(' + CAST(ABS(CHECKSUM(NEWID())) % 9000000000 + 1000000000 AS VARCHAR(14)) + ')'
    );
   
    SET @Counter = @Counter + 1;
END;

Notice the fragmentation now:

Source: Author

Crazy, right? ;)

Internal x External

Well, fragmentation can be both internal and external, these two types refer to the distribution of data in its storage structure. That is, how they are organized.

High fragmentation = Index is disorganized
Disorganized index = Slowness in your query
Slowness in your query = Operational resources
Operational resources = $$$

Internal Fragmentation:

Occurs when there is unused space within a storage page.
Result of allocating space in fixed blocks, leading to unused spaces when a block is not completely filled.
It can result in wasted space and an increase in the physical size occupied by data.
Example: In a file management system that has allocated space in 4KB blocks, if a file occupies 2.5KB, there will be 1.5KB of unused internal fragmentation.

External Fragmentation:

Occurs when the space available for data storage is distributed in dispersed blocks, instead of being contiguous. When the logical order of pages varies from their physical organization in the data files.
It results from insertions, removals or updates of data, leaving gaps.
It may make it difficult to efficiently allocate new data, even if there is enough free space in total.
Example: In a database table, if multiple records are removed, creating gaps, this can result in external fragmentation.

Rebuild X Reorganize

It is exactly at this point that maintenance happens, REBUILD and REORGANIZE are index maintenance operations in relational databases. These operations aim to optimize the performance and efficiency of the indexes.

REBUILD:

When to use?
- After internal or external fragmentation has reached a critical level (Can be used in the example above).
- After major maintenance operations such as bulk data loading (Or bulk insertion).
- When there are changes to the index structure, such as adding or removing columns.
What is the Impact?
- Completely removes fragmentation.
- Recreates the index, resulting in a new physical index.
- May require more resources (CPU, disk space) and block index operations during the process (Do this during off-peak hours).
Example:

ALTER INDEX cix_your_table_id ON [your_table] REBUILD; -- CLUSTERED

ALTER INDEX ncix_your_table_name_date ON [your_table] REBUILD; -- NONCLUSTERED

REORGANIZE:

When to use?
- To deal with light to moderate fragmentation.
- You can perform after some ETL Jobs where a bulk load occurs
- It is a less invasive option than REBUILD and is suitable for keeping the index in “good” condition.
What is the Impact?
- Reorganizes the index pages, but does not completely rebuild the index.
- It is generally more resource efficient and allows the index to remain available for queries during operation.
Example:

ALTER INDEX cix_your_table_id ON [your_table] REORGANIZE; -- CLUSTERED

ALTER INDEX ncix_your_table_name_date ON [your_table] REORGANIZE; -- NONCLUSTERED

Shall we test REORGANIZE first and see the impact?

Source: Author

Good, right? ;)

In many cases it will not decrease as much as in my example. However, REORGANIZE is the best choice for scheduled maintenance. REBUILD may lock your base depending on the size of the table and bank.

Shall we test it with REBUILD now?

Before testing, I will insert the lines again.