Indexes in Relational Databases(Detailed)

Rishabh Jain
4 min readJun 24, 2020

--

Before jumping directly on indexing in relational DBMS. Do you know tables are stored on disk ?

Tables are stored as files in memory. Files in turn consist of continuous blocks of storage. Tables are stored in these blocks.

File consisting of contiguous blocks

Files can store data(records) in two ways:

  1. Sorted way
  2. Unsorted way

Sorted:

With Sorted records , Searching becomes faster. We can also take advantage of Binary Search(O(logn)).

Due to ordered data, insertion becomes pretty slow. Need to identify the correct position and then insert with some shifting.

Unsorted:

With Unsorted records, Insertion becomes easier(insert at any available places). But to search we need to traverse in linear fashion that contribute to time complexity of O(n)

How data mapped into file blocks?

Take a block size of 1KB (1024 B). Start inserting records of size 200B.

After 5 blocks 1000B is used. Now for inserting 6th block, there are 2 ways we can follow:

  1. Store 24B in the first block and remaining 176B in the next block. Also known as Spanned Mapping.
  2. Store full 200B in the next block. Also known as Unspanned Mapping.

Mostly database systems using Unspanned Mapping as it helps in accessing elements quickly.

We will move in an unorthodox way to learn Indexing. First we will learn why indexing? and then we will move to what is Indexing?

Why Indexing?

Take a file with 10000 blocks and each block contains 100 items.

Case 1 : File contains ordered(sorted) data. So here we can use Binary Search to find the block(Finding a block consumes time , find record from block is fast):

Time Taken: log2(10000) = 13.287

Binary Search is not an optimal solution as file size will keep on increasing. We will go through multilevel indexing with B+ trees.

Case 2: File contains unordered data. Time Taken will be O(n) where n = number of blocks.

With ordered or unordered records of files we will not get an optimal solution.

Note: Finding a block where data exists is the actual time taken for search. Data within the block can be searched in negligible time.

What is Indexing?

Indexing is required to access elements from a big database table quickly. Indexing improves the query performance of the table by minimising the running time. Basically it reduces the number of block access to find a solution.

For Indexing a table, internally a separate table/file(index file) is created. This file contains indexed attributes and block references where it exists in the main table.

Size of the Indexed file is very less as compared to the main file as it contains minimal data.

How to optimise Index file size when file contains sorted records?

As records are ordered, we can put the Block id from the main table and its corresponding block reference in the Indexed file. Block id will map to all records present in that block.

With this approach we can reduce the index file size.

Number of entries in Index file = Number of blocks in main table.

This is also called Sparse Indexing(Not all records in Index file)

Here 1–3 id will be taken care by one id =1 , rest by 4

Note: In case of unordered records, in the data file we can’t store the block id, instead we need to simply put all the records in the index file. But still it will more efficient than looking directly on data file.This is called Dense Indexing(All records are present in Index file)

Types of Indexing:

  1. Single level Index
  • Primary Index
  • Secondary Index
  • Clustered Index

2. Multi level Index

Primary Indexing:

Data file is sorted on Primary Key. Here the primary key is used as a search attribute. As the file is sorted it will use sparse indexing internally.

Number of records in index file = Number of blocks in data file

Cluster Indexing

Data file is sorted but on a non primary key. Entries corresponding to non key can be duplicate.

Number of records in index file = Number of unique records based on indexed key.

Secondary Indexing:

Data file in file is unordered and is indexed on any candidate key(unique) or non keys having duplicate entries.

Multilevel Index:

As the data grows in tables then the index table will also grow. So keeping data in a single index file is not efficient as we need to keep that one big file. To keep our indexing work effectively, we can make use of Multilevel Indexing.

Multilevel Index breaks a big index file into multiple small internal and outer indexes. We just need to store a small outer index in main memory.

Multilevel Index

Adding these indexes will reduce the blocks access(traversing blocks to get the desired data).

Multilevel trees in Databases are implemented from B+ trees which are in turn based on m way trees.

Advantages of Indexing:

  1. Faster Access of records / Better Response time.
  2. Primary/ Unique indexes added more power to our tables by adding constraints.

Disadvantages of Indexing:

  1. Indexes take up Disk/Memory.
  2. Decrease the performance with insert, updates and delete. As the tree needs to be rebalanced so performance decreases for above operations.

We will see the detailed explanation of B+ and B tree in next blog.

In case of any query and suggestions please connect with me https://www.linkedin.com/in/rishabh-jain-a5978583/ or drop a mail @rishabh171192@gmail.com

--

--

Rishabh Jain

Full Stack Developer — React, Node, Mongo DB, Postgres, RabbitMQ, AWS, Native Performance Engineering, Lambda, Javascript, Kubernetes, Docker.