How SQL Indexing improves efficiency of select queries ?

Published in

Javarevisited

6 min readMay 7, 2023

In this blog I will talk about how SQL indexes improves the performance of search queries. I will use MySql to illustrate here.

Firstly, lets create a table —

CREATE TABLE PERSON (
 id int,
 first_name varchar(50),
 last_name varchar(50),
 age int,
 salary int
);

Now adding some sample data —

INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(1, 'Sam', 'Smith', 29, 30000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(2, 'Tony', 'Stark', 30, 38000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(3, 'Peter', 'Parker', 26, 40000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(4, 'Bruce', 'Banner', 38, 20000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(5, 'Steven', 'Strange', 39, 32000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(6, 'Natasha', 'Romanoff', 32, 41000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(7, 'Steve', 'Roggers', 45, 44000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(8, 'Bruce', 'Wayne', 39, 38000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(9, 'Harry', 'Potter', 23, 20000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(10, 'Harry', 'Styles', 24, 24000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(11, 'Harry', 'Smith', 22, 29000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(12, 'Clark', 'Kent', 29, 39000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(13, 'Hugh', 'Jackman', 39, 50000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(14, 'Barry', 'Alen', 28, 33000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(15, 'Tom', 'Hanks', 31, 60000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(16, 'Tom', 'Cruise', 38, 63000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(17, 'Tom', 'Holland', 34, 35000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(18, 'Tom', 'Hardy', 41, 39000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(19, 'Carol', 'Danvers', 43, 41000);
INSERT INTO PERSON(id, first_name, last_name, age, salary) VALUES(20, 'Jake', 'Smith', 48, 43000);

Let’s run a search query where we want to fetch all the people with name ‘Harry’ —

There are two columns here that we need to observe —

key — Index used by SQL engine at the time of executing the query. And we can clearly see no index was used for this query.
rows — It tells the number of rows that were analysed by the query. We can see here 20, which means SQL did full table scan to fetch the records based on the query. Consider a table with million records, now this approach would impact the performance.

Let’s create our first index on first_name as this is our search criteria.

CREATE INDEX first_name_idx on PERSON(first_name);

Now let’s run our query again and see if there is any difference —

If we notice, we will observe that MySql has used the index we created while executing the query. Under the column rows we can see ‘3’ which depicts only 3 rows were scanned while fetching the data.

Let’s run another query —

We can see that MySql has again did a whole table scan for this search query, so it is really necessary to list down the possible search criteria that are going to be used in our application so that we can create the necessary indexes to optimise the performance of our system.

CREATE INDEX age_idx on PERSON(age);

Here is an interesting outcome, our age_idx is listed in possible_keys but under the key column we see the value as NULL and under the rows column the value is 20, SQL engine has decided not to use index here so if our query returns significant portion of the table, the SQL engine would probably not use index as the whole table scan might be quicker than using the index.

Fig 5 shows SQL engine has used the age_idx and scanned only 9 rows instead of scanning the whole table.

Composite indexes or multi-column indexes —

What if we have multiple columns in where clause ? To further optimise our queries we can create composite indexes or multi-column indexes.

CREATE INDEX full_name_idx on PERSON(first_name, last_name);

We observe that under possible_keys, two of our indexes are listed and under the column key our composite index was used and only one row was scanned to fetch the result, this shows how composite index can further optimise our queries.

DROP INDEX first_name_idx on PERSON;

I have run two queries now, in the first query I am searching based on last_name and in the second query I am searching based on first_name. In the first query our index was not used and SQL engine did a whole table scan but in the second query our index full_name_idx was used and only one row was scanned. This shows the order of columns being used in composite index matters while querying partial index.

For e.g — If we have created a composite index on columns (x, y, z) then our index will be not be invoked while searching based on (z) or (y, z) alone. The query will only use the composite index if it has the left most column of composite index in the where clause.

Since our first column of composite index is present in both where clause so our full_name_idx was invoked both times.

How index stores data ?

Let’s talk about what actually happens behind the scenes whenever we create an index in our database. In simple terms, an index is like a “Table of Contents” of a book. It can tell exactly how to reach to a particular chapter simply going to that page number directly.

An index is basically a data structure typically a B-Tree that holds the column on which it it created and the address of the rows. An index stores data in a sorted manner, so our index “first_name_idx” may look like this —

Barry --> 14
Bruce --> 4
Bruce --> 8
Carol --> 19
Clark --> 12
Harry --> 9
Harry --> 10
Harry --> 11
Hugh --> 13
Jake --> 20
Natasha --> 6
Peter --> 3
Sam --> 1
Steve --> 7
Steven --> 5
Tom --> 15 
Tom --> 16
Tom --> 17
Tom --> 18
Tony --> 2

Note — Composite index will store data in a similar fashion, 1st level of sorting will be on first_name and second level of sorting will be on last_name.

How SQL Indexing improves efficiency of select queries ?

Written by Mayank Jain