Database Indexes, simplified

Serhii Shkarupa
5 min readNov 24, 2021

--

Lets speak about indexes in the professional programming. I want to tell you the principles of using indexes, why you have to use them, when, how. No difficult technical things, you can read it in the official documentation or other articles.

MySQL examples will be used, because this DBMS is the most popular.

Just for preview

Why to use indexes?

I don’t remember where i heard this example, but example with book library is perfect! Example: librarian will not check all books in his library, when he received a request to find the “Harry Potter” book. He will go to the bookshelf with letter “H”, then start looking for a books with name “Ha”, then look for “Har” and each time add one more letter, till find the searching book.

This example if very simplified, but the main idea is it. DBMS can check 3 records with indexes or can check the whole table. If your table has 3 records — no worry, but if 3 million — worry. Indexes will simplify the search significantly!

When to use indexes?

Always if a field is in WHERE or ORDER BY statements.

Example:

SELECT * FROM products WHERE category_id = 7 and status=’subscribe me pls’ ORDER BY created_at

Fields category_id and status, created_at have to be indexed.

How to use indexes?

We have to use separate index for each searching case.

Example of query above. DBMS will search products by the category_id and status fields + sort them by the created_at field, this is one big search case. So we have to create indexes for 3 fields.

MySQL version of indexes:

ALTER TABLE products
ADD INDEX category_idx (category_id, status, created_at) USING BTREE;

Immediately hint: name your indexes by fields + _idx

We usually have several queries to one table. Example above shows the case with search by category and status, lets say we have another search in additional to existing one:

SELECT * FROM products WHERE user_id = 12 ORDER BY created_at;

This is another search case, which require another index, because we have to search by field, which was not indexed. It is very simple logic, if you search by field, which is not indexed — index it!

ALTER TABLE products
ADD INDEX user_id_idx (user_id, created_at) USING BTREE;

And yes — we have to add the created_at field for sorting to each index if we want to sort

A few words about combined indexes

There is nothing special if you use a simple index for one field. But if you are using combined indexes, like examples above — you can use them in other search cases, but the order of fields take the major thing

This query will work perfect, because the category_id is the first field in our index list(here is it: category_id, status, created_at) and the status field is the next of category_id. So MySQL engine can find everything what was needed by category_id, then sort by the next field from the index list.

SELECT * FROM products WHERE category_id = 7 ORDER BY status;

So as you can understand, the path to your record is: first field -> second field -> third field -> … Check the next query, you can find that we skipped the second field from the query and search by first and order by third

SELECT * FROM products WHERE category_id = 7 ORDER BY created_at;

Index will work partially: quick WHERE and slow ORDER BY, because we have category_id as the first field in our indexes, but we don’t have any created_at after the category_id. Our index has the status field between them and it destroy the link between category_id and created_at. Original query with all 3 fields will work much quicker than this one, even we search by 1 field less. Be smart, when you choose order of your fields!

How to explain?

You want to check if the index works, you want to know if you need an index, you want to know what fields have to be included to the index, finally you want to know why this query is so sloooow. Meet EXPLAIN statement.

EXPLAIN is the key feature to work with indexes. Simple add the EXPLAIN word before SELECT statement

EXPLAIN SELECT * FROM products WHERE category_id = 7 and status = 'active' ORDER BY created_at;
The result WITHOUT indexes
The result WITH indexes

You want to check the type, possible_keys, key, ref, Extra fields. The goal to see if the category_idx is used for our query. Without index you can see the next in the Extra field

Using where; Using filesort

This means, that MySQL used temporary file to make a sort, these words are synonyms to very slow. Also you can see type=All, which means checking all records in the table, query time will grow with table size(count of records).

B-Tree vs Hash

Simple:

  • B-Tree, if you want use > or < or ≥ or ≤ or LIKE statement in additional to =, operator = works slower than hash
  • Hash, if you use only = or !=, works fast

Link to the MySQL documentation, there are more details

How much do indexes cost?

Of course all these speedups are not free.

  • Decrease speed during INSERT / UPDATE / DELETE — nothing special in almost all cases
  • Increasing size of your database, sometimes volume of indexes can be more than data in this table. Take as a notice and remember it

P.S.

The create table from example

CREATE TABLE `products` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`category_id` int(11) NOT NULL,
`status` varchar(25) NOT NULL,
`user_id` int(11) NOT NULL,
`name` varchar(64) NOT NULL,
`description` varchar(255) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB

--

--