Single vs Composite Indexes in Relational Databases

When To Use Single Indexes Versus When to Use Composite Indexes

Single Indexes

In a relational database, an index is a data structure that increases retrieval speed at the expense of decreasing write speed as well as using more storage space.

Querying a database table of n records by a field other than a key, requires O(n) record reads. (Technically, n should be the number of disk blocks used by that table, but for simplicity, lets assume it’s just the number of records).

For example, assuming ssn is a unique field, a query like the one below reads on average n/2 records to find the match.

SELECT * FROM users WHERE users.ssn = '1234';

This is because ssn is unique, so the database can stop when the first record is found.

However, if the field is not unique, a query like the one below reads all n records (aka a full table scan) to find all matches.

SELECT * FROM users WHERE users.first_name = 'Teemo';

A full table read is slow, especially for large tables. This is where indexes can help.

An index, or more specifically, an index on a column is an additional data structure of the table’s records sorted (typically via b-tree) only on that column. Each record in the index also includes a pointer to the original record in the table, such that finding records in the index is equivalent to finding records in the original table.

For example, suppose we had the following users table.

ID | first_name | last_name    | Class      | Position |  ssn | 
---------------------------------------------------------------
1 | Teemo | Shroomer | Specialist | Top | 2345 |
2 | Cecil | Heimerdinger | Specialist | Mid | 5461 |
3 | Annie | Hastur | Mage | Mid | 8784 |
4 | Fiora | Laurent | Slayer | Top | 7867 |
5 | Garen | Crownguard | Fighter | Top | 4579 |
6 | Malcolm | Graves | Specialist | ADC | 4578 |
7 | Irelia | Lito | Figher | Top | 5689 |
8 | Janna | Windforce | Controller | Support | 4580 |
9 | Jarvan | Lightshield | Figher | Top | 4579 |
10 | Katarina | DuCouteau | Assassin | Mid | 5608 |
11 | Kayle | Hex | Specialist | Top | 4794 |
12 | Emilia | LeBlanc | Mage | Mid | 3468 |
13 | Lee | Sin | Fighter | Jungle | 8085 |
14 | Lux | Crownguard | Mage | Mid | 4567 |
15 | Sarah | Fortune | Marksman | ADC | 6560 |
16 | Morgana | Hex | Controller | Support | 3457 |
17 | Orianna | Reveck | Mage | Mid | 9282 |
18 | Sona | Buvelle | Controller | Support | 4722 |
19 | Jericho | Swain | Mage | Mid | 5489 |
20 | Shauna | Vayne | Marksman | ADC | 2352 |
21 | Xin | Zhao | Fighter | Jungle | 6902 |
22 | Yorick | Mori | Tank | Top | 4840 |
23 | Wu | Kong | Fighter | Jungle | 4933 |

If we create an index on users.first_name

CREATE INDEX first_name_index ON users (first_name) USING BTREE;

it would create a sorting of the users by their first_name with a pointer to their primary key, something like this:

Annie    -> 3
Cecil -> 2
Emilia -> 12
Fiora -> 4
Garen -> 5
Irelia -> 7
Janna -> 8
Jarvan -> 9
Jericho -> 19
Katarina -> 10
Kayle -> 11
Lee -> 13
Lux -> 14
Malcolm -> 6
Morgana -> 16
Orianna -> 17
Sarah -> 15
Shauna -> 20
Sona -> 18
Teemo -> 1
Wu -> 23
Xin -> 21
Yorick -> 22

Then a query like

SELECT * FROM users WHERE first_name = 'Teemo';

would take only O(log_2(n)) reads since the database can perform a binary search on this index, since it is sorted by first_name .

Enforcing Uniqueness with Indexes

In addition to performance gains, indexes can also be used to enforce uniqueness of fields. For example, suppose we don’t want more than one user to have the same social security number field on a table. Creating an index with a UNIQUE modifier

CREATE UNIQUE INDEX ssn_index ON users (ssn);

will raise an error when another record is created with an ssn that already exists in the table.

Avoid Unnecessary Indexes

As per the no free lunch principle, index-based performance gains don’t come free. Its costs come in the form of:

  1. Additional storage space to store indexes
  2. Indexes also need to be updated when state-changing queries like CREATE UPDATE and DELETE are made

As such, adding unnecessary indexes can actually degrade performance overall. Here are some guidelines for when to use indexes:

  • Do not use an index for low-read but high-write tables. As mentioned previously, indexes improve read performance but degrade write performance.
  • Do not use an index if the field has low cardinality, the number of distinct values in that field. The whole idea behind the O(log_2(n)) query performance is that we can roughly eliminate half the records, with each binary search step. But that’s only the case with high cardinality fields.
  • Do not use an index for small fixed-size tables. Small tables will see no noticeable performance gains with indexes. Note that some tables may be small now, but they are likely to grow large over time, like users. Other tables are small and will stay small like ice_cream_flavors , because, come on, even Baskin-Robbins only has 31 flavors.

Composite Indexes

Like a single index, a composite index is also a data structure of records sorted on something. But unlike a single index, that something is not a field, but a concatenation of multiple fields.

For example, returning to our users table example:

ID | first_name | last_name    | class      | position |
--------------------------------------------------------
1 | Teemo | Shroomer | Specialist | Top |
2 | Cecil | Heimerdinger | Specialist | Mid |
3 | Annie | Hastur | Mage | Mid |
4 | Fiora | Laurent | Slayer | Top |
5 | Garen | Crownguard | Fighter | Top |
6 | Malcolm | Graves | Specialist | ADC |
7 | Irelia | Lito | Figher | Top |
8 | Janna | Windforce | Controller | Support |
9 | Jarvan | Lightshield | Figher | Top |
10 | Katarina | DuCouteau | Assassin | Mid |
11 | Kayle | Hex | Specialist | Top |
12 | Emilia | LeBlanc | Mage | Mid |
13 | Lee | Sin | Fighter | Jungle |
14 | Lux | Crownguard | Mage | Mid |
15 | Sarah | Fortune | Marksman | ADC |
16 | Morgana | Hex | Controller | Support |
17 | Orianna | Reveck | Mage | Mid |
18 | Sona | Buvelle | Controller | Support |
19 | Jericho | Swain | Mage | Mid |
20 | Shauna | Vayne | Marksman | ADC |
21 | Xin | Zhao | Fighter | Jungle |
22 | Yorick | Mori | Tank | Top |
23 | Wu | Kong | Fighter | Jungle |

Creating a composite index on the class and position columns

CREATE INDEX class_pos_index ON users (class, position);

will create a composite index, sorted by a concatenation of those fields, something like:

class-position       Primary Key
--------------------------------
AssassinMid -> 10
ControllerSupport -> 16
ControllerSupport -> 18
ControllerSupport -> 8
FigherTop -> 7
FigherTop -> 9
FighterJungle -> 13
FighterJungle -> 21
FighterJungle -> 23
FighterTop -> 5
MageMid -> 12
MageMid -> 14
MageMid -> 17
MageMid -> 19
MageMid -> 3
MarksmanADC -> 15
MarksmanADC -> 20
SlayerTop -> 4
SpecialistADC -> 6
SpecialistMid -> 2
SpecialistTop -> 1
SpecialistTop -> 11
TankTop -> 22

Given this composite index, a query like

SELECT * FROM users 
WHERE
class = 'Specialist'
AND
position = 'Top';

will have improved retrieval time, because the composite index is sorted by class-position . The database can find SpecialistTop in O(log_2(n)) reads rather than a full table read .

Note that even a query on just the class field

SELECT * FROM users WHERE class = 'Specialist';

will benefit from this composite index since class is the first field. Searching the index that is sorted by class is more or less equivalent to searching a composite index sorted by class-position .

However, a query like

SELECT * FROM users WHERE position = 'Top';

will NOT benefit from this composite index because position is the second field. A composite index sorted by class-position cannot be used to quickly find a record by position .

As such, the order of the constituent columns that comprise a composite index is important. The rule is if a composite index is made on column1 , column2 , column3 ,…, columnN . Then queries like

SELECT * FROM table 
WHERE column1 = 'value';
SELECT * FROM table 
WHERE column1 = 'value1'
AND column2 = 'value2';
SELECT * FROM table 
WHERE column1 = 'value1'
AND column2 = 'value2'
AND column3 = 'value3'
...
SELECT * FROM table 
WHERE column1 = 'value1'
AND column2 = 'value2'
AND column3 = 'value3'
...
AND columnN = 'valueN'

will have performance gains.

Guidelines for determining composite indexes

Like single indexes, composite indexes also come with the cost of slower write speeds and increased storage space. The choices of which fields should comprise a composite index and how to order these fields can be determined by considering the following:

  • If certain fields tend to appear together in queries, then it’s a good idea to create a composite index on them. For example, in our users example table above, it’s probably a good idea to create a composite index on (last_name, first_name) .
  • If we’re going to create an index on field1 but also create an composite index on (field1, field2) . Then creating just the composite index on (field1, field2) is enough since it can be used for querying by field1 alone.
  • Similar to single indexes, the cardinality of the fields matters to the effectiveness composite indexes. Obviously, if two fields have high cardinality, a composite index of those fields will also have high cardinality. But it’s also possible that several low cardinality fields can together create a high cardinality composite index depending on the distribution of values among those fields.

Enforcing Unique Combinations With Composite Indexes

Composite indexes can be used to enforce uniqueness of combinations of fields.

Oftentimes, we don’t require values in fields to be unique, but we require combinations of fields to be unique. For example, suppose an addresses table has a street field, address_number field, and city field. We don’t require its street or house_number or city to be unique, since multiple addresses can share the same street , house_number or city . However, we probably want to require every street-house_number-city combination to be unique, since that pretty much defines an address. We can use a composite index for this with a UNIQUE modifier.

CREATE UNIQUE INDEX index_st_no_city 
ON addresses (street, house_number, city);

Feel free to leave comments, questions, suggestions, corrections below. — S