Why Do We Add Indices to Foreign Keys?
When designing a table that has a column referencing a primary key of another table, or in other words, a table that has a foreign key to the primary key of another table, it is pretty common to automatically add an index for that column.
It was my understanding that the main reason we want to do this is because of the hypothesis that we usually join child tables (the table referencing) with parent tables (the table being referenced), and therefore having an index in that foreign key will make that join queries more performant.
A query such as:
select employees.id, company.name
INNER JOIN employees ON employees.company_id = company.id
will be more performant if we have an index in the employees.company_id foreign key.
There is however another use case that even though it is not as widespread as the one already explained, is important and something folks working with relational databases should be aware of. In order to explain this other use case, I will use an example:
Say we have a table that stores employees and a table storing companies. An employee belongs to a company, and so it has a foreign key to the companies table. Now, let’s say that we have a client (client A) that wants to delete a company with id 10. Before the clients delete that row, the DBMS needs to check that there are no employees that are referencing a company with id 10, because if that were to be the case, we would be leaving the database in an inconsistent state and a user row orphan. Finding all the employees where company_id is 10 will be more efficient if we have an index in the company_id foreign key. If we don’t, we will be forced to do a full table scan, which will take a long time if the table is big.
Now, things get even worst in this other scenario. Let’s assume that now we have another client (client B) that wants to insert a new employee for a company with id 20 (that company didn’t have any employees yet) at the same time that client A wants to delete the company with id 20.
- Wants to delete company with id 20
- Must check that company with 20 exists
- Must search for employees that might be referencing the company with id 20
- Wants to insert a new employee referencing company with id 20
- Must check that company with id 20 exists
For client B, things are easy, because finding a company with id 10 is fast, due to the fact that it just needs to check the primary key, which is indexed by default. Client A though will have a harder time because it has to go through all the employees that might be referencing the company with id 10, to ensure it won’t leave any row orphan. Going through all those employee rows, if there is no index in the company_id foreign key, will require a full table scan, which can be slow as we have mentioned before.
But things get trickier. We can get in the situation where Client B commits the insert of this new user with company_id 20 at the exact point that Client A just concluded that it hasn’t found any employees with company_id 20, and as a result, it thinks that it can delete the company with id 20. So, in this case, we would have a user referencing a company with id 20, which does no longer exists because it was just deleted.
Ideally, client A would have a way to say: “Hey database, while I am trying to delete the company with id 20, don’t allow any other user to add an employee that references company with id 20”. And DBMS actually allows us to do this through a technique called locking. Using a lock, client A will be able to say: “the employees' table is locked from anyone to insert/update/delete rows until I have finished deleting the company with id 20”.
This is great news, right? But there is a problem here. If the process of deleting the company with id 20 takes a long time (due to not having an index in that foreign key), we will be blocking a lot of operations from happening in the employees' table, a problem that is technically known as Long Term Blocking. So, how can we speed this up? Adding an index to the company_id foreign key in employees. That way, in order to find whether there are any employees that have a reference to the company with id 20, we can use an index without having to do a full table scan.
To conclude I will say that by knowing why do we add indices to foreign keys, one can also reason when we might consider not adding indices to foreign keys. At the end of the day, indices are costly to manage, and if they don’t provide us any benefits, one should at least consider not to use them. Consider the scenario where table A which just has a couple of columns and not many rows (and won’t grow over time), has a foreign key to table B. Moreover, table A doesn’t have a lot of operations coming through. If we were in that situation, doing a full table scan to find the rows in table A that refer to table B, wouldn’t be costly (because of a small number of rows), and also since there is not a lot of operations happening in the table, once we had to lock it, we wouldn’t be blocking a lot of operations from happening.