[ SQL performance Killers ] — IN vs EXISTS

Matheus Oliveira
NAZAR
Published in
3 min readJul 13, 2018

--

An important part of my daily work is to help developers, database administrators, devops, sysadmins and project managers to identify (using Nazar.io) and fix bad SQL code. And the idea with the [SQL performance Killers] series is to share some practical sql tuning examples.

Besides applying functions to column data, the wrong use of the IN and EXISTS operators is another very common “SQL performance killer”. Although, in many cases, using either one or the other will give the same query result, they are processed very differently and knowing this difference is the key to avoid killing query performance.

The IN operator is typically processed as:

1 — The subquery is executed and result is distinct’ed;
3 — Then joined to the external table;

IN operator i.e:

Will typically be processed as:

While the EXISTS operator is typically processed as:

1 — For each row of the external table [ Full table scan ] the subquery will be executed once;

EXISTS operator i.e:

Will typically be processed as:

So, when using IN is better than EXISTS, and vice versa?

When the subquery [ SELECT column_c FROM table_y ] is slow and the external table [ table_x ] is relatively small, then using EXISTS will be faster, as the time to full scan the external table and execute the subquery will be less than the time to execute the subquery and to distinct on and join with the external table.

When the subquery [ SELECT column_c FROM table_y ] is fast and the external table [ table_x ] has many rows, then IN is better, as the time to execute the subquery, distinct on and join with the external table will be less than the time to full scan the external table and execute the subquery.

If both the subquery and the outer table are small, IN and EXISTS will have the same performance.

If both the subquery and the outer table are big, either might work as well as the other, it will depend basically on the existing indexes.

“It is not lack of hardware,
It is not network traffic,
It is not slow front ends,
the main performance problem in the huge majority of database applications is bad SQL code.”
Joe Celko

--

--