Why disjunctions in database queries are bad?

ashish goyal
3 min readSep 19, 2020

--

Disjunctions refers to the use of “OR” conditions in a SQL query. Before going into what’s wrong with disjunction, let’s see how indexed search works. If you already know about it, you can skip to the section explaining the problem with disjunctions.

Indexed search

Have you tried searching for a word in a paper dictionary. There are 3 ways of doing this -

  • Randomly open a page. Check if the words is there. If found stop, else repeat.

If you are lucky enough, you can find the word in the very first try. If you are not so lucky, you may spend the rest of your life looking for the word.

  • Start from the very first word in the dictionary. Keep checking the next word until the desired word is found.

The amount of time spent searching the word depends upon the position of the word in the dictionary. It is going to be the same for the same word every time you look up.

  • Check the middle word of the dictionary. If it is the same as searched word, stop. Otherwise, compare the middle word with your searched word. If your word should appear before the middle word, keep looking in the first half of the dictionary, otherwise keep looking in the second half of the dictionary.

Most modern systems use the third approach for searching. It is fastest. And the time spent for searching any words is almost the same. However, for this approach to work, all the words should be organised in an alphabetical order.

Organisation of data in any order is called “Indexing”

Limitations of Indexed search

In the above example, the words in the dictionary are used for searching the meaning. Thus, the words would be collectively called the “Index”. Meanings of these words is the data that we are searching for.

As you would already guess, we can only lookup in the dictionary using the word. We can’t lookup the dictionary using the meaning of the word. For that we would have to use a Thesaurus. Essentially, we can’t simultaneously lookup for a word and its meaning in a single Data source (dictionary or thesaurus).

Problem with disjunctions

Query 1 — Disjunction on one column of a table

select * from users where id in (1,2,3,4);

This is like searching multiple words in a dictionary. Each of them can be found in almost a constant amount of time. This query is completely fine. Go ahead and use them as and when you want.

Query 2 — Disjunctions on multiple columns of a table

select * from users where name = “user1” or age between 20 and 30;

This query is like searching for the word “OR” its meaning. You may have both the dictionary and thesaurus handy (equivalent to having indexes on user.name and user.age). You will have to search the dictionary for the word, find relevant results. Then search the thesaurus by the meaning. Combine the results of both searches. Until this step it’s fine. However, if we increase the number of words and meanings to search for, we would end up looking up large chunks of the dictionary and thesaurus each.

select * from users where name in (“user1”,”user2”, “user3”) or age between 0 and 30;

If we increase the number of indexes in disjunction. The number of chunks to be scanned increases. Ultimately resulting in almost the whole table to be scanned. Usually RDMS systems, scan the whole table for results when querying on multiple columns with disjunctions.

Query 3 — Disjunctions on multiple columns of multiple tables

select * from users inner join address on users.addressId = address.id where users.name in (“user1”,”user2”, “user3”) and address.country in (“India”, “China”);

This query also has the same complexity as query 2. It results in a search of the complete database tables.

Conclusion

Any queries, that involve scanning of the complete database tables are considered bad. With the indexed search we are able to achieve search complexity of O(log n). However, if the whole tables are scanned, the search complexity becomes O(n). In such cases, we should identify the orthogonality between columns and optimise our queries accordingly.

--

--

ashish goyal

The key to solving complex problems is - breaking them down into smaller ones.