Searching algorithms

How to get better performance when working with collections.

Maciej Najbar
4 min read · Nov 8, 2020

Software is meant to help people perform masses of calculations. It's safe to say that practically every piece of software uses collections. How much time and how many resources can we save simply by making our decisions consciously?

Different use cases — different algorithms

First we need to be aware of the use case. There's no such thing as a one-size-fits-all solution. We have to pick the right algorithm for the situation at hand, and there may be several to choose from. Off the top of my head:

  • Searching on unsorted collection
  • Searching on a sorted collection
  • Searching once in a collection
  • Searching multiple times on the same collection
  • Searching for the first occurrence
  • Searching for the last occurrence
  • Searching on big collections
  • Searching on infinite collections (streams)

If you can find more, leave a comment and help me improve this article.

Linear search

Linear search is the most common searching algorithm. Its biggest advantage is that it's ready to use: it always works, with no prerequisites. That convenience comes at a cost in time, though. Let me explain.

Linear search doesn't care whether the collection is sorted or not; it simply goes through the elements one by one and checks whether each of them matches our key (the key is the element I'm looking for). The worst-case time for this algorithm is O(n), where n is the number of elements. That happens in two cases:
1. Key element is the last one
2. Key element is not present

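To make this concrete, here is a minimal Linear search sketch in Kotlin (the language and the function name are just my choice for illustration):

```kotlin
// Minimal Linear search sketch: returns the index of the first element
// equal to `key`, or -1 if the key is not present.
fun <T> linearSearch(items: List<T>, key: T): Int {
    for (index in items.indices) {
        if (items[index] == key) return index  // found the key
    }
    return -1  // key is absent: the O(n) worst case
}
```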

Would sorting the collection right before searching help? Not at all. Most general-purpose sorting algorithms run in O(n log n) time, and n log n > n.

What if we pass an already sorted collection, would that help? It depends on what it is sorted by. Let's say we have a list of items in the store and our collection is sorted by id, but we search by name: the ordering doesn't help at all. Nevertheless, the Linear search algorithm will still find our key.


Binary search

What if we have a collection sorted by key and we search by that key as well? There were people who asked this question long before I did. They figured out a way to decrease the time complexity to just O(log n), where log n < n.

Binary search takes advantage of a sorted array. It lets us check only the most strategic positions to get to the one we're looking for. It's named Binary because it divides the range by 2 at every step.
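Here is a rough iterative Binary search sketch in Kotlin, under the assumption that the list is already sorted in ascending order (the function name is mine):

```kotlin
// Minimal Binary search sketch over a sorted list of Ints;
// returns the index of the key, or -1 when it is not present.
fun binarySearch(items: List<Int>, key: Int): Int {
    var low = 0
    var high = items.size - 1
    while (low <= high) {
        val mid = (low + high) / 2               // middle of the current range
        when {
            items[mid] == key -> return mid      // found the key
            items[mid] < key  -> low = mid + 1   // key must be in the right half
            else              -> high = mid - 1  // key must be in the left half
        }
    }
    return -1                                    // range collapsed: key is absent
}
```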


Bear in mind that, for example, on a sorted array of the numbers 1 through 10, Binary search needs only 4 steps to find the key 7, where Linear search would need 7 steps on the same collection. It also needs fewer steps to discover that a key is not present at all.

Think about a warehouse application. A customer comes to the store and asks whether you have shiny brown shoes, then asks whether you can find a brown belt, and in the end asks for a watch with a brown strap. If you have the products sorted by name, you can find "shoes", "belt", "watch" and so on in just a bit of time.
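As a sketch of that "sort once, search many times" scenario, Kotlin's standard library already provides a binarySearch extension for sorted lists; the product names below are made up:

```kotlin
// Sort the product names once, then answer many lookups with the
// standard library's binarySearch on the sorted list.
fun main() {
    val products = listOf("watch", "shoes", "belt", "jacket", "scarf").sorted()

    for (query in listOf("shoes", "belt", "watch")) {
        val index = products.binarySearch(query)  // non-negative index means "in stock"
        println("$query in stock: ${index >= 0}")
    }
}
```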

Exponential search

Binary search is a powerful tool. There are some improvements to the algorithm that make it even better, e.g. Exponential search. The name doesn't come from its complexity but from the idea behind the method. Since it uses Binary search under the hood, its complexity is O(log m), where m is the size of the range we narrow the search down to.

If it just uses Binary search, why would we even care to create an Exponential search method? Think BIG collections! So BIG they may even be unbounded, like streams. How do you search a collection that never ends? You need to find a finite range first. Exponential search does exactly that: it looks for a suitable range by growing the upper bound in powers of 2.
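A rough sketch of that idea in Kotlin might look like this (the Binary search step is inlined and limited to the discovered range; names are mine):

```kotlin
// Exponential search sketch over a sorted list: grow the upper bound in
// powers of 2 until it passes the key, then binary-search that range.
fun exponentialSearch(items: List<Int>, key: Int): Int {
    if (items.isEmpty()) return -1
    if (items[0] == key) return 0

    // Probe indices 1, 2, 4, 8, ... until the element there is >= key.
    var bound = 1
    while (bound < items.size && items[bound] < key) {
        bound *= 2
    }

    // Binary search between the previous bound and the current one.
    var low = bound / 2
    var high = minOf(bound, items.size - 1)
    while (low <= high) {
        val mid = (low + high) / 2
        when {
            items[mid] == key -> return mid
            items[mid] < key  -> low = mid + 1
            else              -> high = mid - 1
        }
    }
    return -1
}
```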


Once the range is found, it takes much less work to pin down the exact value. Of course, exponential ranges can grow very large. If that's the case, I recommend database (SQL) queries, which are optimised for such situations.


Jump search

All of the algorithms above search for any matching element, but that's not the only case in programming. Sometimes multiple elements match the same key and we want to find the first one.

Jump search works only on sorted collections. Its complexity depends on the step size. It has been shown that the optimal block size is √n, which makes its complexity O(√n).


This algorithm is a combination of divide and conquer with Linear search. Once we find the correct block, we scan it linearly and return the first element that matches the key (I'll leave writing a method that finds the last occurrence to you).
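Here is a rough Jump search sketch in Kotlin under those assumptions: jump ahead in blocks of √n, then scan the block linearly and return the first match (names are mine):

```kotlin
import kotlin.math.sqrt

// Jump search sketch over a sorted list; returns the index of the
// FIRST element equal to `key`, or -1 if the key is absent.
fun jumpSearch(items: List<Int>, key: Int): Int {
    if (items.isEmpty()) return -1
    val step = sqrt(items.size.toDouble()).toInt().coerceAtLeast(1)

    // Jump in blocks of size √n while the block's last element is still < key.
    var blockStart = 0
    while (blockStart < items.size &&
        items[minOf(blockStart + step, items.size) - 1] < key
    ) {
        blockStart += step
    }
    if (blockStart >= items.size) return -1

    // Linear scan inside the block; the first hit is the first occurrence.
    val blockEnd = minOf(blockStart + step, items.size)
    for (i in blockStart until blockEnd) {
        if (items[i] == key) return i
        if (items[i] > key) return -1  // passed the spot where the key would be
    }
    return -1
}
```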

If you want to find the first occurrence in an unsorted list, you have to use Linear search through the entire collection.

Every mainstream language provides the most common searching algorithms, such as Linear search and Binary search (though they may not be named exactly that).

Since the first one has no prerequisites, it is the most commonly used in projects around the world. Choosing consciously, though, can result in better performance.

All of the algorithms above can be improved further, e.g. by reducing the number of comparisons. Nonetheless, they are already enough to speed up most of our applications.
