Linear Search vs Binary Search and Why We Should Approach any Problem Using Binary Search As a Developer
Let’s say we are given a sorted array of n elements and asked to search for a given element in the array.
A simple and straight forward method is to use linear search, which we start from the first element, compare one by one, and find the element we want. To demonstrate, say we have a sorted array of 10 different numbers from small to large and we want to search for 70:
In worst case, it requires us to run through the whole array to find it, that is, when the element is at the last index. If n increases by m, the comparisons/guesses the algorithm needs increases by m as well. So we can say linear search has a worst case run-time complexity of O(n) (more on this below).
What about we use binary search? And what is a binary search?
From Wikipedia, “Binary search compares the target value to the middle element of the array; if they are unequal, the half in which the target cannot lie is eliminated and the search continues on the remaining half until the target value is found. If the search ends with the remaining half being empty, the target is not in the array”. Let’s draw it out to demonstrate:
Say we have the same array and we want to search for 70:
At first step, we find the midpoint, and divide it in half, so visually we have:
We search in the first half, 70 is not in them. So let’s ignore it, and for the second half, we find the midpoint and divide it in half again. And since it has odd number of elements, let’s also include the median number (we can also not include the median number, it won’t make a real difference since the other half will include it then), so now we have:
For our new first half, let’s search for 70, it is still not in them. So let’s again ignore it, and find the midpoint and divide it in half on the second half. Now we have:
And this time, 70 is in the new first half. We have completed a binary search.
As we can see, binary search is much more efficient than linear search, since every time we only need to search through half of the remaining array. In fact, binary search only has a worst case run-time complexity of O(log n) (log by convention is base 2). And let’s compare how different in performance O(n) vs. O(log n) can be — say we have a sorted array of 1,000,000 elements:
At worst, linear search needs a run-time of 1000000 guesses to find the target element.
At worst, binary search needs a run-time of log(1000000) = 19.93, approximately 20 guesses to find the target element.
That’s the difference!
Now you may ask, why does binary search has a worst case run-time complexity of O(log n)?
To recap, according to Wikipedia: in computer science, big O notation is used to classify algorithms according to how their running time or space requirements grow as the input size grows. It characterizes functions according to their growth rates.
Thus for linear search, it is straight forward that as the size of the array grows, the worst-case run-time complexity is O(n).
What about binary search?
For simplicity reasons, let’s assume we are given an array of 8 number of elements and the one we need to search “happens to” be at the last index (worst case situation). So each time, we divide our array by half until we have only one element left, that is:
If we refactor:
For n elements, let’s refactor:
Let’s put it in logarithmic form since we need k:
And that’s why the worst-case run-time complexity for binary search is O(log n). The most amazing thing about binary search is that, every time the size of the array doubles, the number of run time guesses only increases by 1 only!
Now think about how we normally approach a problem…
By intuition, we usually use linear search, right? But now we know binary search can be so efficient, especially for a big problem…
So, as a developer, how do we debug?
To conclude, I want to quote Dan Abramov:
Use binary search and scientific method. Not literally in the code, but in how you approach it. Have a bug somewhere between callsites A and B? Put a log in the middle. It’s between A and the midpoint? Put another log in the middle. Something’s wrong with some input? Eliminate half the input. It’s working? Try the other half. Etc. Debugging can feel very arbitrary but it’s straightforward when you do it mechanically. Observe, form a hypothesis, come up with a way to test it, repeat. And cut things in half when there are too many.