Between linear and binary search

Implementation of a biased binary search

C++, algorithms, performance.

Note: I was not able to google the algorithm that I describe, so for now I will take the credit for it. If you have seen this algorithm or a similar one please let me know.

Some algorithms operating on sorted ranges are typically implemented on top of linear search instead of binary search, since the result is expected to be found close to the beginning of the range (std::merge and std::unique are examples from the C++ standard library). This works because linear search has O(distance(f, result)) complexity, while binary search has O(log(distance(f, l))). This means that if the result you are looking for is sufficiently close to the beginning of the range, linear search finds it faster. This article talks about an alternative to this approach. Let’s first see how linear search compares to binary search on the benchmarks.

About the benchmarks:
I use a sorted std::vector<std::int64_t> with 1000 elements, since this feels close to what I have encountered in practice. All the elements are unique in order to avoid accidentally misleading results. On the x-axis we have the distance between the element we are looking for and the beginning of the range. On the y-axis we have time in nanoseconds.
(Info: source, benchmarking library, machine, compiler — clang-902.0.39.2 (apple versions))

As one would expect, binary search has consistent times over the whole range, while linear search grows with the position of the element we are looking for. Closer to the right end it’s about 14 times slower for this example.

However, if we look at the first 40 elements, linear search is way faster (10–15 times in the beginning). And the worst part is that this gap depends on the size of the whole range, meaning that with more elements in the range, the gap only gets worse.

Let’s see what we can do about it.

Idea #1: Jump in powers of 2 (Exponential or Galloping search).

Note: thanks to Vittorio for pointing out the original authors of this algorithm. Wikipedia.
Binary search at each step looks in the middle of the range. If the probability of finding an element is higher towards the beginning of the range, we want to start searching closer to the beginning. The idea of galloping search is to jump in powers of 2 in order to quickly find the right boundary. 2 is a convenient number for computers, and multiplying by 2 is just one add (or shift) instruction, but there are alternatives.

Note: the most basic form of binary search in C++ is called partition_point. It accepts a range and a predicate. The range must have a point before which the predicate is true for every element, and after which it is false. The algorithm returns this point. My variation is called partition_point_biased, since it expects to find elements closer to the beginning of the range.

Walking through the code: we are trying to find the point where the predicate becomes false. On each iteration we jump forward by the step and test. If the predicate is false there, the partition point is somewhere between our previous test and this one, and we can fall back to regular binary search. If the tested position is still before the partition point, we can advance the left border to the next element after the tested one (explanation from GeeksforGeeks).
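For reference, the loop just described might be sketched like this (hypothetical code: names and details differ from the actual implementation):

```cpp
#include <algorithm>
#include <iterator>

// Sketch of the galloping partition point. Jump in powers of 2, skipping
// the element we just tested; once the predicate turns false, binary
// search the last jump.
template <typename I, typename P>
I partition_point_biased_sketch(I f, I l, P p) {
  typename std::iterator_traits<I>::difference_type step = 1;
  while (step < l - f) {
    I test = f + step;
    if (!p(*test))                       // overshot: the answer is in (f, test]
      return std::partition_point(f, test, p);
    f = test + 1;                        // skip the element we just tested
    step *= 2;                           // double the jump
  }
  return std::partition_point(f, l, p);  // not enough room: plain binary search
}
```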

Note: we can do some nice tweaking (unrolling the first iteration, doing the check for the first element manually, etc.) but those things are better discussed later.

Complexity:
We have two loops. First, we jump until we overreach the result. At step k we’ll be at:

2⁰ + 1 + 2¹ + 1 + … + 1 + 2^k = (geometric progression) =
2^(k + 1) − 1 + k

Which means that we will overreach distance(f, result) in
O(log(distance(f, result))) steps.
Note:
our last jump was 2^k, slightly less than half of the whole distance we went.
Then we do a binary search over the last jump which, by the note, adds another O(log(distance(f, result) / 2)).
Therefore the whole algorithm is O(log(distance(f, result))) with a bigger constant than a regular binary search. Let’s see the numbers.

I removed the linear search from the whole-range plot, since it messes up the y-scale and makes the algorithms difficult to distinguish. This way we can see that at worst we are about 1.5 times slower than binary search, which is OK since that happens on the right-hand side. However, everywhere before 130 elements we win.

If we look at the first 40 elements, we see that our variation performs much more favourably than plain binary search (about a 2x loss to linear search for the very first elements, compared with binary search’s 10x, and after that we are comparable with linear search).
What is also much nicer about this algorithm is that its complexity does not depend on the size of the range, only on the distance to the result, meaning that the difference between linear and biased search does not get worse as the range grows.

Idea #2. Use sentinels.

When we do either linear search or the biased search, we do a lot of boundary checking. We can reduce the number of these checks by first testing an element closer to the end. Let’s look at how this would work for linear search:

Since we know that there is at least one element for which the predicate is false, we know that a search for the first such element from the beginning will succeed without ever reaching the end.

Note: for linear search, in this case, this is slower on both clang and gcc (quickbench). The difference is especially visible with libstdc++, since they unrolled their find_if implementation. libc++ didn’t, but it still wins purely on better compiler optimisations. I’d note that there are cases where people successfully use sentinels; see this excellent Alexandrescu talk.

Coming back to the biased binary search: in order to find out where the sentinel could be, we should go back to how we jump. From the note above we know that, from some step K onward, our next jump is always slightly less than the distance we have already covered. This means that if we are to the left of middle(f, l), we are never going to jump over l in one go.

Let’s see how it plays out on an example. Let’s say we are looking for 100, and the middle of the vector is 1000.

Let’s say at a certain point we are at the position with value 1025. We are not going to jump forward, because 1025 is bigger than 100.
If we are at 93, which is less than 100, we will jump forward. Since we are to the left of the 1000, we are not going to jump further than the distance from the beginning to the 1000, which is half the array, so we are safe. And if we overreach the 1000, we are still OK, since we are not going to jump any further.

This means that if !p(middle(f, l)) we can run without checking for the boundary. This is not true for very small distances, where skipping the element we just tested comes into play. However, we can work around that by doing a fixed-size linear search on each jump (we have a sentinel, the middle, and any sentinel works for linear search).
It turns out that unrolling the first power of 2 and doing 1 extra iteration is sufficient; I proved this by testing all scenarios.
Here is the code:
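A reconstruction of the idea (hypothetical code; the actual implementation differs in its details):

```cpp
#include <algorithm>
#include <iterator>

// Sketch of the sentinel version. One check against the middle either
// sends us to a plain binary search of the right half, or establishes a
// sentinel that lets the galloping loop run without boundary checks.
template <typename I, typename P>
I partition_point_biased_sentinel_sketch(I f, I l, P p) {
  if (f == l) return f;
  I mid = f + (l - f) / 2;
  if (p(*mid))                          // answer is in the right half:
    return std::partition_point(mid + 1, l, p);
  // !p(*mid): the middle is our sentinel. Unroll the first elements;
  // with any sentinel present an unguarded linear scan is safe.
  if (!p(*f)) return f; ++f;
  if (!p(*f)) return f; ++f;
  if (!p(*f)) return f; ++f;
  typename std::iterator_traits<I>::difference_type step = 2;
  // From here the covered distance always exceeds the next jump, and the
  // answer is at or before mid, so f + step never reaches l.
  while (p(*(f + step))) {
    f += step + 1;                      // skip the element we just tested
    step *= 2;
  }
  return std::partition_point(f, f + step, p);  // binary search the last jump
}
```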

Note: when implementing the middle function, it’s better to do the division by two on an unsigned number, since it produces smaller code, doesn’t occupy registers with constants, and is measurably faster in some cases. godbolt

Having said all that, let’s see the measurements.

So, compared with the standard binary search on integers, we are always better. Note: we do call the predicate more times than regular binary search does, therefore this result should be re-measured for the types you use.

For the first elements, we start to consistently beat linear search after the 5th element, and even before that we are very close. And this is almost completely independent of the range size (we might get a cache miss looking at the middle if the range is big enough).

Should we replace all of the usages of binary search with this algorithm?

If you have integers, and it matters enough to bring in this algorithm, go for it (please let me know). If you have something that might be more expensive to compare, do measure on the real data first: this algorithm does more comparison invocations. Here is the plot: x is the element we are looking for, y is how many times we called the comparison.

So yes, we do invoke the predicate up to 4 times more often. Measure; it might be worth it anyway (especially if you always find the element close to the beginning). We could also modify the algorithm to invoke the predicate only when we are to the left of the right border, but that has to be measured.
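To reproduce such a plot, one can wrap the predicate in a counting adapter along these lines (a sketch; the names are mine):

```cpp
#include <algorithm>
#include <vector>

// Count how many times a search calls its predicate: wrap the predicate,
// run the search, read the counter.
template <typename P>
struct counting_predicate {
  P p;
  int* calls;
  template <typename T>
  bool operator()(const T& x) const {
    ++*calls;
    return p(x);
  }
};
```

For example, wrapping the predicate passed to std::partition_point and reading the counter afterwards gives one point of the plot.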

Should we always use this instead of a linear search in sorted ranges?

I am pretty sure that if we unroll a few iterations of linear search before going into the main loop, we can construct an algorithm that is sufficiently better than linear search for all reasonable use cases. However, there are two other issues:
1. It’s hard to build efficient algorithms that search in multiple ranges on top of our algorithm. This topic is too big to cover here.
2. There is an overhead from using any searching algorithm over the linear one.

auto found = best_search_ever(f, l, p);
for (I i = f; i != found; ++i) { /* do smth */ }

Whenever you use a searching algorithm, it requires you to write a loop afterwards. A loop implies boundary checks and the like, and you don’t have to do those with linear search: you can just do what you need as you check each element.
My suggestion would be — for really demanding algorithms, use a biased search as a fallback, like this:

  for (int i = 0; i < 3; ++i) {
    if (f == l) ...
    if (!p(*f)) ...
    do_stuff(*f); ++f;
  }
  // 3 elements in a row were p. Makes sense to go for it.
  I found = partition_point_biased(f, l, p);
  std::for_each(f, found, do_stuff);
  ...

I have a real case where this is a good strategy.

Forward ranges.

Standard implementations of binary search work for forward ranges (linked lists etc.). The idea here is that you make more steps, but if the predicate is really expensive, it might be worth it.
The second variation of the search doesn’t really make sense in this case, since it trades fewer boundary checks for more predicate invocations. However, the galloping search can do this reasonably well. (The code is not terribly interesting but can be found here.)

First we should look at the predicate invocation count. For the majority of elements we (obviously) do more comparisons. But in the beginning we do noticeably fewer, and this is what we want from our biased variation.

What’s nice about this algorithm is that it has significantly less overhead compared to linear search (in terms of everything other than comparisons).

This benchmark is just searching in a std::forward_list<int>. As we can see, there is at most two times the overhead, which is much better than binary search. I chose this variation as the implementation of partition_point_biased for forward ranges.
Note: the interface of the binary search algorithms from the standard library takes two iterators. However, binary search requires finding the middle, which means it has to compute the length. For a forward range this can take a noticeable amount of time. If we know the length upfront, we can be faster. Alexander Stepanov at some point talked about partition_point_n. I didn’t find his implementation, but here is a variation if you need it. Ideally we would probably also like it to return the distance to the end, but that is still a TODO.

Algorithms on top.

There are some algorithms that can be built on top of partition_point_biased.

  • lower_bound_biased, upper_bound_biased, equal_range_biased — variations on the binary search algorithms from the standard library.
  • partition_point_hinted, lower_bound_hinted, upper_bound_hinted, equal_range_hinted — variations on the binary search algorithms from the standard library that are biased toward a given element in the range rather than the beginning (these require bidirectional iterators).
  • group_equals — a variation on the range-v3 group_by view that relies on the underlying range being sorted. It enables getting runs of equivalent elements from a sorted range; sort + group_equals is an efficient way to group things.
    Note: in my experimental implementation, resulting control objects are not terribly regular and the general API could be nicer.

Applications.

  • Building variations on set algorithms from the standard library that perform well on ranges that have drastically different sizes. This was the original motivation behind this research and I have some success but I need to do more before publishing (I hope to get around to that someday). You can find a draft of one of the possible algorithms here.
  • A map of sets. Reasonably often one needs a map of sets. If you have some flat_set/flat_map implementation (or are OK with using sorted vectors), you can sort a vector of pairs and then use a group_equals, comparing just the keys (possible implementation). I used a variation on this in real code quite a bit.
    Note: provided implementation uses emplace_hint with correct hints which has many checks both for capacity and hint correctness. We can get rid of them partially or fully but the code would be more complex.
  • Group dots on a plane by a coordinate (process dots by lines). Here are a couple of examples from code_report where this is helpful: https://youtu.be/czzx3pVAdw4, https://youtu.be/CdEWYcoCOwA
  • A variation on the popular interview problem 2-sum. If the problem is done on a sorted range, then looking up from the beginning and the end is a good candidate for a biased search.
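To make the 2-sum point concrete, here is the classic two-pointer walk on a sorted vector (illustrative code, not from the original); the commented lines mark the advances that a biased search could replace, since each pointer usually moves only a few steps at a time:

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// 2-sum on a sorted range: return indices of two elements summing to
// target, or nothing if no such pair exists.
std::optional<std::pair<std::size_t, std::size_t>>
two_sum_sorted(const std::vector<int>& v, int target) {
  if (v.empty()) return std::nullopt;
  std::size_t lo = 0, hi = v.size() - 1;
  while (lo < hi) {
    int sum = v[lo] + v[hi];
    if (sum == target) return std::pair{lo, hi};
    if (sum < target) ++lo;   // candidate for a biased search from the left
    else --hi;                // ...and for one biased to the right end
  }
  return std::nullopt;
}
```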

Afterword.

The algorithm has not been used in production.

I tried to put everything under the MIT license — if you need it to be something else — feel free to reach out to me.

I did extensive testing including fuzzing the main algorithm. However it is entirely possible that I missed something.

If you have a popular open-source library you think this would be useful in, I’d be happy to contribute.

A lot of work has gone into different implementations of binary search (examples: link1, link2). However, they focus on being faster everywhere, while for me it was important to focus on the left side of the range. I don’t have measurements against them; it would be an interesting comparison though.

Acknowledgments.

Huge thanks to Conor Hoekstra and Vittorio Romeo for the reviews.

Contacts.

Source.
e-mail: denis.yaroshevskij@gmail.com