Strict Weak Ordering and the C++ STL
We are told in algorithms class that the majority of programming tasks can be reduced to sorting and/or searching. As such there are many very general and highly efficient algorithms dedicated to both these tasks implemented in the C++ STL. As a part of the generality of these algorithms, they can be applied to any data type as long as you define a comparator for them.
The comparator is a function for comparing two elements and determining their relationship to each other. For numbers it’s clear what
x < y means, for characters we may adopt alphabetical order and for words you can use lexicographical order. Once you have a comparator declared, you gain access to a wealth of powerful algorithms in the C++ STL for efficient operations on a vector of some arbitrary data type. You can use it for dates, students, zoo animals or books as long as you can tell your algorithm which one should come before the other when two elements are compared.
Now there is one very immediate issue that comes to mind, what if your data has a circular relationship? For example in rock-paper-scissors you have
scissors > paper,
rock > scissors,
paper > rock. In this awkward case it’s impossible to properly sort a vector of such elements, meaning that it’s impossible to guarantee that for any member of the vector, that element is always greater than any element to its left and lesser than any element to its right. This is an undefined behaviour in C++, there are no rules for how this is supposed to be handled, the compiler cannot work out that your comparator is valid. Your program may or may not crash, and it may or may not produce mildly sensible looking results that fool you at a glance (for example putting all the scissors first, then rocks, then papers).
Strict Weak Ordering
Having a circular relationship is called non-transitivity for the
< operator. It’s not too hard to realise that if your relationships are circular then you won’t be getting reasonable results. In fact there is a very strict set of rules that a data type and its comparators must abide by in order to get correct results from C++ STL algorithms, that is strict weak ordering. In general we need only define the
< operator and we will get
== for free because:
a > bis equivalent to
b < a
a == bis equivalent to
!(a < b) && !(b < a)
Then for strict weak ordering we must have
- For all
x < xis never true, everything should be equal to itself
x < ythen
y < xcannot be true
x < yand
y < zthen
x < z, the ordering should be transitive
x == yand
y == zthen
x == z, equality should be transitive
Example of Failure
It’s difficult to imagine how one might violate the first condition and second of strict weak ordering accidentally. The third condition is violated in our rock-paper scissors example and it’s reasonably easy to understand how such a violation prevents proper sorting. But the fourth condition is subtle and is the pitfall that caught me.
We will demonstrate the problem using intervals, each
interval has two properties
end. We say that:
interval1 < interval2 if
interval1.end < interval2.start
That is to say that for two disjoint intervals, the “lesser” one ends before the “greater” one starts. For example we have
[1, 10] < [14, 25] and
[14, 25] > [10, 13]. This on the surface appears to give us a really nice property, that two intervals are considered equal if they are overlapping. For example
[10, 15] == [14, 25]. One of the most common operations on intervals is to find all overlapping intervals, one might think “Great! Once I’ve sorted I can use
std::equal_range to get all overlapping intervals.”
Had you violated any of the first 3 conditions, you might quickly encounter garbage results that alert you to an issue, but violating the fourth has very subtle consequences. The fourth condition has been violated by our intervals because
[1, 5] == [2, 10] and
[4, 10] == [8, 15] but
[1, 5] != [8, 15]
This means that the results of the STL operations will be undefined, in particular let’s say we had a set of intervals ordered as follows:
[[4, 10], [1, 5], [8, 15], [20, 30]]
This is a correct ordering based on our rules, if you do the pairwise comparisons between any two elements you can see they are in fact in a permissible order.
So we technically have a sorted array, which means we should be able to leverage fast algorithms that require sorted arrays, primarily binary search. With our definition of inequality, we have the convenient fact that equality holds when two intervals overlap, so it is tempting to think that we could use binary search to quickly find overlapping intervals.
Consider the interval
[6, 7], we start in the middle and check
[1, 5], we find that
[6, 7] > [1, 5] and so we move onto the right side of the array to continue our search, and clearly neither
[8, 15] nor
[20, 30] overlaps our interval. But actually
[4, 10] overlaps
[6, 7] and was never considered due to the nature of binary search. THIS is the danger we face when we do not provide a comparator that satisfies strict weak ordering.
It is a most insidious bug, because many results may appear correct, it was only by a very specific arrangement that I was able to produce an incorrect result. Such a problem would be difficult to discover and subsequently debug; we are generally interested in fast algorithms when dealing with large data, where it’s infeasible to check by hand the full data set. But the act of creating a smaller test set changes the path taken by the algorithm and can suddenly yield the correct result.
The take-away is to always pay attention to requirements if they exist and in general be mindful of how you apply STL algorithms such you do not find yourself in the land of undefined behaviour.