Non-Ownership and Generic Programming and Regular types, oh my!

Barry Revzin
9 min readMay 10, 2018

--

This post is about a specific collection of types in the C++ core language and standard library. I am not sure of a good way to name this collection, and some terms that come to mind come with their own baggage, so I’m going to for now group them together under an umbrella that is clearly widely unrelated to programming and call them Westie types (because, like my dog, they are awesome yet enigmatic).

The types I’m talking about are those that have no ownership semantics whatsoever over their underlying data (but also are inclusive of reference types despite that whole lifetime extension loophole). That is, these are types are trivially destructible (they don’t own anything, so nothing to destroy), trivially copy constructible (since they are, under the hood, just a pointer or two), and can dangle (since some other object is responsible for ownership and their lifetimes are not tied together). The Westie types include:

  • T* (regardless of whether or not a raw pointer is semantically owning, it certainly doesn’t own anything in of itself — you’d have to explicitly delete it or whatever equivalent).
  • T&
  • std::reference_wrapper<T> (since C++11)
  • std::tuple<T&...> (since C++11. a.k.a., the result of std::tie() and std::forward_as_tuple())
  • std::string_view (since C++17)
  • std::span<T> (in the working draft for C++20)
  • std::function_ref<Sig> (P0792 currently in LEWG)
  • std::observer_ptr<T> (Library Fundamentals v2 TS)
  • std::optional<T&> (doesn’t actually exist in the standard library, I’m not aware of any proposals for it to exist, but hypothetically it could… one day)

The Westie types are all quite different. But I thought it’d be interesting nevertheless to take a look at them all together and compare and contrast them with regards to various type traits important to generic programming:

  • Default Constructible
  • Copy Assignable (and whether the copy is “shallow” or “deep”)
  • Equality Comparable (and whether the comparison is “shallow” or “deep”)

Since these types are non-owning, operations can either be performed directly on the type itself (“shallow”) or through the type on the data that the type points/refers to (“deep”). For instance, pointers have shallow copy assignment and equality comparison (we just assign or compare the value of the pointer, which is some address) whereas references have deep copy assignment and equality comparison (since references behave as just aliases for the underlying object, we assign and compare through the reference). Shallow copy assignment is also commonly referred to as rebinding.

For simplicity, I am going to assume that the underlying types are all copyable and comparable, to avoid having to add caveats.

Altogether, we have this table, which is ordered in a way that will become apparent by the end of this post:

Two notes here. First, the reason that std::optional<T&> doesn’t exist is argument over what it’s copy assignment semantics should be. There is no definitive answer here, but I like shallow here myself.

Second,std::reference_wrapper<T> does not directly provide any comparison operators. However, it does provide operator T&(), which means that it ends up being comparable for some Ts (e.g. int) but not for others (e.g. std::string). Where it is comparable, it’s a deep comparison.

Alright, so what do we make of this table? Before we get into that, let’s digress into generic programming fundamentals and its most cherished concept: Regular. A type is Regular if it is:

  • Default constructible
  • Copyable / Moveable / Swappable
  • Equality Comparable (in Fundamentals of Generic Programming, Regular requires a Total Order but the Ranges TS just uses Equality Comparable — I am sticking with the Ranges TS definition)

Now, it’s insufficient to leave it at that. It’s not enough that a type meets the syntactic constraints that these terms imply (which are the only constraints that we can check, really). They also also come with semantic constraints. So it’s not enough for Equality Comparable that x == y compile and yield bool. This operation must also be an equivalence relation. On top of that, it must also be what the Ranges TS calls equality preserving which has the important property that:

Expressions required by this specification to be equality preserving are further required to be stable: two evaluations of such an expression with the same input objects must have equal outputs absent any explicit intervening modification of those input objects. [ Note: This requirement allows generic code to reason about the current values of objects based on knowledge of the prior values as observed via equality preserving expressions. It effectively forbids spontaneous changes to an object, changes to an object from another thread of execution, changes to an object as side effects of non-modifying expressions, and changes to an object as side effects of modifying a distinct object if those changes could be observable to a library function via an equality preserving expression that is required to be valid for that object. — end note ]

Fundamentals of Generic Programming describes the equivalence between copy construction, assignment, and equality as satisfying four axioms:

  1. T a = b; assert(a == b);
  2. T a; a = b; is equivalent to T a = b;
  3. T a = c; T b = c; a = d; assert(b == c);
  4. T a = c; T b = c; zap(a); assert(b == c && a != b);

The first states the connection between copy construction and equality. The second states that default construction and assignment is equivalent to copy construction. The third states that modifying an irrelevant element can’t change the constructed relationship between two others (we copy constructed a and b from c and then changed a, this shouldn’t affect the relationship between b and c). The fourth involves a function zap which changes the value of its operand — changing the value should affect its equality but not other, unrelatedobjects’ equality).

I italicized value in the last sentence because it’s an important concept: copy construction, copy assignment, and equality all are based on the value of the type. It’s important for the value to actually mean the same thing in all three cases.

The canonical example of a type that is Regular is int (hence the occasional saying, “do as the ints do”). It should be pretty straightforward to work through the axioms and convince yourself that they’re all satisfied. The other canonical example are pointers, T*. The value of a pointer is its address, all of the operations are based on this address, so we get Regular basically directly from ints being regular.

One example of a type that is not Regular, but just about meets all the requirements, is std::string_view. This is fundamentally because it has different notions of value for its copy construction and copy assignment (i.e. the address of a character and a size) than it does for its equality operator (i.e. the value of that buffer). And hence, we can construct an example that violates axiom (4):

Here we change the value of a to point to an entirely different string. We did, in a real sense, change the value of a. But while b == c still holds (indeed, there is nothing we could do to change that), a != b is false. string_view comparison is deep, and the underlying ranges have the same contents.

With span<int>, we could violate axiom (4) in two different ways. We could either change a to point to a different, equivalent buffer as above (that is, changing its “direct” value), or we could change a[0] to have a different value (that is, changing its “underlying” value, since this is a non-const view). Either way, we end up with a != b failing since either we’re now pointing at two equivalent ranges or we ended up modifying both with the single modification. That is, even though span<int>s can be compared with ==, and even in a way that’s an equivalence relation, it’s still not enough to be considered Equality Comparable.

Another way of looking at this is the “distinct object” test. When I copy an object, do I end up with two distinct objects such that there is nothing I can do to one that changes the value of the other? If not, the type is not Regular.

Of our original nine candidates, 4 aren’t Regular because they aren’t default constructible. An additional 3 aren’t Regular because of mismatches how they are constructed and how they are compared. Only 2 are Regular: T* and its really awkwardly spelled cousin, std::observer_ptr<T>.

Ok so, string_view and span<T> and optional<T&> aren’t Regular. So what? Why does this matter?

The case against Regular is largely a function of field experience. string_view and span<T> are predominantly used as light-weight, non-owning, cheap-to-copy-and-pass-around replacements of containers or buffers (std::string in the first case, and any number of different contiguous containers in the second), and in these contexts the comparisons we would expect to use are almost always (if not really always always) the deep comparisons. After all, it would be pretty odd, not to mention error prone, if "hello"s == "hello"sv compiled and evaluated as false and sort()ing a container of string_views gave you some more-or-less random order rather than an alphabetical one.

The case for Regular is that these semantics are fundamentally confused. The equality above is ambiguous — do we convert the string_view to a string and compare the contents, or do we convert the string to a string_view and compare the pointer/length? This comparison should simply be completely disallowed. If you want to compare the underlying contents, you should write ranges::equal("hello"s, "hello"sv) which is completely unambiguous.

Allowing these mismatched semantics leads to generic code which can be axiomatically proven correct yet still break. Consider the following algorithm, courtesy of Tim Song:

This is a properly constrained algorithm — it has all the requirements that we need, and the algorithm itself is also correct based on those requirements. Given a range of ints or T*s or std::strings, it’ll do the right thing.

However, consider what happens in this scenario:

It’s actually quite interesting and subtle. spans will still contain two elements, and spans[0] <= spans[1] does hold. However, while we started with two spans to two different vectors… we end up with two spans to the same vector. That is, assert(spans[0].data() != spans[1].data()) fails! Why? The ultimate reason is: because span isn’t EqualityComparable per the complete definition described above, and so our axiomatic proof fails. Garbage in, garbage out (or, in Ranges TS terms, ill-formed, no diganostic required).

Of the default constructible Westie types, two are Regular and three are not. What can we do with the irregulars? We have a few options:

  1. Keep span as it is in the working draft (that is, with deep comparison and non-Regular). This is what the field experience is with, and it’s hard to generally go against known usability.
  2. Change span's comparison semantics to be Regular (that is, shallow comparison). This has the advantage of maintaining a consistent programming model: span is a pointer type, generic programming with it Just Works.
    It brings with it a few other issues though, namely (a) mixed span comparisons would need to become ill-formed for general sanity and (b) it brings in an inconsistency with a type that’s already in the standard that’s very similar to span: string_view. Making string_view Regular is probably not an option (and despite it being a const view, it would still break in the counting_sort example). On the other hand, while it would be weird for string_view and span<char const> to have different semantics, they are certainly different types with different constructors and wildly different member functions.
  3. Remove span's comparisons altogether. This is one of those wonderful-looking on-paper compromises that probably nobody would ever get behind.
  4. One particularly interesting idea, courtesy of Tim Song, is an attempt to bridge the divides between (1) and (2). We know that people want the deep comparisons (primarily or even exclusively in non-generic contexts) and we know that having deep comparisons breaks generic code. Why not both? That is, what if span could somehow opt-out of the EqualityComparable concept? This would allow the real-world use-cases to keep using their deep comparisons while avoiding the generic programming problems by having span fail to meet those constraints. This wouldn’t just be a solution for spanstring_view could also do the same by continuing to provide deep equality but simply not be EqualityComparable (as could optional<T&>).
    There’s no direct way in the language of opting at all (in or out) with regards to Concepts. But the Ranges TS does provide precedent here. There is a library mechanism to opt out of SizedRange and SizedSentinel. Perhaps this calls for one for comparisons as well? I’m intrigued by the idea.

Before this week, I was firmly in the camp of option 1. Team Irregular. After lengthy conversations with Tim and some clarity from Eric Niebler, I’m… not sure. It’s a really interesting topic. One certainly worth the time to discuss.

--

--