Should Span Be Regular?

In my last post, I talked about the concept of Westie types (yes, I am trying to make this happen), what Regular means, and which of them are Regular. I went through an explanation for why means for span is not Regular and potentially why it should be. After lots of resulting conversations with several people (thanks Zach Laine, Nicole Mazzuca, Eric Niebler, John Shaw, Tim Song), I thought it was necessary to write a follow up with more details and more argument.

In Fundamentals of Generic Programming, Stepanov introduces four axioms to describe the equivalence between copy construction/assignment and the equality operator required for the Regular concept. However, the definition in that paper was insufficiently precise — it was too handwavy, in a way that made it difficult to carefully reason about different types. I saw lots of objections to either the axioms, the resulting arguments I made from them, or both. Eric Niebler filled in the gaps for me in the following way:

  • Define the value of an object as what its operator== compares. For instance, the value of a pointer is its address. The value of a reference is the value of the object it refers to. The value of an int is… well, its value.
  • In the fourth axiom
    T a = c; T b = c; zap(a); assert(b == c && a != b); 
    We are defining zap() to be some operation that changes the value of the object. This axiom must hold for all zap()s, and the function itself does not have to live within the constraints of the C++ world.

This model makes it possible to clearly evaluate all types. For string_view, for instance, the value is its underlying range of chars. Even though it only provides const access to those, we can still say we have some zap() that changes one of those chars — and such a zap() would trigger the assert (because b would change along with a, and so a != b would fail). Thus, string_view is not Regular. By the same argument, span is not Regular.

More interestingly, we can evaluate a type whose comparison does not take into account the complete object. For instance:

Is this Regular or not? The value of a Foo is just its x member (this is what its comparison actually checks). It’s easy to see that the first three of Stepanov’s axioms hold. Does the fourth? A zap() that changes the value of the x member would cause a != b to be true (because we’d subsequently compare the now-different xs) and it would in no way affect either b or c, so b == c would continue to hold. Hence, Foo is Regular. Even though y isn’t compared! We’ll get back to this one later.

I’m going to take a brief aside and talk about references. References are not Regular, for at least the obvious reason that they aren’t default constructible. But even if that weren’t a requirement, references would fail the 4th axiom — if we had three references to the same int, for instance, and we change the value of one (which is to say, changing the underlying int's value), we would change the value of all three of them. This fails the “independence” part of the assert.

Matt Calabrese gave a great talk at CppNow 2017 entitled “An Approach to Dealing with Reference Types in the Generic Programming Paradigm” — which, given the title, is obviously super relevant to this topic. In the middle of the talk, he introduces how to do assignment for variant assuming Regular types. And then, at 51:55 in the talk, he introduces a request to have a variant that contains a reference type. The example goes on to show that if we just simply use the types’ provided semantics (references are assignable — it’s just deep assignment):

We could run into trouble when we do this:

The problem here is the semantics of this program are extremely path dependent. When the user_input() is false, we put the variant in an empty state — this is fine. When the user_input() switches to true the first time, we construct a new reference. Also fine. But when it’s true the second time, we assign through to the initial reference. Completely different behavior dependent on what state we happened to be in before. This makes for code that’s extremely difficult to reason about.

This, to me, is a very compelling example. While the specific scenario is a little odd and contrived, the general flow is something that could easily be run into — and it makes for a good argument against a hypothetical proposal to allow reference types in optional and variant with deep assignment semantics, even though that’s what reference types’ assignment semantics are.

Back to span. I’ve heard from numerous people (including Matt) that span should be Regular (that is, its value should be its pointer and its length, not its underlying objects — this choice makes it satisfy all four axioms). However, what I have not yet heard is an example for the problems of an Irregular span that is as compelling as the above example for a deep-assigning optional<T&>.

My last post purported to have such an example — it was a counting sort which required the elements in the Range to be Regular:

And the post provided an example where this algorithm produces subtly unexpected behavior: if you provide a range of spans to different vectors, that nevertheless have the same value, the result of the sort is two spans to the same vector (whichever one comes first).

However, does this example really demonstrate a problem with span failing to be Regular?

Consider the Foo type from earlier in this post. Foo is a Regular type, as I showed, so we should have no problems. Let’s find out. Does it counting_sort()?

Do we end up with five distinct Foos?

No. We end up with 5 copies of Foo{1, 1}.

So if this example is to be considered problematic for non-Regular span, it’s also problematic for Regular Foo. And likewise it would also be problematic for Regular float. If you exclude NaN, float is a Regular type, but positive and negative zero compare equal to each other — which is a distinction that this algorithm would erase.

As a result, I’m not sure that this algorithm actually demonstrates that there is a problem with span having deep comparison semantics. It merely demonstrates that if a type’s value is not entirely encompassing of all of its observable traits, some of those might be collapsed.

Consequently, I am left with no examples demonstrating a problem with span having deep comparison semantics.

Consider this post a request, in good faith, to everyone who thinks that span's comparisons should be anything but deep (whether span should be Regular and have shallow comparison semantics or should not be comparable at all): Give me a compelling example that shows why this is bad.

By compelling example, I would prefer a negative example (that is, like Matt’s CppNow example, code that a programmer would expect to work but doesn’t). But I’d also be very interested in positive examples (that is, code that you could write if span were Regular, but can’t).

First one to do it gets a beer in Rapperswil!