xvalues and prvalues: The Next Generation

Barry Revzin
5 min read · Feb 11, 2017


C++11 introduced the ability for code to differentiate between lvalues and rvalues, a pretty powerful feature and the foundation that move semantics are built on. It also introduced a whole taxonomy of value categories:

Because obviously

But I think there’s still something missing. I don’t know what it might look like, but let me explain. Let’s say we want to build a function named identity(). Just a straightforward function template that returns exactly what you pass in. Let’s focus only on regular types (types that are copyable and movable) for the purposes of this post. Can we do such a thing? A first go might be:
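A minimal sketch of that by-value attempt. The Tracked type is a hypothetical helper, not part of the problem itself; it only counts copies and moves so the cost of each call can be checked.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: counts copy and move constructions.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
    static void reset() { copies = 0; moves = 0; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// First attempt: take the argument by value, return it by value.
template <class T>
T identity(T t) {
    return t;  // a by-value parameter is moved (not copied) into the result
}
```

Calling identity(x) on a named Tracked x should bump one copy and one move, identity(std::move(x)) two moves, and identity(Tracked{}) a single move.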

How does this fare? It’s certainly correct. But how expensive is it? For lvalues, this performs a copy (into the function) and a move (out of it). For xvalues, this performs two moves. For prvalues, the function argument is constructed in place (guaranteed in C++17!), so we only do a single move. In every case, we have to do something.

Let’s try again, using our new C++11 tools:
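A sketch of the forwarding-reference version, again with the hypothetical Tracked counter standing in for a real type:

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: counts copy and move constructions.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
    static void reset() { copies = 0; moves = 0; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Second attempt: forwarding reference in, T out.
// For an lvalue argument T is Tracked&, so a reference is returned.
// For an rvalue argument T is Tracked, so a new object is move-constructed.
template <class T>
T identity(T&& t) {
    return std::forward<T>(t);
}
```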

For lvalues, the function is now free. The template parameter T will deduce as an lvalue reference type, so the argument will just be bound to the reference in and then bound to the reference out. No copies or moves. Perfect. For both xvalues and prvalues, the argument will be bound to the parameter reference inbound and then moved into the return object outbound — one single move. This solution is at least as good as the first solution for all cases.

But we still have that extra move. Can we get rid of it by just holding onto the reference? It’s tempting to write:
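Something along these lines, as a sketch. The static_asserts only confirm what comes back (a reference in every case); they say nothing about whether that reference is still valid.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Tempting version: hand the very same reference straight back.
template <class T>
T&& identity(T&& t) {
    return std::forward<T>(t);
}

// An lvalue goes in, an lvalue reference comes out.
static_assert(std::is_same<decltype(identity(std::declval<std::string&>())),
                           std::string&>::value,
              "lvalue in, lvalue reference out");
// An rvalue goes in, an rvalue reference comes out (no new object).
static_assert(std::is_same<decltype(identity(std::declval<std::string>())),
                           std::string&&>::value,
              "rvalue in, rvalue reference out");
```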

After all, it looks right. We get the same thing out that we put in. No copies or moves in any case. Ship it!

Except there’s this problem in [class.temporary]:

The temporary to which the reference is bound or the temporary that is the complete object of a subobject to which the reference is bound persists for the lifetime of the reference except: […] A temporary object bound to a reference parameter in a function call (5.2.2) persists until the completion of the full-expression containing the call.

What this means for us is that if we pass in a temporary, its lifetime is kaput pretty quickly. And not just with prvalues: we can very easily run into the same issue with some, but not all, xvalues!
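To make the xvalue distinction concrete, here is a sketch using the reference-returning identity(). The make() function is a hypothetical source of temporaries, and the dangling cases are left as comments so that nothing in the block actually has undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

template <class T>
T&& identity(T&& t) {
    return std::forward<T>(t);
}

// Hypothetical source of temporaries.
std::string make() { return "temporary"; }

// "Safe" xvalue: it names an object that outlives the reference.
inline bool safe_xvalue_case() {
    std::string s = "named";
    std::string&& r = identity(std::move(s));
    return &r == &s;
}

// Also fine: the temporary is only used within its own full-expression.
inline std::size_t used_in_place() {
    return identity(make()).size();
}

// Unsafe: both references outlive the temporary they refer to.
//   std::string&& p = identity(make());             // prvalue argument
//   std::string&& q = identity(std::move(make()));  // xvalue argument
```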

We can’t differentiate between xvalue and prvalue arguments. We certainly can’t differentiate between “safe” and “unsafe” xvalue arguments. There are even cases where passing in lvalues can dangle! So the only real safe way to implement identity() leaves some potential speed on the table, because we have to take ownership:
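That is the forwarding version that returns T. A sketch, this time checking the return types rather than counting moves:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Safe version: references only come back for lvalue arguments.
template <class T>
T identity(T&& t) {
    return std::forward<T>(t);
}

static_assert(std::is_same<decltype(identity(std::declval<std::string&>())),
                           std::string&>::value,
              "lvalues pass through as references");
static_assert(std::is_same<decltype(identity(std::declval<std::string>())),
                           std::string>::value,
              "rvalues come back as a new object");
```

Because the rvalue case returns a prvalue, binding the result to a reference (say, const std::string& kept = identity(std::string("owned"));) extends the lifetime of the returned object in the usual way.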

After all, you can’t end up with a dangling reference if you don’t return a reference.

What does this mean for practical code? It leads to some odd situations. First, the lifetime rules are decidedly non-trivial and lead to situations like:
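A sketch of the kind of situation meant here, assuming two hypothetical types A and B that differ only in whether they declare a constructor:

```cpp
#include <cassert>
#include <string>

// Aggregate: the reference member is initialized directly.
struct A {
    const std::string& val;
};

// Same member, but initialized through a constructor parameter.
struct B {
    const std::string& val;
    B(const std::string& s) : val(s) {}
};

inline bool aggregate_case() {
    // Braced aggregate initialization binds the temporary straight to a.val,
    // so the temporary lives as long as a does.
    A a{std::string("hello")};
    return a.val == "hello";
}

// Dangling (left as a comment on purpose):
//   B b{std::string("hello")};
// The temporary is bound to the constructor parameter s and is destroyed
// at the end of the full-expression, so b.val refers to a dead object.
```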

b.val is a dangling reference for the same reason that our attempt at identity() failed: lifetime doesn’t extend through a function. But we sidestepped the constructor with a.val, so no such issue.

It also means that while we can write range-based for statements with prvalue ranges, we can’t propagate that further. That is, given a hypothetical function std::vector<int> getInts(), the following is perfectly safe:
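For instance, with a stand-in definition of getInts() (the body is an assumption; the loop sums instead of printing so the result is easy to check):

```cpp
#include <cassert>
#include <vector>

// Hypothetical body for the getInts() mentioned in the text.
std::vector<int> getInts() { return {1, 2, 3}; }

inline int sumInts() {
    int total = 0;
    // The loop binds the returned vector to a hidden `auto&&`,
    // which keeps it alive for the whole loop.
    for (int i : getInts()) {
        total += i;
    }
    return total;
}
```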

But what if we want to print them reversed? We have to do something with the underlying container to hold onto it. If we hold onto it by reference, that works, sometimes:
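A sketch of a reference-holding adaptor. The names reversed_ref and reversed() are hypothetical, and the code assumes C++14 for the deduced return types.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical adaptor: remembers only a reference to the container.
template <class C>
struct reversed_ref {
    C& c;
    auto begin() const { return c.rbegin(); }
    auto end() const { return c.rend(); }
};

template <class C>
reversed_ref<std::remove_reference_t<C>> reversed(C&& c) {
    return {c};
}

// Fine: the container is a named object that outlives the loop.
//   std::vector<int> v = getInts();
//   for (int i : reversed(v)) { ... }
//
// Dangling under the rules this post assumes (pre-C++23): only the
// reversed_ref object is kept alive by the loop, not the vector inside it.
//   for (int i : reversed(getInts())) { ... }
```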

To guarantee safety, we have to make sure that we hold onto the container by value:
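One way to sketch that, with a hypothetical reversed_owning adaptor that stores rvalue containers by value and lvalue containers by reference (C++14 assumed; getInts() is given a stand-in body):

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical body for the getInts() mentioned in the text.
std::vector<int> getInts() { return {1, 2, 3}; }

// C is a value type for rvalue arguments (the adaptor owns the container)
// and an lvalue reference type for lvalue arguments (no copy is made).
template <class C>
struct reversed_owning {
    C c;
    auto begin() const { return c.rbegin(); }
    auto end() const { return c.rend(); }
};

template <class C>
reversed_owning<C> reversed(C&& c) {
    return {std::forward<C>(c)};  // rvalues are moved in: the extra move
}
```

With this version, for (int i : reversed(getInts())) is safe, because the loop keeps the adaptor alive and the adaptor owns the vector.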

This is a bit unfortunate. The initial longer code, introducing an extra variable to bind to the result of getInts(), incurs zero copies or moves and is safe. The shorter code is more ergonomic, but must perform an extra move, even if there are cases where we wouldn’t need it.

And so we’re stuck. Can’t very well write a bunch of code that can easily lead to undefined behavior. Pay for what you need — except when you need to extend lifetimes, in which case, hope you’re okay with that extra move?

Seems like we have a few options on how to solve this problem:

  1. This isn’t really a problem. People can deal.
  2. Just always use raw pointers for everything. This is clearly the kind of advice everybody can get behind.
  3. Make it possible to explicitly extend lifetimes of temporaries bound to references. This seems potentially viable, but that kind of annotation would be frustrating to have to write, since it would end up being used predominantly in the “obvious” cases, and it would be very error-prone. And what would that annotation look like? std::forward_and_lifetime_extend<T> is a mouthful. T&& [[extend]] breaks how attributes typically work. And this is even without dealing with the main issue of lifetime extension: how does the compiler know where to construct the object?
  4. Make it possible to implicitly extend lifetimes of temporaries bound to references. This approach seems really difficult. I base this opinion not just on the fact that if it were easy, somebody would’ve done it a while ago.
  5. Add the ability to differentiate the safe and unsafe cases. This seems difficult, and is even more difficult than it first appears. More on this in a moment.
  6. ????
  7. Profit

What are the “safe” and “unsafe” cases? I touched on this earlier, but it boils down to what actually happened by the time we get into our function body. If we have a reference parameter (either rvalue or lvalue reference), does that parameter refer to an already-existing object or was it bound to a temporary? If the reference was bound to a new object, then we either need to extend that lifetime, or copy/move out of it. If the reference was bound to an existing object, it’s safe to just use that reference. Right?
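A sketch of what such a function might look like. Since nothing in the language can answer the question, the hypothetical is_bound_temporary is faked here as a template flag that the caller sets by hand, standing in for knowledge only the compiler could have (C++17 assumed for if constexpr):

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Stand-in only: `is_bound_temporary` is supplied manually by the caller.
// A real facility would have to be answered by the compiler per call site.
template <bool is_bound_temporary, class T>
decltype(auto) identity(T&& t) {
    if constexpr (is_bound_temporary) {
        // Fresh temporary: take ownership so nothing dangles (one move).
        return std::decay_t<T>(std::forward<T>(t));
    } else {
        // Existing object: passing the reference through is free.
        return std::forward<T>(t);
    }
}
```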

This isn’t real C++: there is currently no way to implement is_bound_temporary. But let’s assume there were. Would this be viable? Now this is free for lvalues and those xvalues that refer to existing objects, and incurs one move for xvalues that don’t refer to existing objects and prvalues. That’s awesome! I mean, not as good as being able to transparently pass through the temporary, but certainly better than dangling references.

But how would this case be handled for f?
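A sketch of the kind of case in question, assuming a hypothetical Holder type whose rvalue-qualified get() hands out a member by rvalue reference:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical type: get() on an rvalue Holder yields an xvalue.
struct Holder {
    std::string s;
    std::string&& get() && { return std::move(s); }
};

Holder make_holder() { return Holder{"payload"}; }

// The call expression is an xvalue, yet it refers into a temporary.
static_assert(std::is_same<decltype(make_holder().get()), std::string&&>::value,
              "get() on a temporary yields an xvalue");

// Fine: the xvalue is consumed within the same full-expression.
inline std::string consume() {
    return make_holder().get();
}

// The troublesome case, with any reference-returning identity():
//   auto&& f = identity(make_holder().get());
// Inside identity() the parameter is bound to an existing subobject, not to
// a temporary materialized at the call, so the binding looks "safe" even
// though the Holder dies at the end of the statement.
```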

Calling get() gives us an xvalue. We’d presumably have to handle it as an unsafe xvalue. Which puts us back to square one: as soon as we add any layer of indirection, we have no idea what’s safe and what’s not safe, so we have to assume ownership. It’s not enough to be able to differentiate among value categories and it’s not enough to be able to differentiate by temporary bindings. We need another category.

What if we could annotate references as being either safe or unsafe? safe references can bind to unsafe references, but not vice versa. Temporaries cannot bind to safe references. That would let us write:
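As pseudocode only: the safe and unsafe qualifiers below are invented syntax, one guess at how such an annotation might be spelled, and none of this compiles.

```cpp
// Pseudocode: `safe` and `unsafe` are not C++.

// Known to refer to an existing object: pass the reference through for free.
template <class T>
T&& identity(T safe&& t) {
    return std::forward<T>(t);
}

// May refer to a temporary: take ownership, at the cost of one move.
template <class T>
T identity(T unsafe&& t) {
    return std::forward<T>(t);
}
```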

This seems like it handles all the cases. But it involves a massive amount of user effort to mark all the annotations correctly. And even then, we could still easily end up with dangling references:

Ultimately, isn’t it a bit weird that something as simple as identity gives us so many problems? That’s C++ for ya.
