My Precious Compile Time Bool (long introduction to Move Semantics)

There are big problems in the C++ community with regard to Move Semantics. Big problems. Many of those problems are tutorials about Move Semantics. 😉

Spoiler alert: Move Semantics is about moving

There are plenty of existing tutorials on Move Semantics. You can read them, or not. But if you believe me 😉 they suffer from the following problems:

  • they spend too much time on boring concepts (Rvalue References, Lvalue References, RVO, reference collapsing) that, besides being useless to the average developer looking for an accessible introduction to Move Semantics, are so interesting that they make monads and the preprocessor look fun
  • they assume that C++ Move Semantics was designed correctly

Second point may seem useless to discuss since it is what it is, but I think understanding the design helps with understanding and remembering how to use Move Semantics.

Before I go into the details of points above let’s discuss the motivation for Move Semantics. Please do not skip this since IMAO it actually helps to see the design goals when learning about Move Semantics.

We ❤️ Value Semantics

If you want to be the cool kid in a job interview when somebody asks you about the difference between C++ and Java or C#, do not say Garbage Collection or VM. Say Value Semantics vs Reference Semantics. 😃 This does not mean that C++ does not offer Reference Semantics, but we are talking about general defaults in the languages.

While some people disagree, in general people agree that C++ programs using values instead of references/pointers are easier to reason about (meaning easier to write with fewer bugs). So it is not just about avoiding memory leaks, it is also about being able to reason about the code more easily.

Now, before C++11, C++ had this problem that Value Semantics were expensive. C++ was copying stuff when it could have moved it. And here I am not talking about some language specific meaning of copy or move. C++ value semantics often meant the equivalent of doing a copy and then destroying the source, instead of moving the source. One example where working around this performance problem was quite painful with regards to readability was returning expensive to copy objects (like big std::vectors) from functions.
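A minimal sketch of the kind of code this is about. The names get_vector and new_way come from the text below; get_vector_old_way, old_way and the vector size are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed helper: builds a big vector and returns it by value.
std::vector<int> get_vector() {
    std::vector<int> result(1000, 42);
    return result;
}

// The C++98/03 workaround: fill an output parameter to avoid the copy.
void get_vector_old_way(std::vector<int>& out) {
    out.assign(1000, 42);
}

std::size_t demo_return_by_value() {
    std::vector<int> old_way;
    get_vector_old_way(old_way);  // fast in C++98, but awkward to read

    std::vector<int> new_way = get_vector();  // plain value semantics
    assert(old_way == new_way);
    return new_way.size();
}
```

Both variants produce the same vector. The difference is only in how readable the call site is and in what returning by value used to cost.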

In the code above C++98/03 would (ignoring possible optimizations) copy the result of get_vector() to new_way, then destroy that result. Since C++11 the worst case is a move: the result is moved to new_way (and the compiler may still optimize even that move away). Again when I say moved do not think about std::move or something magical. Conceptually this is just an optimization where instead of copying we move stuff. What is important to see here is that we can move the objects that are about to be killed soon. Temporaries.

Who gets speedup from Move Semantics?

In general you can think of more complex types like std::string, std::vector, std::unordered_multimap… benefiting from Move Semantics, while simple ones like int, double, bool, struct s{int x1,x2,x3,x4,x5,x6;}; do not.

A more precise definition is harder to achieve without understanding the implementation of certain types. For example, you may think of std::array<float, 100> as this complex type that benefits from move semantics, but it does not (because all its data is stored directly inside the object, so there is no heap buffer to steal).

To make things more fun std::array<std::string,100> does benefit from move semantics a bit since although all 100 strings need to be moved to new home instead of just some pointer adjustments like std::vector<std::string> would do it still benefits from the fact that each std::string will be moved and not copied.

Regarding composite types: they do benefit in a “depending on the benefits of members” kind of way. This sounds confusing, but the general idea is that the compiler will try to generate a move constructor that does moves memberwise. Depending on the types of the members you will see or not see the speedups.
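A small sketch of that memberwise behaviour. Employee and its members are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical composite type: no constructors written by hand, so the
// compiler generates a memberwise move constructor for it.
struct Employee {
    int id;                         // "moved" by plain copy, no speedup
    std::string name;               // moved by stealing the buffer
    std::vector<std::string> tags;  // moved by stealing the buffer
};

bool demo_memberwise_move() {
    Employee a{7, "Somebody With A Fairly Long Name", {"cpp", "move"}};
    Employee b = std::move(a);  // implicitly generated move constructor

    assert(b.id == 7);
    assert(b.tags.size() == 2);
    // A std::vector is guaranteed to be empty after being move constructed from.
    return a.tags.empty();
}
```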

One nice thing to notice is that in some cases(for example returning std::string from a function) C++11 made existing code faster without requiring developers to make any modifications to source code.

Existing usage of STL became faster when STL learned to love Move Semantics.

Not just about the speed.

If you have ever used std::unique_ptr you know that you can not copy it, but you can move it. You can probably guess why. 2 pointers pointing to the same object are not unique_ pointers. They are more of a shared_ kind of pointers.

In other words Move Semantics enables us to express the unique pointer ownership.

So how does this moving look in the source code?

As we have seen one of the requirements for moving is that source is temporary. This can happen when you are returning something from a function or you concatenate strings(concatenated string is a temporary).

Or you can manually call std::temporrify on object to make it appear as a temporary. Actually std::temporrify does not exist, and if you want to make something a temporary you call std::move. This is a bit confusing since std::move is named after what it looks like it does, not what it actually does. Imagine your laptop power button having a label “Read Medium”. You do not have to read Medium when you power on your laptop, but almost always you do 😉. Same with std::move.

All std::move does is turn a variable into a temporary in the eyes of the compiler. This means that if it is used as an argument to a function that has overloads for temporary and non temporary arguments, generally speaking different things will happen.

So std::move by itself does not do any moving, but usually when you call a function with an argument that is a temporary it will move that argument.
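To see that std::move is only a cast, here is a sketch with a hypothetical overload pair called which. Neither overload touches its argument, so nothing is ever moved.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overload pair: one for non temporaries, one for temporaries.
const char* which(const std::string&) { return "not temporary"; }
const char* which(std::string&&)      { return "temporary"; }

bool demo_move_is_a_cast() {
    std::string s = "hello";
    std::string before = s;

    const char* a = which(s);             // picks const std::string&
    const char* b = which(std::move(s));  // picks std::string&&

    // which() never touched its argument, so nothing was moved:
    // std::move only changed which overload got picked.
    assert(s == before);
    return std::string(a) == "not temporary" && std::string(b) == "temporary";
}
```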

Besides code that involves us forcing a variable to be a temporary, let’s take a look at some situations where temporaries naturally occur.

If you know what is RVO you may know this screenshot contains lies, but let’s not complicate things 😉
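A sketch of a few such situations, with made up names (make_greeting, name, v). The comments carry the same simplification: where they say "moved", the compiler may be able to skip even the move.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

std::string make_greeting() { return "hello"; }  // assumed helper

std::size_t demo_natural_temporaries() {
    std::vector<std::string> v;
    std::string name = "world";

    v.push_back(make_greeting());  // function result is a temporary: moved
    v.push_back(name + "!");       // concatenation result is a temporary: moved
    v.push_back(name);             // named variable: copied

    assert(name == "world");       // the copy left name untouched
    return v.size();
}
```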

OK, but how do functions doing the moving look compared to normal functions? For std::string, among many constructors, we have these two:

  • string::string(const string& other);
  • string::string(string&& other);

First one handles the case when you are copying(notice that const — even if we wanted to modify the source by mistake we can not).

Second one handles the case when we are moving (notice no const and &&). We will modify the other argument by “stealing” its content.

So bla&& means temporary bla? Well this is C++ so the answer is yes and no… welcome to pain that is writing code using Move Semantics.

Out of two possibilities we got almost the best one

Here we get to the part where “normal” tutorial would enlighten you with profound statement like

A named Rvalue Reference is an Lvalue

But we take a different approach.

C++ had a choice between 2 semantics for temporaries:

  • easier to teach, more consistent, better for common case
  • the one C++ chose

You see, inside a function a temporary argument is no longer temporary (“A named Rvalue Reference is an Lvalue”).

It is not easy being a temporary
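A minimal sketch of this situation. The names f, g and tmp_str are the ones used in the text; the int return values and f_fixed are assumptions added only to make the picked overload visible.

```cpp
#include <cassert>
#include <string>
#include <utility>

// g has one overload for non temporaries and one for temporaries.
int g(const std::string&) { return 1; }  // 1 = "not temporary" overload
int g(std::string&&)      { return 2; }  // 2 = "temporary" overload

// f only accepts temporaries, yet inside f tmp_str has a name,
// so it is no longer treated as a temporary.
int f(std::string&& tmp_str) {
    return g(tmp_str);
}

// Assumed fixed variant: std::move makes tmp_str temporary again.
int f_fixed(std::string&& tmp_str) {
    return g(std::move(tmp_str));
}

bool demo_named_temporary() {
    assert(f(std::string("abc")) == 1);        // surprising overload
    assert(f_fixed(std::string("abc")) == 2);  // expected overload
    return true;
}
```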

In our example above tmp_str inside f is not a temporary so it does not call g overload taking a temporary.

Now you can fix this by writing g(std::temporrify(tmp_str)) or, using real std:: names, g(std::move(tmp_str)), but this is less than ideal.

Most of the time you are not writing functions taking temporary strings that do nothing with them, so let’s look at a more realistic example where forgetting std::move is a performance bug.

Add a bit of spices and std::move to restore performance
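One way such a performance bug could look, sketched with a hypothetical Log class. The slow and fast member functions differ only in the std::move.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class Log {  // hypothetical class, not from the original example
public:
    // Performance bug: line has a name here, so push_back copies it.
    void add_slow(std::string&& line) { lines_.push_back(line); }

    // Fixed: std::move makes line temporary again, so push_back moves it.
    void add_fast(std::string&& line) { lines_.push_back(std::move(line)); }

    std::size_t size() const { return lines_.size(); }

private:
    std::vector<std::string> lines_;
};

bool demo_forgotten_move() {
    Log log;
    std::string a(100, 'a');
    std::string b(100, 'b');
    log.add_slow(std::move(a));  // copied anyway: a keeps its content
    log.add_fast(std::move(b));  // moved: b is now in an unspecified state
    assert(log.size() == 2);
    return a.size() == 100;
}
```

Both versions give correct results, which is why the missing std::move is so easy to overlook.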

What should be default?

I have accused C++ default semantics for temporaries of terrible terrible crimes, but what is the reason why C++ made choices it made?

Well, in theory C++ handling of temporary arguments is safer. It helps you when you use your temporary argument more than once, and it makes it obvious at every call site that you intend to use the variable as a temporary. So if you have func(var) you know var is not treated as a temporary (you need to write func(std::move(var)) for that), while with the default I say is better the reader needs to go look at where var is declared to see if it is a temporary or not. So I am wrong? 😛

The problem with me being wrong 😉 is that most of the use cases of temporaries are simple forwarding (for example from a constructor argument to a member initializer, or just forwarding an argument of a function as a parameter to another function call), so we are paying the cost of making the rare complicated case nicer by complicating the simple common case.

Here we can see a contrast between what C++ does now and imaginary C++ standard where temporary arguments stay temporary and you need to std::protect them to make them look not temporary. Note that std::protect is theoretical, we are just exploring the other design.

OK, but you are not too concerned by this problem, you can just remember to type std::move to Make Temporaries Temporary Again and all is good?

Well… no.

You see the problem is the existence of functions that take more than 1 parameter. 😟

Exponential escalated quickly

Functions and constructors that you have properly overloaded with for example const string& and string&& work fine when you need to handle just 1 argument that benefits from Move Semantics. 2 such arguments? 4 overloads. 3 such arguments? 8 overloads. 😢

n such arguments? 2^n overloads. 😢^n.

Same problem applies to templated functions taking many arguments, but let’s stick to constructors example since that is the code almost all C++ developers write often.

Example of constructors required to properly handle case of 2 member variables that benefit from Move Semantics
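A sketch of what those 4 constructors look like. The class name Person, the member names and the accessors are assumptions for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Person {  // assumed name
public:
    Person(const std::string& first, const std::string& last)
        : first_name_(first), last_name_(last) {}
    Person(const std::string& first, std::string&& last)
        : first_name_(first), last_name_(std::move(last)) {}
    Person(std::string&& first, const std::string& last)
        : first_name_(std::move(first)), last_name_(last) {}
    Person(std::string&& first, std::string&& last)
        : first_name_(std::move(first)), last_name_(std::move(last)) {}

    const std::string& first_name() const { return first_name_; }
    const std::string& last_name() const { return last_name_; }

private:
    std::string first_name_;
    std::string last_name_;
};

bool demo_four_constructors() {
    std::string first = "Ada";
    Person p(first, std::string("Lovelace"));  // picks (const&, &&)
    assert(first == "Ada");                    // first was copied, not moved
    return p.first_name() == "Ada" && p.last_name() == "Lovelace";
}
```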

At this point you may think I am trolling you or I do not know what I am talking about. Here is a link to a CppCon talk by Nicolai Josuttis if you want to spend 50 minutes to check I am not lying. You think Nicolai is also inventing stuff? Well he is part of the standardization process and has written many C++ books. I guess I am good at complicated conspiracies. 😉

But you say: surely there must be some way for me to specify I want to take the string argument that is either temporary or not. Like string&&& str or string&? str.

You may already know what these arguments are called: Universal References (they work on temporaries and non temporaries). They are also known as Forwarding References (they are used to forward arguments).

Well be careful what you wish for…

Templates - Hero we did not need

C++ solution to your problem of not wanting to write exponential number of constructors/functions is templates.

This may seem weird to you since you know your arguments are either string&& or const string&(or some other pair like std::set<int>&& and const std::set<int>&). In other words your class does not support initializing member variables with any type that somebody throws at the constructor. Does not matter.

That precious compile time bool

As we discussed previously one of the problems Move Semantics solves is to know when certain argument is temporary and when it is not. This is precious information we need in order to properly use that argument.

Calling std::move on argument that is not a temporary is likely a bug since turning it into a temporary is a sign that code operating on it can modify it.

So how did C++ decide to encode information about parameter being a temporary?

Templates. More precisely template type deduction. Hopefully next confusing 😉 picture will help you see this in action. If you do not know what static_assert(std::is_same_v<T1,T2>) does: it checks at compile time that types T1 and T2 are the same(note that int and const int are not same types, similarly double& and double are not same — const and reference matters).
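A sketch of that deduction in action, assuming C++17 for std::is_same_v and the message-less static_assert. The two function names are invented; apart from the static_asserts the two functions are identical.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Compiles only when called with a temporary.
template <typename T>
void func_called_with_temporary(T&& arg) {
    static_assert(std::is_same_v<T, std::vector<int>>);
    static_assert(std::is_same_v<decltype(arg), std::vector<int>&&>);
}

// Same signature, but compiles only when called with a non temporary.
template <typename T>
void func_called_with_variable(T&& arg) {
    static_assert(std::is_same_v<T, std::vector<int>&>);
    static_assert(std::is_same_v<decltype(arg), std::vector<int>&>);
}

bool demo_deduction() {
    std::vector<int> v{1, 2, 3};
    func_called_with_variable(v);                           // T = std::vector<int>&
    func_called_with_temporary(std::vector<int>{1, 2, 3});  // T = std::vector<int>
    func_called_with_temporary(std::move(v));               // T = std::vector<int>
    // Nothing above actually moved from v: the functions only bind references.
    assert(v.size() == 3);
    return true;
}
```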

Some key observations about the code example above:

  • inside func the parameter is not a temporary, although it is declared with &&
  • the same code (the functions are the same) deduces a different type for T based on how it is called (with a temporary or not)
  • an argument that accepts both temporaries and non temporaries (for example std::vector<int>&& and std::vector<int>&), aka Universal Reference, must be templated 😢
  • whether the argument was a temporary or not is “encoded” into the deduced type of T

Regular introduction to Move Semantics would now spam you with reference collapsing rules as if their understanding has any value.

Truth is that templates here serve as hacky way to store that precious bool(is argument temporary or not) and enable code inside the function with Universal Reference arguments to access that bool.

Later we will see how exactly we can use that bool, but for now let’s spend some time to go over the issues with what we have learned.

Minor mountain of problems

This way of dealing with temporaries causes serious problems.

Templates, really?

Although some could say that many uses of Universal References are already templates there are still problems with this approach.

  • Writing an efficient constructor for a simple class should not require you to either write 2^n constructors or use templates
  • Single responsibility principle — templates already break it since they are used for generic stuff(containers, algorithms) and compile time computations but this is just making the problem worse.
  • Since we are bringing templates to the table we get horrible compile errors, easy to misuse overload sets… If you want to learn more please read Effective Modern C++ Item 26 and/or this blog post and/or watch Nicolai CppCon video.

&& does not mean temporary

It would be nice if we could teach developers that && means temporary.

But we can not, since like we saw Universal References that can bind to anything use && syntax. And this is not just an esthetic concern. People like to use std::move on arguments that have && next to them. Problem is that in case of Universal References sometimes those arguments are not temporaries. 😢

Using Universal References

With the complaints out of the way let’s now see how one can use these Universal References.

First we need to remember our original goals. Keep temporary arguments temporary when forwarding them, while making sure we do not turn arguments that were never temporaries into temporaries.

Secondly we need to remember that whether the argument is temporary or not is stored in the type of the template argument.

Now we can understand how std::forward is implemented and what it is used for.

std::forward is a conditional cast that:

  • makes temporary parameters temporary again
  • does not modify non temporary parameters

Contrast that with unconditional cast std::move:

  • makes anything you pass to it temporary

Note that std::forward can operate on variables that are not arguments, we are talking about normal usage here.

So let’s see how std::forward is used in code.
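A minimal sketch of such usage. print and print_me are the names used in the text; the int return values and the function names print_no_forward and print_forward are assumptions added to make the picked overload visible.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Overloads that report which one was picked.
int print(const std::string&) { return 1; }  // non temporary
int print(std::string&&)      { return 2; }  // temporary

// Forgets std::forward: print_me has a name, so it is never temporary here.
template <typename T>
int print_no_forward(T&& print_me) {
    return print(print_me);
}

// std::forward<T> makes print_me temporary again only if it was one.
template <typename T>
int print_forward(T&& print_me) {
    return print(std::forward<T>(print_me));
}

bool demo_forward() {
    std::string s = "x";
    assert(print_forward(s) == 1);                    // stays non temporary
    assert(print_forward(std::string("x")) == 2);     // temporary again
    assert(print_no_forward(std::string("x")) == 1);  // temporariness lost
    return true;
}
```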

Good news is that std::forward does what I told you it does. It properly casts argument to temporary only if it was temporary. Bad news is that it is quite verbose and it is easy to forget it.

And how does it work? Well you can read the STL source code😉, or remember what we learned: if parameter was temporary or not is encoded in the T. So std::forward<T>(print_me) uses that information to make parameter of print function temporary or not. std::forward nicely wraps that compile time logic but you could do it yourself.

Do not do it by yourself, this is just an example to help you see what is happening.

Manual reimplementation of std::forward logic
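One way to spell out that logic by hand, assuming C++17 for if constexpr. print_manual is a hypothetical name, and this is a sketch of the idea, not how any standard library implements std::forward.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

int print(const std::string&) { return 1; }  // non temporary
int print(std::string&&)      { return 2; }  // temporary

template <typename T>
int print_manual(T&& print_me) {
    if constexpr (std::is_lvalue_reference_v<T>) {
        // T is X&: the caller passed a non temporary, leave it alone.
        return print(print_me);
    } else {
        // T is plain X: the caller passed a temporary, make it one again.
        return print(static_cast<T&&>(print_me));
    }
}

bool demo_manual_forward() {
    std::string s = "x";
    assert(print_manual(s) == 1);
    assert(print_manual(std::string("x")) == 2);
    return true;
}
```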

If you are wondering why you must write std::forward<T>(print_me) instead of std::forward(print_me)…

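Reason is that it does not work without <T>: inside the function print_me has a name, so it never looks temporary by itself, and T is the only place where that precious bool is stored. Really nice design, I know.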

Actually std::forward is so nice that there is a proposal to work around its verbosity by baking it into the language (with operator >>). If you want to see how std::forward makes simple code ugly you can read the motivation examples in the paper.

One common thing in real use of std::forward is that it is used together with Variadic Templates (… syntax). Since Variadic Templates are complicated topic in itself I intentionally did not want to use them in examples, but I think it is good you know they exist.

One last thing

If you cast your mind to the time when I was ranting about how arguments that are temporaries stop being temporaries and you did not believe me it matters that much because you can just learn to keep on writing std::move in the call chain…

Please notice that a similar situation applies to std::forward. If C++ had chosen to remember when arguments are temporaries, all that lovely std::forward spam in our code would be unnecessary.

Now that we know how to use Universal References let’s see how to not use them.

Using Move Semantics, in a Wrong Way

Example 1

One common mistake when writing template code using Move Semantics for the first time is shown in the following picture.

Can you see the bug?
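A sketch of this kind of bug. fwd_func, not_temporary and s are names used in the explanation that follows; consume, fwd_func_fixed and the single parameter are assumptions made to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Assumed sink: move or copy constructs its parameter s from the argument.
std::size_t consume(std::string s) { return s.size(); }

template <typename T>
std::size_t fwd_func(T&& arg) {
    // BUG: arg is a Universal Reference, it may refer to a non temporary.
    return consume(std::move(arg));
}

// Assumed fix: std::forward only restores temporariness that was there.
template <typename T>
std::size_t fwd_func_fixed(T&& arg) {
    return consume(std::forward<T>(arg));
}

bool demo_example_1() {
    std::string not_temporary = "I am definitely not a temporary";
    const std::string original = not_temporary;

    fwd_func_fixed(not_temporary);
    assert(not_temporary == original);  // untouched by the fixed version

    fwd_func(not_temporary);
    // not_temporary has now been moved from: its content is unspecified
    // (in practice usually empty), even though the caller never asked for it.
    return true;
}
```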

Hopefully you can see from the code where the bug is but if it is not clear explanation follows.

Developer assumed that && on the parameters of fwd_func means that the arguments are temporary, so he used std::move inside fwd_func to make the arguments temporary again.

As we learned today, unfortunately && does not always mean that the argument is a temporary. So our not_temporary from main() had its content stolen when it was moved down the call chain and used to move construct string s (notice the Visual Studio debugger nicely showing not_temporary as empty after the call to fwd_func).

Example 2

This infamous example is already covered in links I mentioned when I told you that bringing templates to handle temporaries was a bad idea. So if you did listen to me and read Effective Modern C++ Item 26 and/or Eric Niebler’s blog post you know what the problem is, but I will try to additionally explain why this issue is so irritating.

First of all underlying problem is similar as in Example 1. Assuming that && means temporary. And mixing that with overloading.

Code example where “unexpected” overload is picked follows.
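A sketch of such an overload set. func, func_template and std::unordered_multiset<char> are the names used in this section; the CharSet alias and the returned strings are assumptions added to show which overload is picked.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

using CharSet = std::unordered_multiset<char>;  // assumed alias

// Non template pair: behaves as expected.
std::string func(const CharSet&) { return "not temporary"; }
std::string func(CharSet&&)      { return "temporary"; }

// "Generalized" pair: CharSet replaced with T. Looks like the same mapping.
template <typename T>
std::string func_template(const T&) { return "not temporary"; }
template <typename T>
std::string func_template(T&&)      { return "temporary"; }

bool demo_example_2() {
    CharSet s{'a', 'b'};
    assert(func(s) == "not temporary");
    assert(func(CharSet{'a'}) == "temporary");

    // Surprise: for a non const variable T&& deduces T = CharSet&, which is
    // a better match than const T&, so the "temporary" overload is picked.
    assert(func_template(s) == "temporary");
    assert(func_template(CharSet{'a'}) == "temporary");
    return true;
}
```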

General cause of the bug that people give here is that code is overloading in a way that one of the overloads uses Universal Reference. While this is technically true real problem is that C++ abused templates to deal with temporaries.

Notice how the developer correctly overloaded func(for cases when argument is temporary and for cases when it is not).

Then he made a mistake of thinking that templates make sense.

If you look closely you can see he mapped the func to func_template reasonably. Unfortunately he was wrong.

Like we learned today func_template taking && argument will also accept the arguments that are not temporaries. Contrast that with func taking && argument — it does not accept arguments that are not temporaries.

What I hope is clear here is that this is really really ugly situation. Ideally templates should be just templates 😃 in a sense that when you want to generalize your code you can just replace specific type (std::unordered_multiset<char> in our example) with T, write template<typename T> in front of the function and you are done. When your arguments have && unfortunately this is not the case.

Example 3

Move all the things!

Not wrong, but spammy
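A small sketch of that kind of code, with made up names (add, x, y).

```cpp
#include <cassert>
#include <utility>

int add(int a, int b) { return a + b; }

int demo_move_spam() {
    int x = 1;
    int y = 2;
    // Compiles and works, but buys nothing: "moving" an int is copying it.
    int sum = add(std::move(x), std::move(y));
    // x and y still hold 1 and 2, since ints have nothing to steal.
    assert(x == 1 && y == 2);
    return sum;
}
```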

Here the problem is that the developer read something about std::move making things faster so he is just using it on everything.

Like we discussed before only certain types benefit from std::move, int is not one of them.

Good thing about this is that it is not a bug.

Code works correctly(it must since this kind of code may happen in templated code), it is just spammy and confusing.

Example 4

Use after move.

Does not crash, but it is buggy
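A sketch of such a use after move. lang_name is the name used in the text; the strings and the sentence variable are assumptions. The buggy line is left as a comment because its result is unspecified.

```cpp
#include <cassert>
#include <string>
#include <utility>

std::string demo_use_after_move() {
    std::string lang_name = "C++";
    // lang_name is used as a temporary here, operator+ may steal its buffer.
    std::string sentence = std::move(lang_name) + " has Move Semantics";

    // BUG if uncommented: the content of lang_name is unspecified now
    // (in practice often empty), so the result would be garbage.
    // sentence += " and I like " + lang_name;

    // Fine: assignment brings lang_name back to a known state.
    lang_name = "C++11";
    return sentence + " since " + lang_name;
}
```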

Here we have an example of using lang_name after it was used as a temporary argument for string concatenation.

In general you should not use variables after you moved from them unless you brought them back to a specific state. A moved from std::string is still a valid object, but its content is unspecified, so relying on what .size() returns or appending to it is bad. Calling .clear() and then using that string is fine. Calling = on a moved from string is fine (since any potential garbage in the string does not affect the assignment).

Now let’s look at bug where result is good old crash.
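A sketch of this with std::unique_ptr. p1 is the name used in the text; p2 and the values are assumptions. The crashing dereference is only described in a comment so the example itself stays safe to run.

```cpp
#include <cassert>
#include <memory>
#include <utility>

int demo_unique_ptr_after_move() {
    std::unique_ptr<int> p1(new int(42));
    std::unique_ptr<int> p2 = std::move(p1);  // ownership moves to p2

    // p1 is guaranteed to be null now. Writing *p1 here would be the
    // good old crash (formally undefined behavior), so we only check it.
    assert(p1 == nullptr);

    p1.reset(new int(7));  // p1 owns a new int, safe to use again
    return *p1 + *p2;
}
```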

Again the cause of the bug is the same. We used the p1 after move. Note that like before with std::string we could reuse p1 after move if we called .reset() to set it to point to some new int.

Move Semantics is still Great

Reading up to here you may have gotten impression that Move Semantics is last nail in the coffin of an already bloated and complex language. If you did I do not blame you, I focused on the details that are hard and tricky and wrong.

Move Semantics has ugly parts, but it has enabled us to write much nicer and more efficient code and it is a huge success.

And if you are not writing generic(templates) code you can stay away from most of the ugliness.

Now let’s get closer to the ugliness.

Meet me Halfway between Usability and Performance

If you remember our example of a class where we needed to write 4 constructors (2²) to handle all combinations of temporary and non temporary arguments you may think that the best way to fix this is to make 1 constructor with 2 Universal References. I have good news and bad news.

Bad news is that your assumption is correct.

Good news is that luckily you can write 1 constructor that is not using Universal References and is a very good(but not best) solution.

Best solution minimizes the number of copies and minimizes the number of moves.

Very good solution minimizes the number of copies but makes some extra moves.

Since generally we assume moves are cheap(this is dangerous generalization I know, but it is true for heavily used std::string and std::vector) this is acceptable compromise: we pay few extra moves(and maybe no runtime penalty if optimizer is smart) and in return we get to not write templates.

OK, so let’s look at the comparison of C++98 constructor that handles temporaries in a slow way, C++11 way that handles temporaries very well and C++ Imaginary where we have Universal References without templates and language does not forget that arguments were temporaries.
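A sketch of the two variants that can be written in real C++. Person11, last_name and last_name_ are names used in the text; Person98, first_name and the accessor are assumptions. PersonImaginary is left out since it is not valid C++.

```cpp
#include <cassert>
#include <string>
#include <utility>

// C++98 style: always copies, even when the caller passes temporaries.
class Person98 {
public:
    Person98(const std::string& first_name, const std::string& last_name)
        : first_name_(first_name), last_name_(last_name) {}

private:
    std::string first_name_;
    std::string last_name_;
};

// C++11 style: take by value, then move into the members.
class Person11 {
public:
    Person11(std::string first_name, std::string last_name)
        : first_name_(std::move(first_name)),
          last_name_(std::move(last_name)) {}

    const std::string& last_name() const { return last_name_; }

private:
    std::string first_name_;
    std::string last_name_;
};

bool demo_person11() {
    std::string last = "Lovelace";
    Person11 p(std::string("Ada"), last);  // first: moves only, last: 1 copy + 1 move
    assert(last == "Lovelace");            // the caller's variable is untouched
    return p.last_name() == "Lovelace";
}
```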

If we ignore daydreaming(PersonImaginary) and focus on Person11 it does not look that bad, right? Except std::moves in the initializer list it looks like reasonable code. So this is what I recommend to use if you want to make sure your class does perform when it comes to temporary arguments.

You may wonder where do those extra moves compared to optimal solution come from. You may also wonder why I spend many characters to tell you about dangers of blindly calling std::move and about that precious bool if Person11 constructor is written like it is.

If you did not notice already, please note that the arguments to the Person11 constructor are values, not references. It means that temporary or not is handled before the constructor starts. How?

For example consider how argument last_name is handled.

When you call the constructor the compiler will see that last_name is a value and that it needs to construct the argument last_name.

If the variable you called the constructor with was a temporary it will move construct last_name (using the string::string(string&&) constructor), if it was not it will copy construct last_name (using the string::string(const string&) constructor).

Now we can safely move construct member variable last_name_ from argument last_name since last_name is a value, and not a reference.

And now let’s talk about those extra moves.

When constructor was called with a temporary we move that temporary into argument last_name, then in the initializer list we move argument last_name into member last_name_.

If we used best solution(ugly template way with Universal References) temporary would not be moved to last_name argument(since last_name would be reference), but instead reference to temporary would be forwarded all the way to the constructor of member last_name_.
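For comparison, a sketch of that template variant. PersonTemplate and its accessor are hypothetical names, and the sketch ignores the overloading pitfalls discussed earlier.

```cpp
#include <cassert>
#include <string>
#include <utility>

class PersonTemplate {  // hypothetical name
public:
    // One constructor, two Universal References: each argument is forwarded
    // straight into the member, copied or moved exactly once.
    template <typename First, typename Last>
    PersonTemplate(First&& first_name, Last&& last_name)
        : first_name_(std::forward<First>(first_name)),
          last_name_(std::forward<Last>(last_name)) {}

    const std::string& last_name() const { return last_name_; }

private:
    std::string first_name_;
    std::string last_name_;
};

bool demo_person_template() {
    std::string last = "Lovelace";
    PersonTemplate p(std::string("Ada"), last);  // first: 1 move, last: 1 copy
    assert(last == "Lovelace");                  // non temporary stays intact
    return p.last_name() == "Lovelace";
}
```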

So that is it, this is the way to write constructor, just do not generalize it to every other function…

You care if something is temporary only if you can reuse it

Assume you have the function that counts vowels written in old C++98.

std::size_t count_vowels(const std::string& str);

How should that function signature look in C++11?

Answer is: same. 😃

If you considered adding overload with std::string&& or replacing it with the function taking std::string by value you were wrong. 😉

If you are not going to benefit from using a temporary you pass std::string and other similar types by const reference. count_vowels just counts, it does not need to store the input string.

More contentious example is this:

std::string func(const std::string& str);

Should you replace it with a function taking a value? That way in case of a temporary argument you can use that temporary for return value of the function(notice that input argument and return value types match).

My answer is no. For readability reasons. But I think you should know about this pattern if you ever see it in production code or you think it is not a readability problem.
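In case you do run into it, the by value variant of such a function could look like this sketch. with_exclamation is a made up example function.

```cpp
#include <cassert>
#include <string>

// Takes a value, modifies it in place and returns it. A temporary argument
// is reused all the way to the result, a non temporary is copied once.
std::string with_exclamation(std::string str) {
    str += '!';
    return str;  // a by value parameter is moved into the return value
}

bool demo_by_value_pattern() {
    std::string s = "hi";
    assert(with_exclamation(s) == "hi!");
    assert(s == "hi");  // the caller's variable was copied, not modified
    return with_exclamation(std::string("bye")) == "bye!";
}
```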

Recap

Here we briefly recap this entire article. We again briefly mention design decisions made so we can easily remember why we need to do certain things and why we can not do certain things.

  1. C++ needed a way to efficiently deal with temporaries so we can use Value Semantics efficiently, answer was Move Semantics.
  2. Even without using Move Semantics in your code you are benefiting from it if you are using STL containers.
  3. C++ seemingly chose to use the syntax of && for arguments that are references to temporaries.
  4. std::move casts the value to temporary, and that usually means different overload will be used when result of std::move is used as argument.
  5. C++ for safety reasons decided to force users to make temporary arguments temporary again with std::move(cast that casts value to temporary)
  6. To deal with functions that need more than 1 argument of types that benefit from handling temporaries(you would need to write 2^n overloads) and to handle generic(template code) Universal References were introduced. It is an argument type that can handle both temporary and non temporary parameters.
  7. Unfortunately C++ chose to use the same syntax && and to encode whether the parameter was temporary in the deduced type of a template. This meant 2 things: && no longer means temporary, and if you want to use Universal References you must write templates. The twin of the unconditional cast std::move is the conditional cast std::forward.
  8. Good middle ground between performance and readability for constructors is using pass by value design.
  9. Pass by value should be used only for cases when the function benefits from ability to reuse the temporary argument.

Could it be fixed?

Now that you have learned what Move Semantics is or you gave up on it since it is so ugly and complicated you may wonder if C++ can “be fixed”?

In theory yes, in practice no. Existing code must continue to compile and the bar for a fix is extremely high.

When you have 7 ways to do something in a language 8th way must be much much better than existing 7 to justify work on designing it and the learning required by millions of C++ developers. I do not see that happening for problems discussed here.

On that happy note: Bye, and have fun moving! 😃
