Back to Basics: Pass-by-value vs. Pass-by-Ref

Luciano Almeida
Feb 5, 2023

After working with C++ for a while, we noticed that passing values to functions by copy vs. by reference is one of the first concepts that people learning the language encounter, and one that can be a bit challenging to grasp.

“When should we pass-by-ref vs. pass-by-value?” is a common question, especially for people who are not C++ developers or are just starting to learn the language. One misconception that we often hear is that pass-by-reference is always better in terms of performance. Although we can understand how one could reach this conclusion, there is a bit more to it.
And this bit more is what we are going to go into in this article!

So let’s start with the basics.

What is a reference?

Before understanding what a reference is, we first need to understand the concept of a pointer.

All data in our program is stored somewhere in virtual memory, and that location is identified by an address. With that said, a pointer is a value that holds a memory address instead of data. But note that a pointer is itself a value (normally the size of a WORD on the platform that we are targeting) and needs its own memory location, so the address of a pointer variable can in turn be stored in a pointer-to-pointer variable, e.g. int **ptrToIntPtr.
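As a minimal sketch of that idea (the function name and every variable other than ptrToIntPtr are hypothetical, chosen only for illustration), the snippet below stores the address of an int in a pointer, and then the address of that pointer in a pointer-to-pointer:

```cpp
#include <cassert>

// Illustrative only: writes to an int through two levels of indirection.
int writeThroughPointerToPointer() {
    int value = 42;                 // plain data stored somewhere in memory
    int *ptrToInt = &value;         // a value that holds the address of `value`
    int **ptrToIntPtr = &ptrToInt;  // a value that holds the address of the pointer

    assert(*ptrToIntPtr == &value); // one de-reference gives back the int's address
    **ptrToIntPtr = 7;              // two de-references reach the int itself
    return value;                   // now 7
}
```

Calling writeThroughPointerToPointer() returns 7, because the write went through both pointers to the original variable.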

So now that we understand that, we can define what a reference is!

A reference is nothing but an implicit pointer that, in a well-formed program, is never nullptr. Note that it can still be an invalid (dangling) reference: it may point to heap memory that has already been deleted, or to a stack location whose frame has already returned. Accessing such a reference is undefined behavior: if we are lucky we see a segmentation fault, and if we are unlucky the program silently keeps running with garbage data.

A small example of an invalid reference
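One way such a reference can arise is sketched here, with a hypothetical function name and with the invalid read itself left commented out so that the snippet stays well-defined. A reference to a vector element is taken, and then the vector is forced to move its elements to a new heap buffer:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true when growing the vector moved its heap buffer, which means a
// reference taken before the growth no longer refers to live memory.
bool referenceDanglesAfterGrowth() {
    std::vector<int> values{1, 2, 3};
    int &first = values[0];
    // Keep the old address as an integer so the stale pointer is never used.
    const auto before = reinterpret_cast<std::uintptr_t>(&first);

    // Requesting more than the current capacity forces a reallocation:
    // the elements move to a new buffer and the old buffer is freed.
    values.reserve(values.capacity() + 1);
    const auto after = reinterpret_cast<std::uintptr_t>(values.data());

    // `first` still refers to the freed buffer. Reading it, for example
    //     int oops = first;
    // would be undefined behavior (a use-after-free of heap memory).
    assert(values[0] == 1); // going through the vector again is fine
    return before != after;
}
```

Uncommenting the read of first is what turns this into an actual invalid access.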

If we are using ASAN, it will probably crash at runtime and tell us something like ERROR: AddressSanitizer: heap-use-after-free on address

The main point that we need to understand is that a reference is an implicit pointer and every access to it de-references that implicit pointer.

Right, now we know what a reference is. Let’s go into the main topic!

Is pass-by-reference always better?

Before answering that question, let’s first understand what passing an argument to a function is.

Every ahead-of-time compiled programming language follows an ABI (Application Binary Interface) on each platform that it targets.

One aspect of that ABI is the calling convention, which defines how a function call is made at the binary level: which registers are callee-saved and which registers serve a given purpose, for example which ones are used for argument passing and for the return value.

With that said, an argument passed to a function is either passed in a register if it fits (or in several registers, depending on how many the ABI reserves for argument passing and on whether Scalar Replacement of Aggregates[4] can split the argument into scalars and pass them as individual arguments), or it is passed on the stack. We can find a detailed explanation of everything that happens when we call a function in this article.

But what is important for us to understand in this article is that all arguments are passed as a value placed either in a register or on the stack. As we saw at the beginning of the post, a reference or pointer is still a value of WORD[6] size that contains a memory address, and since it is word-sized, it fits in a register. The function’s instructions, instead of reading the value directly from a register or the stack, then pay an extra de-reference cost because they have to take the address and fetch the value from memory.
So with all that information in hand, we can now answer this question more confidently.

And the answer, as with everything in programming, is: It depends …

If we need to modify the caller’s value inside the function, passing by value is not going to help us, so reference semantics is our only choice, using either reference (&) or pointer (*) syntax.
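A minimal sketch of the difference, using hypothetical function names: the by-value version only changes its own copy, while the reference and pointer versions change the caller’s variable.

```cpp
#include <cassert>

// Receives a copy: the assignment is invisible to the caller.
void resetByValue(int counter) { counter = 0; }

// Receives an implicit pointer to the caller's variable.
void resetByReference(int &counter) { counter = 0; }

// Receives an explicit pointer, which (unlike a reference) may be null.
void resetByPointer(int *counter) {
    assert(counter != nullptr);
    *counter = 0;
}
```

With int hits = 5, calling resetByValue(hits) leaves hits at 5, while resetByReference(hits) or resetByPointer(&hits) sets it to 0.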

But otherwise,

If a value is a trivial type or a very small struct that is cheap to copy, it is probably better to pass it by value. But if we are passing a string or a vector, where a copy means invoking the copy constructor and creating a new vector with all the elements, definitely use a reference.

If we have a uint8_t, for example, and we are targeting a 64-bit platform where a pointer is also 64 bits, passing it by ref actually copies a bigger value. Although that doesn’t matter much given that both fit in registers, we should still consider the extra de-referencing that happens, and in hot paths of our program we may be able to see a small difference.
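The guidance in the last two paragraphs can be sketched as follows. All type and function names are hypothetical, and the second static_assert assumes a mainstream platform where a pointer is wider than one byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

struct Point { int x; int y; };

// Small and trivially copyable: a copy is just a couple of register moves.
static_assert(std::is_trivially_copyable<Point>::value, "Point is cheap to copy");
// A reference travels as an address, which is wider than a one-byte value.
static_assert(sizeof(const std::uint8_t *) > sizeof(std::uint8_t),
              "a pointer is wider than a uint8_t");

// Cheap types: pass by value.
int sumCoordinates(Point p) { return p.x + p.y; }
std::uint8_t halve(std::uint8_t level) { return static_cast<std::uint8_t>(level / 2); }

// Expensive to copy: pass by const reference so no vector or string is copied.
std::size_t totalLength(const std::vector<std::string> &words) {
    std::size_t total = 0;
    for (const std::string &word : words) total += word.size();
    return total;
}

// Usage sketch.
void checkPassingChoices() {
    const std::vector<std::string> words{"pass", "by", "ref"};
    assert(sumCoordinates(Point{3, 4}) == 7);
    assert(halve(200) == 100);
    assert(totalLength(words) == 9);
}
```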

Let’s see a small example of that, and of how passing a trivial type such as int by reference can be less performant than a trivial copy.

Consider this simple example:
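A minimal pair of functions of the kind being compared could look like this (the names are hypothetical and this is a sketch, not necessarily the exact listing that was measured):

```cpp
#include <cassert>

// By value: the int itself arrives in the first argument register,
// so the addition can work on the register directly.
int addOneByValue(int value) { return value + 1; }

// By reference: an address arrives in the first argument register,
// so the int has to be loaded from memory before the addition.
int addOneByReference(const int &value) { return value + 1; }

// Both return the same result; only how the argument arrives differs.
void checkAddOne() {
    int number = 41;
    assert(addOneByValue(number) == 42);
    assert(addOneByReference(number) == 42);
}
```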

And the generated code for x86

What we can see is that both the int value and the reference (pointer) are passed in the rdi register, which the System V x86_64 calling convention that clang follows on Linux and macOS reserves for the first integer or pointer argument. But the most important thing to note is that the function taking a reference needs an extra first instruction to de-reference the address, followed by an add into the return register, while for the by-value function the compiler can even apply a small optimization and use a single lea instead of a mov and an add, which is a clever trick.

And here is an interesting measurement

Benchmarking copy vs. reference

We can confirm that passing by value is a little bit faster than passing by reference in this example.
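To try a comparison like this locally, a rough timing harness could be sketched with std::chrono. This is an assumption-laden stand-in rather than the Quick Bench[5] setup: the function names, the Timing struct and the SKETCH_NOINLINE macro are all hypothetical, and numbers from such a loop are noisy, so they should be read as indicative at best.

```cpp
#include <cassert>
#include <chrono>

// Keep the calls out of line so that argument passing is actually exercised.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define SKETCH_NOINLINE __declspec(noinline)
#else
#define SKETCH_NOINLINE
#endif

SKETCH_NOINLINE int addOneByValue(int value) { return value + 1; }
SKETCH_NOINLINE int addOneByReference(const int &value) { return value + 1; }

struct Timing {
    long long nanoseconds; // wall-clock time for the whole loop
    long long checksum;    // sum of results, so the calls cannot be dropped
};

// Calls fn(i) for i in [0, iterations) and measures the elapsed time.
template <typename Fn>
Timing timeCalls(Fn fn, int iterations) {
    assert(iterations >= 0);
    const auto start = std::chrono::steady_clock::now();
    long long checksum = 0;
    for (int i = 0; i < iterations; ++i) {
        checksum += fn(i);
    }
    const auto stop = std::chrono::steady_clock::now();
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    return Timing{static_cast<long long>(elapsed.count()), checksum};
}
```

Comparing timeCalls(addOneByValue, n).nanoseconds with timeCalls(addOneByReference, n).nanoseconds for a large n, in an optimized build and over several runs, gives a first impression; a dedicated benchmarking tool remains the more reliable option.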

Another example where passing by reference can limit the optimizations is when the compiler has to be conservative because of possible aliasing.

Let’s consider this small sample:
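A sketch of the kind of code where this matters, with hypothetical function names: both functions multiply every element by a factor, but in the by-reference version the factor may refer to one of the elements being written, so the two can legitimately produce different results.

```cpp
#include <cassert>
#include <cstddef>

// `factor` is a private copy: it cannot change while the loop runs.
void scaleByValue(int *data, std::size_t count, int factor) {
    for (std::size_t i = 0; i < count; ++i) data[i] *= factor;
}

// `factor` may alias an element of `data`, so every store to data[i]
// could change it and the compiler has to account for that.
void scaleByReference(int *data, std::size_t count, const int &factor) {
    for (std::size_t i = 0; i < count; ++i) data[i] *= factor;
}

// Passing data[0] as the factor makes the aliasing visible.
void checkAliasing() {
    int copied[] = {2, 3, 4};
    scaleByValue(copied, 3, copied[0]);      // factor stays 2
    assert(copied[0] == 4 && copied[1] == 6 && copied[2] == 8);

    int aliased[] = {2, 3, 4};
    scaleByReference(aliased, 3, aliased[0]); // factor becomes 4 after the first store
    assert(aliased[0] == 4 && aliased[1] == 12 && aliased[2] == 16);
}
```

Because the second call is valid C++ with a well-defined result, the compiler is not allowed to assume that the factor is constant across iterations in the by-reference version.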

Looking at the resulting code, we can see that for the pass-by-value function the compiler can vectorize the computation, because the value is a local copy and local analysis can see that it does not change during the loop. For the pass-by-reference function, although the compiler can at least unroll the loop, a reference is an implicit pointer and pointers can alias, so the compiler has to be conservative and vectorization is not viable.

Conclusion

In this post we tried to show what happens when we pass by reference and pass by value, along with a few examples where the assumption that pass-by-reference is always better does not hold.

So when the “should we pass-by-ref vs. pass-by-value?” question comes up for code that we are writing, the answer is:

Think about it: look at what kind of data we are passing. Is it trivial to copy, or expensive enough to be worth a reference? In most cases it won’t make any difference, but benchmark if performance matters because we are in a hot path of the program.

But if we want more of a rule-of-thumb answer: if the type is small and trivial to copy, pass-by-value is the way to go; otherwise, pass-by-reference. And be aware that small is relative to the WORD size of the target platform: if we are compiling for a 16-bit architecture with a limited set of registers, a 64-bit value may be big enough to be worth passing by ref, and in that case we have to think more carefully.

References

  1. What is the difference between MOV and LEA
  2. MOVZX — Move with Zero-Extend
  3. Where should I prefer pass-by-reference or pass-by-value?
  4. Scalar Replacement of Aggregates LLVM Optimization Pass
  5. Quick Bench
  6. Word (computer architecture)
  7. Using LEA on values that aren’t addresses / pointers?

Luciano Almeida

Aspiring Compiler Engineer, Swift and OpenSource enthusiast