Applied C++: Return Multiple Values

The best way to return multiple values from a C++17 function

“Do I Know This Already?” Quiz

What is the best way to return multiple values from a C++17 function?

  1. Using output parameters:
    auto output_1(int &i1) { i1 = 11; return 12; }
  2. Using a local structure:
    auto struct_2() { struct _ { int i1, i2; }; return _{21, 22}; }
  3. Using an std::pair:
    auto pair_2() { return std::make_pair(31, 32); }
  4. Using an std::tuple:
    auto tuple_2() { return std::make_tuple(41, 42); }

The answer is at the very bottom of the article.

Use Case: Why Return Multiple Values?

A typical example is std::from_chars(), a C++17 function similar to strtol(). But from_chars() produces 3 values: the parsed number, an error code, and a pointer to the first invalid character.

The function uses a mix of techniques: the number is returned through an output parameter, but the error code and the pointer are returned as a structure. Why is that? Let’s analyze…
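To make the mix of techniques concrete, here is a minimal sketch of calling std::from_chars() and collecting all three results. The wrapper `parse_int` and the structure `ParsedInt` are hypothetical names introduced only for illustration; they are not part of the standard library or of the original examples.

```cpp
#include <charconv>      // std::from_chars
#include <cstddef>       // std::size_t
#include <string_view>
#include <system_error>  // std::errc

// Hypothetical wrapper: bundles the three results of std::from_chars().
struct ParsedInt {
    int value;             // the parsed number (from_chars output parameter)
    bool ok;               // true when from_chars reported no error
    std::size_t consumed;  // characters read before the first invalid one
};

ParsedInt parse_int(std::string_view text) {
    int value = 0;
    const char *first = text.data();
    const char *last = first + text.size();
    // The number goes into `value`; the pointer and the error code
    // come back together in the returned structure.
    auto [ptr, ec] = std::from_chars(first, last, value);
    return {value, ec == std::errc{}, static_cast<std::size_t>(ptr - first)};
}
```

With this sketch, parse_int("42abc") should report the value 42, no error, and 2 consumed characters.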

Analysis

1. Using Output Parameters

Example code:

#include <cstdio>

auto output_1(int &i1) {
    i1 = 11;   // Write the first value through the output parameter
    return 12; // Return the second value
}
// Call through a volatile pointer so the compiler cannot inline the function
auto (*volatile output_1_ptr)(int &i1) = output_1;
int main() {
    int o1, o2; // Define local variables
    o2 = output_1_ptr(o1); // Output the 1st value and assign the 2nd
    printf("output_1 o1 = %d, o2 = %d\n", o1, o2);
}

The code compiles to:

output_1(int&):
mov [rdi], 11 # Output first param to the address in rdi
mov eax, 12 # Return second value in eax
ret
main: # Note: simplified
lea rdi, [rsp + 4] # Load address of the 1st param (on stack)
call [output_1_ptr] # Call output_1 using a pointer
mov esi, [rsp + 4] # Load 1st param from the stack
mov edx, eax # Load 2nd param from eax
call printf

Compiler Explorer: https://godbolt.org/z/Fan8OH

Pros:

  • Classic. Easy to understand.
  • Works with any C++ standard, including C (using pointers).
  • Supports function overloading.

Cons:

  • The address of the first parameter needs to be loaded prior to the function call.
  • The first value is returned through memory on the stack. Slow :(
  • Under the System V AMD64 ABI, up to 6 addresses can be passed in registers. The stack must be used to pass more than 6 params. Even slower :(

To illustrate the last con, here is example code that outputs 7 params:

// Output more than 6 params
int output_7(int &i1, int &i2, int &i3, int &i4,
             int &i5, int &i6, int &i7) {
    i1 = 11;
    i2 = 12;
    i3 = 13;
    i4 = 14;
    i5 = 15;
    i6 = 16;
    i7 = 17;
    return 18;
}

And the disassembly of output_7():

output_7(int&, int&, int&, int&, int&, int&, int&):
mov [rdi], 11 #
mov [rsi], 12 # Addresses of the first 6 params get passed
mov [rdx], 13 # via rdi, rsi, rdx, rcx, r8, and r9
mov [rcx], 14 # according to System V AMD64 ABI
mov [r8], 15 # (for Linux, macOS, FreeBSD etc)
mov [r9], 16 #
mov rax, [rsp + 8] # But the address of the 7th is on the stack,
mov [rax], 17 # which is slow
mov eax, 18
ret

The 7th address is passed via the stack: the caller puts the address on the stack, the function reads it back from the stack, and only then writes the value to that address. That is a lot of memory operations. Slow :(

2. Using a Local Structure

Example code:

#include <cstdio>

auto struct_2() {
    struct _ { // Declare a local structure with 2 integers
        int i1, i2;
    };
    return _{21, 22}; // Return the local structure
}
// Call through a volatile pointer so the compiler cannot inline the function
auto (*volatile struct_2_ptr)() = struct_2;
int main() {
    auto [s1, s2] = struct_2_ptr(); // Structured binding declaration
    printf("struct_2 s1 = %d, s2 = %d\n", s1, s2);
}

Disassembly:

struct_2():
movabs rax, 0x1600000015 # Just return 2 integers in rax
ret
main: # Note: simplified
call [struct_2_ptr] # No need to load output param addresses
mov rdx, rax # Just use the values returned in rax
shr rdx, 32 # High 32 bits of rax
mov rcx, rax
mov esi, ecx # Low 32 bits of rax
call printf

Compiler Explorer: https://godbolt.org/z/Q7P4q0
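The constant in the listing packs both integers into one 64-bit register: 0x15 is 21 and 0x16 is 22. The arithmetic can be checked with a small sketch that mirrors the `mov esi, ecx` (low half) and `shr rdx, 32` (high half) steps. The helper names `low_half` and `high_half` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Low 32 bits of the register image: the first structure member.
constexpr std::uint32_t low_half(std::uint64_t rax) {
    return static_cast<std::uint32_t>(rax);
}

// High 32 bits of the register image: the second structure member.
constexpr std::uint32_t high_half(std::uint64_t rax) {
    return static_cast<std::uint32_t>(rax >> 32);
}

// 0x1600000015 is the value struct_2() loads into rax.
static_assert(low_half(0x1600000015ULL) == 21, "i1 sits in the low half");
static_assert(high_half(0x1600000015ULL) == 22, "i2 sits in the high half");
```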

Pros:

  • Works with any C++ standard, including C, though the structure must be declared outside the function scope.
  • Returns up to 128 bits in registers, no stack is used. Fast!
  • Does not require addresses of the params, which allows the compiler to better optimize the code.

Cons:

  • The function can’t be overloaded by its return type alone.

What happens when we try to return more values? According to the System V AMD64 ABI, structures of integers up to 128 bits are returned in RAX and RDX. So up to four 32-bit integers will be returned in registers. One byte more and we have to use the stack.

Still, we don’t need to load output param addresses, so it is faster than the output parameters method.
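The 128-bit boundary can be illustrated with a short sketch. It assumes a 4-byte int, as on x86-64 Linux; the structure and function names (`Four`, `Five`, `four_values`, `five_values`) are hypothetical and chosen only for this example.

```cpp
// Assumption: int is 4 bytes, as on x86-64 with the System V ABI.
struct Four { int i1, i2, i3, i4; };      // 16 bytes = 128 bits
struct Five { int i1, i2, i3, i4, i5; };  // 20 bytes = 160 bits

static_assert(sizeof(Four) * 8 == 128, "fits the 128-bit register limit");
static_assert(sizeof(Five) * 8 > 128, "exceeds the 128-bit register limit");

// Small enough to come back in RAX and RDX under the System V AMD64 ABI.
Four four_values() { return {1, 2, 3, 4}; }

// Too large for two registers, so it is returned through memory.
Five five_values() { return {1, 2, 3, 4, 5}; }
```

Compiling both functions in Compiler Explorer is a quick way to see where the generated code switches from registers to memory.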

3. Using an std::pair

Example:

#include <cstdio>
#include <utility>

auto pair_2() { return std::make_pair(31, 32); } // Just one line!
// Call through a volatile pointer so the compiler cannot inline the function
auto (*volatile pair_2_ptr)() = pair_2;
int main() {
    auto [p1, p2] = pair_2_ptr(); // Structured binding declaration
    printf("pair_2 p1 = %d, p2 = %d\n", p1, p2);
}

The generated assembly code:

pair_2():
movabs rax, 0x200000001f # Just return 2 integers in rax
ret
main: # Note: simplified
call [pair_2_ptr] # Just call the function
mov rdx, rax # Use the values returned in rax
shr rdx, 32
mov rcx, rax
mov esi, ecx
call printf

Compiler Explorer: https://godbolt.org/z/9iXzSb

Pros:

  • Just one line of code!
  • No need to declare the local structure.
  • Just like with the structures, returns up to 128 bits in registers, no stack is used.

Cons:

  • Pair is just two return values :(
  • Just like with the structures, the function can’t be overloaded.
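On the caller side, the two values of a pair can be unpacked in more than one way. The sketch below shows a structured binding and std::tie with std::ignore for the case where only one of the two values is needed. The function `div_mod` and its callers are hypothetical names, not part of the original examples.

```cpp
#include <tuple>    // std::tie, std::ignore
#include <utility>  // std::pair, std::make_pair

// Hypothetical example: returns the quotient and the remainder together.
std::pair<int, int> div_mod(int a, int b) {
    return std::make_pair(a / b, a % b);
}

// Structured binding: declares two new names for the two values.
int quotient_plus_remainder(int a, int b) {
    auto [q, r] = div_mod(a, b);
    return q + r;
}

// std::tie: assigns into an existing variable and skips the other value.
int remainder_only(int a, int b) {
    int r = 0;
    std::tie(std::ignore, r) = div_mod(a, b);
    return r;
}
```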

4. Using an std::tuple

Example:

#include <cstdio>
#include <tuple>

auto tuple_2() { return std::make_tuple(41, 42); } // Just one line!
// Call through a volatile pointer so the compiler cannot inline the function
auto (*volatile tuple_2_ptr)() = tuple_2;
int main() {
    auto [t1, t2] = tuple_2_ptr(); // Structured binding declaration
    printf("tuple_2 t1 = %d, t2 = %d\n", t1, t2);
}

The code compiles to:

tuple_2():
movabs rax, 0x290000002a # Good start, but...
mov [rdi], rax # Indirect write to an output parameter?
mov rax, rdi # Return the address of the parameter
ret
main: # Note: simplified
mov rdi, rsp # Pass stack pointer as a parameter
call [tuple_2_ptr] # Call the function
mov edx, [rsp] # Get the values from the stack
mov esi, [rsp + 4]
call printf

Compiler Explorer: https://godbolt.org/z/hSVV72

Pros:

  • The source code is one liner, just like with the std::pair.
  • Unlike the std::pair, easy to add more values.

Cons:

  • Unfortunately, the disassembly is a mixed bag. We need to pass the address of the output tuple to the function, one address per tuple.
  • Even for two integers (64 bits), the return values are always on the stack. Slow :(

What if we return more values in the tuple? Adding more values does not change the disassembly much: we still pass just one address pointing to the stack, then we put the values at that address (on the stack), and then we load them back from the stack to use for printf().

It’s slower than the pair and the structure, which both return up to 128 bits in the registers. But it’s faster than the output parameters, where we need to pass several addresses to the function, not just one.
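Growing the tuple takes very little source code. As a minimal sketch, here is a three-value variant in the style of tuple_2(); the names `tuple_3` and `sum_tuple_3` are hypothetical and used only for illustration.

```cpp
#include <tuple>

// Same pattern as tuple_2(), with one more value added.
auto tuple_3() { return std::make_tuple(41, 42, 43); }

// The element count is part of the type and known at compile time.
static_assert(std::tuple_size_v<decltype(tuple_3())> == 3,
              "tuple_3() returns three values");

// A structured binding needs one name per tuple element.
int sum_tuple_3() {
    auto [t1, t2, t3] = tuple_3();
    return t1 + t2 + t3;
}
```

Only the make_tuple() call and the binding list change; the call site keeps the same shape.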

Takeaways

  1. The fastest methods to return multiple values in C++17 are a local structure and std::pair.
  2. The std::pair should be preferred to return two values as the most convenient and fast method.
  3. Use output parameters when function overloading is needed. That’s why std::from_chars() uses output parameters and a return structure.
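The overloading point can be sketched as follows. The overload set `read_value` is a hypothetical example built on strtol() and strtod(), not the implementation of std::from_chars(); it only shows that the type of the output parameter selects the overload, which a return type alone cannot do.

```cpp
#include <cstdlib>  // std::strtol, std::strtod

// Hypothetical overload set: the output parameter type picks the version.
bool read_value(const char *text, long &out) {
    char *end = nullptr;
    out = std::strtol(text, &end, 10);
    return end != text;  // true when at least one character was parsed
}

bool read_value(const char *text, double &out) {
    char *end = nullptr;
    out = std::strtod(text, &end);
    return end != text;
}

// By contrast, these two declarations could not coexist, because they
// would differ only in the return type:
//   long   read_value(const char *text);
//   double read_value(const char *text);
```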

Full source code: https://github.com/berestovskyy/applied-cpp

The Answer to “Do I Know This Already?” Quiz

The std::pair is the most convenient and fast method to return two values. If we need to return more than two values, a local structure (faster) or std::tuple (more convenient) should be used instead.

I’m in love with software performance, computer networks, and a neat design. Sounds familiar? Let’s stay in touch at http://linkedin.com/in/berestovskyy/
