Tail Call Optimization in C++

Eliminating the last function call and converting a recursive function into a loop based function

EventHelix
Software Design
5 min readDec 23, 2018

--

Examining the translation of simple examples of C++ code into assembly can be very instructive in developing an intuitive understanding of the code generation and optimization process.

We will be examining the generated assembly for simple code fragments that have been compiled with the GCC trunk (post 8.2). The optimization level switches have been set to O3.

Mapping C++ code to assembly

Consider the run function defined below. The function takes a single parameter, logLevel. The code shows two trace puts calls controlled by the logLevel. The actual application code is just represented as a puts call.

We have compiled the code into the assembly using the Compiler Explorer. The C++ code and the corresponding assembly is color-coded, enabling you to easily track the assembly generated for a particular line of C++ code.

If-condition optimization and tail call elimination example

Let’s review the generated code under two scenarios:

  • logLevel is 0
  • logLevel is 1

logLevel is 0

The first thing you will notice is that the compiler has replaced the two if conditions on (C++ lines 9 and 16) with a check (Assembly lines 8 and 9).

  • test edi, edi: Check the logLevel (edi register)
  • jg .L7: Jump to .L7 label if edi is greater than 0. Since logLevel (edi) is 0, this jump will not be executed.

The processor will execute assembly lines 10 and 11. These lines correspond to C++ line 14.

  • mov edi, OFFSET FLAT:.LC1: Copy a pointer to the string defined at the .LC1 label (assembly lines 3 and 4) into the edi register.
  • jmp puts: This is somewhat unexpected. One would expect that the puts function will be called here using a call statement (the call statement pushes the return address on the stack). However, what we see is a jump statement that does not prepare the stack frame for the call. Jump does not push the return address on the stack. This is an example of tail call optimization.

Tail Call Optimization (TCO)
Replacing a call with a jump instruction is referred to as a Tail Call Optimization (TCO). Here the compiler is optimizing away the last function (tail function) stack preparation. Since no call was made, the stack still contains the return address of the caller of the run function.

  1. When code performs a jump the puts function reads the string pointer in the edi register and goes on to print the string on the screen.
  2. When the puts function is done, it invokes a return statement that obtains the return address from the stack. The return address on the stack, however, corresponds to the caller of the run function.
  3. Thus, the return statement in the puts function actually returns from the run function!

The compiler has fooled the puts function into thinking that is returning back to the caller. The puts function however has returned to the caller of the caller! This is the reason why you do not see a return instruction in the run function. It is hijacking the return instruction of puts!

logLevel is 1

As we noted earlier, the compiler has replaced the two if conditions on (C++ lines 9 and 16) with a check (Assembly lines 8 and 9).

  • test edi, edi: Check the logLevel (edi register)
  • jg .L7: Jump to .L7 label if edi is greater than 0. Since logLevel (edi) is 1, this jump will be executed.

Assembly lines 13 and 19 show stack operations to allocate and free 8 bytes on the stack. Note that these instructions were not needed in the logLevel = 0 case as no function calls were made from run.

Assembly line 14 to 17 show the code for printing "Trace Message1\n" and "My code fragment goes here\n" strings. Note here is that the compiler generated code for printing this string twice. Assembly lines 10 and 11 were used to print the message when logLevel was 0.

The assembly lines 18 and 20 print the "Trace message2\n". Note again that the compiler has again employed the tail call optimization trick to save on a return.

Summary

Here is the generated assembly code again, this time annotated with comments explaining the rationale of the code.

Example of if-statement handling and tail call optimization handling

Optimization for recursive calls

We learned in the previous example that the compiler optimizes the last call to a function. How does the compiler handle the case when the last call is a recursive call to the function itself?

Let’s look at a simple implementation of factorial that performs a tail call on itself.

When operating on the post 8.2 GCC trunk, we see that the compiler completely rewrites the function to a loop and eliminates recursion!

You can think of the loop code as a natural outcome of the successive application of tail call optimization for a recursive function call.

Compiler Explorer mapping from C++ to the assembly is presented below.

Tail call optimization in a recursive function

Here is the annotated assembly code for the tail call optimized factorial function.

Tail call optimization also plays a central role in functional programming languages. Recursive function definitions in functional languages are converted into loops with tail call optimization.

Learn more

--

--