Tail Call Optimization in C++
Eliminating the last function call and converting a recursive function into a loop based function
Examining the translation of simple examples of C++ code into assembly can be very instructive in developing an intuitive understanding of the code generation and optimization process.
We will be examining the generated assembly for simple code fragments that have been compiled with the GCC trunk (post 8.2). The optimization level switches have been set to O3.
Mapping C++ code to assembly
Consider the run
function defined below. The function takes a single parameter, logLevel
. The code shows two trace puts calls controlled by the logLevel. The actual application code is just represented as a puts call.
We have compiled the code into the assembly using the Compiler Explorer. The C++ code and the corresponding assembly is color-coded, enabling you to easily track the assembly generated for a particular line of C++ code.
Let’s review the generated code under two scenarios:
- logLevel is 0
- logLevel is 1
logLevel is 0
The first thing you will notice is that the compiler has replaced the two if conditions on (C++ lines 9 and 16) with a check (Assembly lines 8 and 9).
test edi, edi
: Check the logLevel (edi register)jg .L7
: Jump to .L7 label if edi is greater than 0. Since logLevel (edi) is 0, this jump will not be executed.
The processor will execute assembly lines 10 and 11. These lines correspond to C++ line 14.
mov edi, OFFSET FLAT:.LC1
: Copy a pointer to the string defined at the .LC1 label (assembly lines 3 and 4) into the edi register.jmp puts
: This is somewhat unexpected. One would expect that the puts function will be called here using a call statement (the call statement pushes the return address on the stack). However, what we see is a jump statement that does not prepare the stack frame for the call. Jump does not push the return address on the stack. This is an example of tail call optimization.
Tail Call Optimization (TCO)
Replacing a call with a jump instruction is referred to as a Tail Call Optimization (TCO). Here the compiler is optimizing away the last function (tail function) stack preparation. Since no call was made, the stack still contains the return address of the caller of the run
function.
- When code performs a jump the
puts
function reads the string pointer in the edi register and goes on to print the string on the screen. - When the
puts
function is done, it invokes a return statement that obtains the return address from the stack. The return address on the stack, however, corresponds to the caller of therun
function. - Thus, the return statement in the
puts
function actually returns from therun
function!
The compiler has fooled the
puts
function into thinking that is returning back to the caller. Theputs
function however has returned to the caller of the caller! This is the reason why you do not see a return instruction in therun
function. It is hijacking the return instruction ofputs
!
logLevel is 1
As we noted earlier, the compiler has replaced the two if conditions on (C++ lines 9 and 16) with a check (Assembly lines 8 and 9).
test edi, edi
: Check the logLevel (edi register)jg .L7
: Jump to .L7 label if edi is greater than 0. Since logLevel (edi) is 1, this jump will be executed.
Assembly lines 13 and 19 show stack operations to allocate and free 8 bytes on the stack. Note that these instructions were not needed in the logLevel = 0
case as no function calls were made from run
.
Assembly line 14 to 17 show the code for printing "Trace Message1\n"
and "My code fragment goes here\n"
strings. Note here is that the compiler generated code for printing this string twice. Assembly lines 10 and 11 were used to print the message when logLevel
was 0.
The assembly lines 18 and 20 print the "Trace message2\n"
. Note again that the compiler has again employed the tail call optimization trick to save on a return.
Summary
Here is the generated assembly code again, this time annotated with comments explaining the rationale of the code.
Optimization for recursive calls
We learned in the previous example that the compiler optimizes the last call to a function. How does the compiler handle the case when the last call is a recursive call to the function itself?
Let’s look at a simple implementation of factorial that performs a tail call on itself.
When operating on the post 8.2 GCC trunk, we see that the compiler completely rewrites the function to a loop and eliminates recursion!
You can think of the loop code as a natural outcome of the successive application of tail call optimization for a recursive function call.
Compiler Explorer mapping from C++ to the assembly is presented below.
Here is the annotated assembly code for the tail call optimized factorial
function.
Tail call optimization also plays a central role in functional programming languages. Recursive function definitions in functional languages are converted into loops with tail call optimization.