A Curious Case of Tail Recursion
Recursion is not a new term for programmers. We have used it several times to solve a variety of problems like Factorial, GCD, Tree Traversal, Tower of Hanoi, etc, and can extend the same approach to solve for picture puzzles, Sudoku, and even Rubik’s cube.
We all know about the pros and cons of recursion but that was not the reason you started reading this, so let me skip to the typical case of Tail Recursion.
In computer science, a tail call is a subroutine call performed as the final action of a procedure. If the target of a tail is the same subroutine, the subroutine is said to be tail-recursive, which is a special case of direct recursion. Tail recursion (or tail-end recursion) is particularly useful, and often easy to handle in implementations.
Tail calls can be implemented without adding a new stack frame to the call stack. Most of the frame of the current procedure is no longer needed and can be replaced by the frame of the tail call, modified as appropriate.
The program can then jump to the called subroutine. Producing such code instead of a standard call sequence is called tail call elimination or tail call optimization. Tail call elimination allows procedure calls in tail position to be implemented as efficiently as goto statements, thus allowing efficient structured programming.
Let us look at an example
The function is not tail-recursive because the recursive function call is not the last call of the function.
Let us analyze the impact of different techniques for calculating the n-th Fibonacci number on CPU and Memory usage. We are using the Golang programming language.
These are 4 candidates for calculating the n-th Fibonacci number that we have considered for our analysis.
- Iterative method.
- Recursive method without using memoization.
- Recursive method using memoization for optimization.
- Tail Recursion.
I have bundled the code used in the GitHub repository and can be accessed at https://github.com/rahulbaghel159/tail_recursion_exp
Here are the benchmarks results:-
Recursion without memoization
goos: darwin
goarch: amd64
pkg: github.com/baghelrahul159/tail_recursion_exp/benchmark/non-tail-recursion
BenchmarkFibWithRecursion1-4 443103949 2.60 ns/op
BenchmarkFibWithRecursion2-4 165883412 7.25 ns/op
BenchmarkFibWithRecursion5-4 30739672 41.1 ns/op
BenchmarkFibWithRecursion10-4 2572047 455 ns/op
BenchmarkFibWithRecursion20-4 18165 80904 ns/op
PASS
It is clear, with recursion without memoization, even the 20-th Fibonacci number takes 80904 ns/op. Thus, we will omit it from further discussions.
Recursion with Memoization
goos: darwin
goarch: amd64
pkg: github.com/baghelrahul159/tail_recursion_exp/benchmark/recursion_with_memoization
BenchmarkFibWithRecursionMemoization1-4 415333540 3.00 ns/op
BenchmarkFibWithRecursionMemoization2-4 149585394 8.03 ns/op
BenchmarkFibWithRecursionMemoization5-4 100000000 10.8 ns/op
BenchmarkFibWithRecursionMemoization10-4 49885395 25.9 ns/op
BenchmarkFibWithRecursionMemoization20-4 66632116 20.2 ns/op
BenchmarkFibWithRecursionMemoization50-4 45081136 27.7 ns/op
BenchmarkFibWithRecursionMemoization100-4 41284546 33.4 ns/op
BenchmarkFibWithRecursionMemoization300-4 45022812 28.7 ns/op
BenchmarkFibWithRecursionMemoization500-4 75619054 19.7 ns/op
BenchmarkFibWithRecursionMemoization700-4 69009540 21.3 ns/op
BenchmarkFibWithRecursionMemoization1000-4 80567641 16.3 ns/op
PASS
Tail Recursion
goos: darwin
goarch: amd64
pkg: github.com/baghelrahul159/tail_recursion_exp/benchmark/tail_recursion
BenchmarkTailFib1-4 373255960 4.33 ns/op
BenchmarkTailFib2-4 157686687 7.42 ns/op
BenchmarkTailFib5-4 73597358 24.2 ns/op
BenchmarkTailFib10-4 16147897 69.8 ns/op
BenchmarkTailFib20-4 13154060 126 ns/op
BenchmarkTailFib50-4 5215443 270 ns/op
BenchmarkTailFib100-4 2578944 477 ns/op
BenchmarkTailFib300-4 965191 1368 ns/op
BenchmarkTailFib500-4 438627 3005 ns/op
BenchmarkTailFib700-4 339271 4191 ns/op
BenchmarkTailFib1000-4 263703 5076 ns/op
PASS
Iterative
goos: darwin
goarch: amd64
pkg: github.com/baghelrahul159/tail_recursion_exp/benchmark/without_recursion
BenchmarkFib1-4 441498200 2.52 ns/op
BenchmarkFib2-4 460432479 2.40 ns/op
BenchmarkFib5-4 231182103 6.86 ns/op
BenchmarkFib10-4 118818930 12.0 ns/op
BenchmarkFib20-4 58351034 20.1 ns/op
BenchmarkFib50-4 23743794 59.4 ns/op
BenchmarkFib100-4 17107134 68.1 ns/op
BenchmarkFib300-4 6996146 194 ns/op
BenchmarkFib500-4 4176632 359 ns/op
BenchmarkFib700-4 2957839 398 ns/op
BenchmarkFib1000-4 2071720 1038 ns/op
BenchmarkFib2000-4 1000000 1153 ns/op
BenchmarkFib3000-4 611624 3048 ns/op
PASS
Plotting the above values using matplotlib library from python.
Flame graphs for Heap profile
Recursion without memoization
Recursion with Memoization
Tail Recursion
Iterative
Apart from memoization technique which uses almost 30 MB, rest all methods are using few KB
Flame graphs for CPU profile
Recursion without memoization
Recursion with Memoization
Tail Recursion
Iterative
Without memoization, recursion takes almost 18s which is reduced to 160ms with memoization, tail recursion and iterative function takes the almost same amount of CPU time.
References:
https://en.wikipedia.org/wiki/Recursion_(computer_science)
https://en.wikipedia.org/wiki/Tail_call
https://medium.com/@openmohan/profiling-in-golang-3e51c68eb6a8