Factorial Function in Assembly

7 min readFeb 1, 2020

Lately, I’ve been reading the book Programming from the Ground Up by Jonathan Barlett. It teaches x86 assembly language programming from the very grounding blocks. Applying the concepts learned in the book, I’ll show you how to write a factorial function in x86_64 Assembly.

There are many assembly languages: Intel, AT&T, NASM, etc,… In this tutorial, I’m going to use the AT&T syntax, which can be easily assembled in any Linux machine through the GNU Assembler (GAS).

Let’s start by creating a file named main.s (’s’ is the standard extension for assembly programs).

Every assembly program is composed of three sections: data, bss, and text. The data section is used to initialize constants. Those constants are preallocated during the program's initialization. The bss section is used to declare buffers or dynamically allocated data. Finally, the text section is used to keep the actual code. None of them are mandatory, but it’s a good practice to define them explicitly.

The “.” indicates that the instruction is a directive to the assembler (just like # in C is a directive to the compiler).

In order to execute your program, the operating system needs to know where’s the start of your program (like a “main” function). In Assembly, this is accomplished through the _start label.

A label is just a placeholder for a memory position. On Von Neumann architecture (which is the architecture used by most of the modern computers), code is actually kept on memory along with the data that it manipulates. For the processor, there’s absolutely no distinction between code and data. In the above example, the label would exercise the function of pointing to a ‘code line’ (This is an oversimplification).

You must have noticed the use of globl directive. It’s used to make a symbol visible to the linker. We’ll talk about it later, just be informed that it’s mandatory to the _start label.

Alright! Now let’s make our program just exit in order to obtain a basic executable program.

The rax and rbx are both general-purpose registers. The int instruction is an interruption. When we use the 0x80 code, we are saying that this interruption must be handled by the Linux operating system. In another words, we are performing a system call. The code of the system call is stored in the rax register. The 1 is the code for the exit system call. The exit system call takes one parameter, the return value, which is stored in the rbx register. In this case, we are returning the value 0. By the way, the $ is used to indicate constant values. If we omit it, it would be interpreted as a memory address instead.

In C, the code we just typed would be equivalent to:

You can turn it into an executable code by calling:

But a program like that is of no use, right? So let’s create our factorial function. Functions in Assembly are a kind of headache because you need to explicitly manipulate the stack.

THE STACK

The stack is a contiguous region of memory reserved for the program by the operating system.

Source: https://stackoverflow.com/questions/32418750/stack-and-heap-locations-in-ram

There’s a special register called rsp (stack pointer) that points to the ‘top’ of the stack. So how do we store things in the stack? By simply using mov instruction and manipulating the stack pointer. For example, let’s suppose we want to store two values in the stack, 1 and 2. It could be accomplished that way:

A stack is a structure that grows upward, in the sense that it first begins with addresses of higher values and grows towards addresses of lower values. In the above example, we are using 8 bytes of the stack to store the value 1 and equally 8 bytes to store the value 2, so we need to step down the stack pointer by a value of 16 (we use 8 bytes because on x64 architecture, the registers have 8 bytes. 8 x 8 = 64 bits).

Since manipulating the stack pointer directly is boring, there are two special instructions to accomplish the same effect: push and pop. push subtracts the stack pointer by 8 and stores the parameter into the current address pointed by the stack pointer. pop moves the data stored in the address currently pointed by the stack pointer to a register and adds the stack pointer by 8.

RECURSIVE FACTORIAL FUNCTION

Create a file named factorial.s. Insert the following code:

We use the globl directive because we want the factorial label to be visible to other programs (we want to use the function in other programs). Here’s the first trick: “Calling” a function is simply jumping to the memory address where this function is defined.

The new thing here is the type directive. Don’t worry, it’s just a directive to indicate that the label is not a common label but actually a function.

Now let’s go back to our main program and call this function:

This may seem complicated at first glance, but it’s rather simple. We first store the value 5 in the stack. Since we cannot pass arguments directly while calling the function like in other languages, we need to store the arguments in the stack. In this case, 5 will be first the argument for the factorial function: The number we want to calculate the factorial.

After that, we use the call instruction. What the call instruction actually does is just storing the current address into the stack pointer (because we need to know where to return after the function has been finished!) and jumping to the factorial label.

Finally, we increase the stack pointer (because we no longer need the function parameters stored in the stack, we can safely override them to prevent memory leaks) and move the function return (the function return is, by convention, always stored in the rax register) to the rbx so we can display it after the program has been executed (through the echo $? command).

Our main program is finished. It’s roughly equivalent to the following C program:

Now let’s focus on our factorial function. The very first thing we need to do in a function is to store the rbp register. rbp is the base pointer register. It points to the beginning of the current stack frame. Stack frame is the region of the stack where we stored our function parameters and function return address. Since we can call functions inside functions, we need to store the ‘context’ of the previous function call (if any).

We move the current stack pointer to the rbp register. Now we are within the context of the current function. Let’s retrieve the arguments:

Remember: We use rbp + 16 because we pushed the parameter value into the stack first (the second pushed value was the function return address, therefore it stands on top).

Since we are implementing the recursive version, we first need to define the recursion base: If the parameter value is less or equal than 1, return 1.

Wow, a lot of things happened here! Let’s calmly analyze each one of them. First, we compare the parameter value to one (through the cmp instruction). Then, we check if the parameter value is equal to one. If so, we jump to the factorial_base label. This conditional jump is accomplished by the je instruction (jump on equal). The cmp instruction sets a flag in theflags register, which the conditional jump will look up to decide if it will jump or not.

Once within the factorial_base label, we move the value 1 into the rax register. The rax will store the final value of our calculation. The program flow will then move automatically to the label right below, factorial_end.

The factorial_end label will restore the stack pointer to where it was when the function was called (in case we have manipulated it inside the function body, which we didn’t, but it’s a good practice to keep our code generic) and will then restore the base pointer register. Finally, there’s this ret instruction, which will return to the value currently pointed by the stack pointer (the function return address).

Now let’s implement the recursive calls:

After we checked our recursion base condition (and failed), we decremented the value of the parameter in order to pass it as an argument to the same function again (through the decl instruction). The way of calling the same function is identical to the code we wrote on our main program: We push the arguments into the stack and call the call instruction.

Once the recursive function has been finished, its return value is stored in the rax register. Before multiplying the current parameter by the return value of the recursive call, we first need to restore its original value (remember: When we made the recursive call, it was overridden). We accomplish it by calling: mov 16(%rbp), %rbx. We then multiply the value of rbx by rax and store the result in rax through the imul instruction. After it, we jump to the end of our function (factorial_end).

That’s it! Not that hard, right? Take some time to ensure you have understood every piece of our code.

The above code is roughly equivalent to the following C code:

hehe, Assembly is such a pain. :)

GENERATING EXECUTABLE CODE

Now let’s generate our executable code. First, let’s generate object code from our both assembly programs:

Let’s link our two object codes into a single executable code:

Now run:

We can check our program return by calling:

A value of 120 must have prompted, that is the factorial of 5.

It’s worth noting that if we omit the .globl factorial in factorial.s, the linker would prompt an error, since it cannot resolve the symbol.

ITERATIVE FACTORIAL FUNCTION

The factorial function can also be implemented in a single loop. Observe the following C code:

Since the program is bigger in C than the previous version, you may think that it would be even more complex in Assembly than the previous Assembly program. However, it’s actually simpler:

We copy the parameter value to the rbx register. It will exercise the same function of the i variable in the C code. We then decrease the value of i and then jump to the loop.

On our loop, we compare the value of rbx (i) to one. If it’s equal, we exit our loop. Otherwise, we multiply rbx by rax and store the result in rax. In the end, we return to the beginning of our loop.

CONCLUSION

Well, well… We learned important Assembly concepts in this tutorial: System calls, the stack, functions… It’s a lot! You already are able to implement some other simple algorithms (try Fibonacci!).