Explained Difference between x86 & x64 Disassembly
I have been reading about the disassembly of x86 and x86_64 (x64) code. I had some confusion and questions along the way; now that they are cleared up, I want to share what I learned with you guys.
Sample C code:
#include <stdio.h>
void sum(int a) {
    printf("Hello world: %d\n", a);  /* print the argument passed in by the caller */
}
Disassembly of x86 vs x64:
The question below is the most important one, and it confused me at first:
What is the instruction "and $0xfffffff0,%esp", and why do we need it?
After some research, I found that it performs stack alignment: it clears the low four bits of esp, which rounds the stack pointer down to a multiple of 16. This is required because some SSE instructions fault on memory operands that are not 16-byte aligned, and it also helps with optimizations such as the padding the compiler inserts between variables in C.
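To make the masking concrete, here is a minimal sketch in C of what the and instruction computes. The function name align_down_16 is made up for illustration; it only models the arithmetic and does not touch the real stack pointer.

```c
#include <stdint.h>

/* Model of `and $0xfffffff0,%esp`: clear the low 4 bits of the
   stack pointer value, rounding it down to a multiple of 16.
   (Hypothetical helper, for illustration only.) */
uint32_t align_down_16(uint32_t esp)
{
    return esp & 0xfffffff0u;
}
```

For example, align_down_16(0xbffff4ac) gives 0xbffff4a0. Because the stack grows toward lower addresses, rounding down can only reserve a few extra bytes (at most 15); it never overlaps data the caller already pushed.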
By default, GCC aligns the stack on a 16-byte (2⁴) boundary, but you guys can change this. If you want to keep stack usage small (especially on low-end targets such as embedded systems), you can change one GCC option:
- -mpreferred-stack-boundary=num, where num = 2, 3, 4, … and the boundary is 2^num bytes: num=4 gives 16 bytes (2⁴) and num=2 gives 4 bytes (2²). But, as I already said, 16-byte alignment is necessary for SSE instructions.
- It affects the generated code in your binary. By default, GCC arranges things so that the stack pointer is aligned on a 16-byte boundary at every call (this may be important if you have local variables and enable SSE2 instructions).
- If you change the default to e.g. -mpreferred-stack-boundary=2, then GCC will align the stack pointer on a 4-byte boundary. This reduces the stack requirements of your routines, but the program can crash if your code (or code you call) does use SSE2, so it is generally not safe.
- According to the System V AMD64 ABI, the stack pointer must be 16-byte aligned at the point of each call instruction.
- If the CPU accesses unaligned data on the stack, there can be a performance penalty, and SSE instructions that require aligned operands will fault. So, stack alignment is a must.
- A 4-byte-aligned stack needs fewer instructions:
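As a rough illustration of why a smaller boundary reduces stack usage, the sketch below models how a frame size gets rounded up to the boundary selected by num. The helper names boundary_bytes and padded_frame_size are hypothetical, and this is a simplified model: GCC's real frame layout also accounts for saved registers, the return address, and outgoing arguments.

```c
#include <stddef.h>

/* Boundary in bytes for -mpreferred-stack-boundary=num, i.e. 2^num.
   (Hypothetical helper, not a GCC API.) */
size_t boundary_bytes(unsigned num)
{
    return (size_t)1 << num;
}

/* Round the space needed for locals up to the next multiple of the
   boundary. Simplified model of the padding a compiler adds. */
size_t padded_frame_size(size_t locals, unsigned num)
{
    size_t b = boundary_bytes(num);
    return (locals + b - 1) & ~(b - 1);
}
```

Under this model, a single 4-byte int local costs 16 bytes with num=4 but only 4 bytes with num=2, which is the saving the option is after.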
To understand assembly properly, you need to understand the concepts of the function prologue, the epilogue, and the instruction set.
Cheat Sheet from OSDEV Wiki
Explanation for x86_64 disassembly:
Stack Diagram for x64:
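- First, push the rbp register onto the stack. This saves the caller's base pointer so it can be restored later, and lets rbp be used to keep track of the full stack frame.
- Second, copy the stack pointer into the base pointer, in other words set the base pointer to the stack pointer, i.e. mov %rsp,%rbp.
- Third, subtract $0x10 (hex) from the stack pointer, i.e. reserve 16 bytes for locals. The variable c is stored as a 4-byte value; the reservation is rounded up so the stack stays 16-byte aligned.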
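- Fourth, the base pointer now serves as the reference point: local variables are addressed at fixed offsets from rbp.
Here, movl is the plain mov instruction with an l suffix, which means long (i.e. a 32-bit operand). A 32-bit store to memory writes exactly 4 bytes; when the destination is a 32-bit register such as eax, the upper 32 bits of the full 64-bit register are set to zero.
- So the instruction moves the value $0x2 (2) into -4(%rbp), i.e. it puts the value 2 at address rbp-4.
- Fifth, load that value from address rbp-4 into the eax register (the 32-bit accumulator), then copy it from there into edi, the register that typically carries the first integer argument in the x86_64 calling convention.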
- Now, the callq instruction pushes the return address (the address of the instruction that follows the call in main()) onto the stack and jumps to sum(). sum() then creates its own frame by following exactly the same prologue steps.
- Finally, the epilogue reverts these operations: each function restores the stack pointer, pops the saved rbp, and returns, first in sum() and then, after sum() has returned, in main().
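The prologue and epilogue steps above can be sketched as a toy model in C. Everything here (the Machine struct, push64, pop64, prologue, epilogue, frame_round_trip) is a made-up simulation for illustration, not real register access; it only shows that the epilogue leaves rsp and rbp exactly where the prologue found them.

```c
#include <stdint.h>

/* Toy machine state: a small stack plus the two registers involved. */
typedef struct {
    uint64_t mem[64];   /* stack memory, one slot per 8 bytes */
    uint64_t rsp;       /* stack pointer, as a byte offset into mem */
    uint64_t rbp;       /* base (frame) pointer */
} Machine;

void push64(Machine *m, uint64_t v)
{
    m->rsp -= 8;                 /* stack grows toward lower addresses */
    m->mem[m->rsp / 8] = v;
}

uint64_t pop64(Machine *m)
{
    uint64_t v = m->mem[m->rsp / 8];
    m->rsp += 8;
    return v;
}

/* Models: push %rbp ; mov %rsp,%rbp ; sub $locals,%rsp */
void prologue(Machine *m, uint64_t locals)
{
    push64(m, m->rbp);           /* save the caller's base pointer */
    m->rbp = m->rsp;             /* new frame starts here */
    m->rsp -= locals;            /* reserve space for locals */
}

/* Models: mov %rbp,%rsp ; pop %rbp */
void epilogue(Machine *m)
{
    m->rsp = m->rbp;             /* discard the locals */
    m->rbp = pop64(m);           /* restore the caller's base pointer */
}

/* Returns 1 if rsp and rbp are back to their starting values
   after one prologue/epilogue pair, 0 otherwise. */
int frame_round_trip(void)
{
    Machine m = { .rsp = 64 * 8, .rbp = 0x1234 };
    uint64_t rsp0 = m.rsp, rbp0 = m.rbp;

    prologue(&m, 0x10);          /* same 16 bytes as sub $0x10,%rsp */
    epilogue(&m);

    return m.rsp == rsp0 && m.rbp == rbp0;
}
```

The model leaves out callq and ret, which additionally push and pop the return address around each call.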
Calling Convention of x86_64
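As a small sketch of the System V AMD64 convention used by GCC on Linux, the lookup below lists the registers that carry the first six integer or pointer arguments, in order. The function name int_arg_register is hypothetical and exists only for this illustration; floating-point arguments use a separate set of registers and are not modelled here.

```c
#include <stddef.h>

/* Integer/pointer argument registers in the System V AMD64
   calling convention, in order. Further arguments go on the stack. */
static const char *const INT_ARG_REGS[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Name of the register carrying the index-th integer argument
   (0-based), or NULL when that argument is passed on the stack.
   (Hypothetical helper, for illustration only.) */
const char *int_arg_register(size_t index)
{
    size_t n = sizeof INT_ARG_REGS / sizeof INT_ARG_REGS[0];
    return index < n ? INT_ARG_REGS[index] : NULL;
}
```

For a call like sum(c), the single int argument is argument 0, so it travels in rdi, or more precisely in its low 32-bit half, edi.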
Note: Please feel free to let me know if I forgot or missed anything.