Understanding assembly code is definitely a challenge, but it becomes manageable once you spend enough time reading documentation and talking with your peers. In this blog post I am going to explain a small piece of assembly code, so that you don’t have to spend too much time looking for documentation.
As a student at Holberton School I was given the following task:
- Add one line to this code, so that the program prints 98, followed by a new line.
As you can see, I had some limitations: I couldn’t use the variable ‘a’, modify the value of the pointer to an integer ‘p’, or put more than one statement on that one line.
I was given the hint to look into the ‘objdump’ program. If you are not familiar with objdump, Wikipedia describes it as: “Objdump is a program for displaying information about object files, sometimes used as a disassembler to view an executable in assembly form”. In this case that is exactly what I needed to do: disassemble the executable to see the relationship between the memory addresses, the variables and the values contained inside them.
The first step that I took was to go to my terminal and create a file that contains the provided C code. In this case I named my file ‘0-magic.c’. I also needed to compile this source code, which produces an executable file containing machine language. I used the gcc compiler in the following way:
When no output name is given, gcc names the executable “a.out”; this is the file I am going to use with objdump to see the assembly code that was produced.
The way in which I used objdump was the following:
We can break this command down into a couple of things. Objdump has different options that can be used with executables. In this case I am only interested in the option ‘-d’, which displays the assembler mnemonics for the machine instructions of our file, and the option ‘-M’ followed by ‘intel’, which selects the syntax in which the assembly language will be displayed. These are followed by the name of my executable file. I only want to see what is happening inside the main function, so I used the pipe ‘|’ and the grep command to select the text that matches main and the following 19 lines. As a result I get:
Now, this output might look a bit challenging. That is at least what I thought the first time I saw something like this, but believe me, once you dive into it, it becomes clearer.
The columns that we really need to consider are the third and the fourth from the left. The third contains the instruction mnemonic and the fourth contains its operands. In Intel syntax the destination operand is written first and the source second.
There are many more mnemonics than the ones covered in this blog post, but the ones that we have here are:
- Push: Push data onto the stack
- Mov: Move. Copies data between registers and memory. Used for transfer operations
- Sub: Subtraction. Used for arithmetic operations
- Lea: Load effective address. Computes the address of its memory operand and stores it in a register, without reading the memory at that address
- Add: Addition. Used for arithmetic operations
- Call: Call procedure (subroutine). Used in jump (general) operations
- Leave: Leave stack frame. Releases the local stack storage
- Ret: Return from subroutine. Used in jump (general) operations
- Nop: No operation
From top to bottom here is what is happening:
- The computer pushes rbp (the base pointer) onto the stack.
- It then copies the value of the rsp (stack pointer) register into the rbp register. Remember that in Intel syntax the destination comes first. At this point rbp and rsp point to the same memory location. For a better understanding of this exercise we are going to pretend that rbp holds the address 60. This is an arbitrary number that is only used to see the relationship between memory addresses.
- In the next line we have → sub rsp, 0x30. This means that 0x30 (48 in decimal) is subtracted from the value of rsp. So the arithmetic operation is the following one:
- 60 - 48 = 12
(after step 3 we have something like this in memory)
This leaves us with 48 bytes of stack space available for our program, between addresses 12 and 60 in our model.
- In the next instruction the value 0x400 (1024 in decimal) is copied into [rbp-0x18], which is a memory location. 0x18 is 24 in decimal, and remember that rbp is standing at 60. The arithmetic works like this → 60 - 24 = 36.
- By looking at our C program we see that 1024 is the value stored in the array a, so we can infer that the element of a that receives 1024 lives at memory location 36. Since we know that we are working with an array of ints, we can check how many bytes an int takes on our machine by writing inside our C code → printf("%zu\n", sizeof(int)); then compiling and executing it to see the size of an int. On the machine that was used for this exercise an int takes 4 bytes. So our memory allocation looks like this now:
Since we also know that this array has five elements, we can work out the memory that the other elements occupy as well.
Notice that each element of the array is taking 4 bytes.
- Next we have the operand [rbp-0x2c], which in our model is 60 - 44 = 16. Just by looking at the assembly we cannot tell which variable lives at 16. By looking at the C code we can infer that it is the variable n, because n is declared before the pointer to an integer p.
- The following line uses [rbp-0x28], which is the same as 60 - 40 = 20. From 16 to 20 there are only 4 bytes, which is enough for an int but not for a pointer, since on the machine that produced this assembly a pointer takes 8 bytes. So the int n must be the variable at 16, and the address of p is 20. At this point memory looks like this:
Notice that in the C code, p contains the address of n, which in this case is 16.
Now that we have seen how the memory addresses relate to each other, we can write → *(p + 5) = 98; inside our C program. This line does not modify p. It takes the address that p holds (in this case the address of n), steps five ints forward, which lands on the address of the element of a that holds 1024, and stores 98 there.
If you are wondering how 16 + 5 = 36, it’s because pointer arithmetic counts in units of the pointed-to type, in this case int. So the arithmetic is really 16 + 5 * 4 = 36. And that is where we store the value 98.