A Primer on x86 Assembly
Note: These are more my notes on x86 assembly language than an actual blog post.
If you want to find something specific, like how to compare things in assembly, press Ctrl+F and search for “compare”.
Raw binary machine code is impractical to write by hand, but assembly language is readable and maps almost directly onto it.
Each line of assembly code is one operation.
The names of the operation codes are called mnemonics.
Registers have names that individually identify them.
Addresses are specified using labels.
Example:
adjust: mov eax, num1 ; get first number
adjust is the label
mov is the opcode
eax is the register
num1 is a label
anything after “;” is a comment.
This can be translated to binary using an assembler, or embedded directly in C code inside an _asm block.
Registers
EAX — Accumulator Register
Register for general purpose data storage. On an x86 CPU it looks like this:
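EAX: bits 31 to 0 (the full 32-bit register)
AX: bits 15 to 0 (the low 16 bits of EAX)
AH: bits 15 to 8 (the high byte of AX)
AL: bits 7 to 0 (the low byte of AX)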
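To make the layout concrete, here is a minimal C sketch that treats a 32-bit integer as a stand-in for EAX and picks out the AX, AH and AL parts with masks and shifts. The helper names (reg_ax, reg_ah, reg_al, set_al) are made up for illustration and are not part of any real API.

```c
#include <stdint.h>

/* Hypothetical helpers: a uint32_t stands in for the EAX register. */

/* AX is the low 16 bits of EAX (bits 15..0). */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* AH is the high byte of AX (bits 15..8). */
uint8_t reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* AL is the low byte of AX (bits 7..0). */
uint8_t reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* Like "mov al, value": only the low byte changes, the upper 24 bits stay. */
uint32_t set_al(uint32_t eax, uint8_t value) {
    return (eax & 0xFFFFFF00u) | value;
}
```

For example, with 0x12345678 standing in for EAX, reg_ax gives 0x5678, reg_ah gives 0x56 and reg_al gives 0x78.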
Examples
mov eax, 42 ; put 42 into eax
mov ax, count ; get a 16-bit variable
mov al, 'x' ; put the ASCII value of x in the low byte
inc eax ; increment eax
A simple assignment statement like
num = count1 + count2 - 10
Looks like this in assembly:
mov eax, count1
add eax, count2
sub eax, 10
mov num, eax
EBX — Base register
ECX — Counter register
Often used for loops (seen later)
It can be used with special jump instructions:
JECXZ ; jump if ecx is zero
JCXZ ; jump if cx is zero
EDX — Data register
Flags Register
Allows us to query the effect of the previous instruction. The status of an operation is stored in the Flags register. The flags register contains these flags:
S: sign (indicates whether the result is positive or negative)
Z: zero (indicates whether the result is zero)
C: carry (indicates an arithmetic carry)
O: overflow (indicates an arithmetic overflow error)
The flags register can be used in conjunction with jump instructions to control program flow. For example, JO jumps to its target only if the O flag is set.
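As a rough illustration of what the four flags mean, this C sketch works out S, Z, C and O for a 32-bit addition. It is a model for reasoning about the flags, not how the CPU is implemented, and the Flags struct and flags_after_add function are names invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four flags described in the text. */
typedef struct {
    bool s; /* sign: top bit of the result */
    bool z; /* zero: result is 0 */
    bool c; /* carry: unsigned result did not fit in 32 bits */
    bool o; /* overflow: signed result did not fit in 32 bits */
} Flags;

Flags flags_after_add(uint32_t a, uint32_t b) {
    uint32_t r = a + b; /* wraps around modulo 2^32, like a 32-bit add */
    Flags f;
    f.s = (r >> 31) & 1u;
    f.z = (r == 0);
    f.c = (r < a); /* wrapped past the top, so a carry came out of bit 31 */
    /* Signed overflow: both operands have the same sign and the result's sign differs. */
    f.o = (((a ^ r) & (b ^ r)) >> 31) & 1u;
    return f;
}
```

Adding 1 to 0xFFFFFFFF sets Z and C but not O, while adding 1 to 0x7FFFFFFF sets S and O but not C.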
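ESP Register
The stack pointer, which holds the address of the item on top of the stack (covered in The Stack below).
Unconditional Jump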
The simplest jump instruction is the unconditional jump.
It jumps no matter what, as soon as execution reaches it.
It has the syntax
JMP <address of the target instruction>
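The address of the target instruction is normally a label.
Conditional Jump
A conditional jump works just like an if statement: it jumps only if its condition holds.
Jumps that test flags:
JZ / JNZ ; jump if the Z flag is set / not set
JC / JNC ; jump if the C flag is set / not set
JS / JNS ; jump if the S flag is set / not set
JO / JNO ; jump if the O flag is set / not set
Something cool to note is that each of these jumps has an inverse, and the inverse has “N” in the middle of it, meaning “not”.
Example of using a conditional jump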
mov eax, num ; move the contents of num into eax
sub eax, 10
jnz store ; if the result is not zero, jump to store; otherwise fall through
mov eax, 100
store: mov num, eax
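Read as C, the jump example above is just an if statement. The sketch below mirrors it line by line; the function name adjust is borrowed loosely from the earlier label and is only an illustration.

```c
/* Hypothetical C equivalent of the jnz example: returns the new value of num. */
int adjust(int num) {
    int eax = num;  /* mov eax, num */
    eax -= 10;      /* sub eax, 10 (sets the Z flag if the result is 0) */
    if (eax == 0) { /* jnz store skips the next line when the result is not 0 */
        eax = 100;  /* mov eax, 100 */
    }
    return eax;     /* store: mov num, eax */
}
```

So a num of 10 becomes 100, and any other value simply has 10 subtracted from it.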
CMP is the most common way of comparing two values. It subtracts the second operand from the first and sets the flags, but throws the result away.
If eax and ebx contain the same number then cmp eax, ebx will set the Z flag.
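Here is a small C sketch of how CMP and the conditional jumps fit together, assuming the usual rule that JLE jumps when Z is set or when S and O differ. The names CmpFlags, cmp_flags and the would_* helpers are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool s, z, c, o;
} CmpFlags;

/* Models "cmp a, b": compute a - b, keep only the flags. */
CmpFlags cmp_flags(uint32_t a, uint32_t b) {
    uint32_t r = a - b; /* the difference itself is discarded by cmp */
    CmpFlags f;
    f.s = (r >> 31) & 1u;
    f.z = (r == 0);
    f.c = (a < b); /* an unsigned borrow was needed */
    /* Signed overflow: operands have different signs and the result's sign differs from a. */
    f.o = (((a ^ b) & (a ^ r)) >> 31) & 1u;
    return f;
}

bool would_jz(CmpFlags f)  { return f.z; }                     /* equal */
bool would_jnz(CmpFlags f) { return !f.z; }                    /* not equal */
bool would_jle(CmpFlags f) { return f.z || (f.s != f.o); }     /* signed less than or equal */
```

For instance, would_jle(cmp_flags(3, 10)) is true, which is why a loop that ends in cmp eax, 10 followed by jle keeps going while eax is at most 10.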
Loop
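Loops in assembly are simple.
While loop
while1: cmp eax, 10 ; test the condition first (eax < 10 is just an example condition)
jge end_while ; leave the loop once the condition fails
; loop body goes here
jmp while1 ; go back and test again
end_while:
Do-while loop
do_while: ; loop body goes here
cmp eax, 10 ; test the condition after the body
jl do_while ; repeat while the condition holds
end_while:
A for loop can be made in assembly. Take this example:
for (int x = 1; x <= 10; x++) {
    y = y + x;
}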
First Attempt
mov eax, 1 ; using eax as the variable x
floop: ; start of for loop
add y, eax ; update y
inc eax ; x++
cmp eax, 10
jle floop ; jump back to floop while eax <= 10, so the loop exits once eax reaches 11
We can improve this by counting in reverse:
mov eax, 10
floop: add y, eax
dec eax ; x--
jnz floop ; go to floop if the previous operation did not result in 0
We can use ECX to improve the previous loop like so:
mov ecx, 10
floop: add y, ecx
loop floop ; decrement ecx and jump to floop if it is not 0
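To check that the three versions really compute the same thing, here is a C sketch that mirrors each loop. It assumes y starts at 0, and the function names are made up for this example.

```c
/* Each function mirrors one of the assembly loops, with y starting at 0. */

int sum_counting_up(void) {
    int y = 0;
    int eax = 1;         /* mov eax, 1 */
    do {
        y += eax;        /* add y, eax */
        eax++;           /* inc eax */
    } while (eax <= 10); /* cmp eax, 10 then jle floop */
    return y;
}

int sum_counting_down(void) {
    int y = 0;
    int eax = 10;        /* mov eax, 10 */
    do {
        y += eax;        /* add y, eax */
        eax--;           /* dec eax (sets Z when eax reaches 0) */
    } while (eax != 0);  /* jnz floop */
    return y;
}

int sum_with_loop_instruction(void) {
    int y = 0;
    int ecx = 10;          /* mov ecx, 10 */
    do {
        y += ecx;          /* add y, ecx */
    } while (--ecx != 0);  /* loop floop: decrement ecx, jump if not 0 */
    return y;
}
```

All three return 55. Counting down saves the cmp because dec already sets the Z flag, and loop folds the decrement and the jump into one instruction.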
Addresses and Values
In assembly we can get the address of a variable with the LEA (load effective address) instruction.
We often use EBX to hold the address:
LEA EBX, val
We can access the value pointed to by the address using register indirect addressing mode
mov eax, [ebx]
The [] denotes register indirect addressing mode: the operand is the value stored at the address held in the register.
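If you know C, LEA and the square brackets map closely onto the address-of and dereference operators. A minimal sketch, with variable names chosen to match the registers and function names made up for illustration:

```c
/* lea ebx, val  is like taking &val;  mov eax, [ebx]  is like reading *ebx. */
int load_through_address(void) {
    int val = 42;
    int *ebx = &val; /* lea ebx, val */
    int eax = *ebx;  /* mov eax, [ebx] */
    return eax;
}

/* Writing through the address works the same way in the other direction. */
void store_through_address(int *ebx, int eax) {
    *ebx = eax;      /* mov [ebx], eax */
}
```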
Subroutines (Functions)
When the program jumps to a subroutine somewhere else in the code, how does it know where to return?
The instruction pointer register (EIP) always points at the next instruction, so when a subroutine is called that value is saved as the return address.
So let’s say you have the code
100
101
102
and the instruction at 101 calls a subroutine stored elsewhere in memory. The subroutine is 6 instructions long, so the order of execution becomes
100
201
202
203
204
205
206
102
where 201 to 206 are the addresses of the instructions in the subroutine.
A subroutine in assembly is programmed as
label PROC
; body of the subroutine
ret
label ENDP
The procedure is called by:
call label
You can use C functions inside assembly.
The CALL instruction records the current value of EIP (the instruction pointer) as the return address by pushing it onto the stack.
It then puts the required subroutine address into EIP, so the next instruction to be executed is the first instruction of the subroutine.
The RET instruction (return) retrieves the stored return address and puts it back into EIP, causing execution to return to the instruction after the CALL.
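The CALL and RET mechanics can be sketched as a tiny simulation in C. It reuses the made-up addresses from the example (main code at 100 to 102, subroutine at 201 to 206) and assumes the instruction at 101 is the call and the one at 206 is the ret. Everything here, including run_trace, is a hypothetical model.

```c
#include <stddef.h>

enum { FIRST_ADDR = 100, CALL_ADDR = 101, LAST_ADDR = 102, SUB_START = 201, SUB_END = 206 };

/* Records the address of every instruction "executed" and returns how many there were. */
size_t run_trace(int *trace, size_t capacity) {
    int return_stack[4]; /* stands in for the real stack */
    size_t sp = 0;
    size_t n = 0;
    int eip = FIRST_ADDR;

    while (n < capacity) {
        int current = eip;
        trace[n++] = current;
        eip = current + 1;            /* EIP already points at the next instruction */
        if (current == CALL_ADDR) {   /* call: save EIP, then jump to the subroutine */
            return_stack[sp++] = eip;
            eip = SUB_START;
        } else if (current == SUB_END) { /* ret: restore the saved EIP */
            eip = return_stack[--sp];
        } else if (current == LAST_ADDR) {
            break;                    /* end of the example program */
        }
    }
    return n;
}
```

With room for at least 9 entries the trace comes out as 100, 101, 201, 202, 203, 204, 205, 206, 102. The call at 101 shows up in the trace because the call itself is an instruction that runs before the subroutine does.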
The Stack
A stack is a memory arrangement (data structure) for storing and retrieving information (values).
The order in which values are stored and retrieved can be described as LIFO (last in, first out).
Stacks are incredibly useful, so almost every assembly language has special instructions for implementing a stack.
In the x86 assembly language there are PUSH and POP instructions.
PUSH and POP make use of the stack pointer register ESP, which holds the address of the item currently on top of the stack.
Recall that in the x86 architecture, the stack grows down in memory.
Push
The PUSH instruction:
- decrements the address in ESP so that it points to a free space on the stack
- writes an item to the memory location pointed to by the ESP
ESP stands for extended stack pointer.
Pop
The POP instruction:
- fetches the item addressed by ESP
- increments ESP by the correct amount to remove the item from the stack
Modifying the stack
Items can be removed from the stack, or space reserved on top of the stack, by directly altering the stack pointer:
ADD ESP, 8 ; take 8 bytes off the stack
SUB ESP, 256 ; reserve 256 bytes on the stack
ESP always points to the top of the stack.
The stack grows downwards, so when we add an item X, ESP moves to a lower address and X is stored there as the new top of the stack.
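A downward-growing stack is easy to model in C. This sketch assumes 4-byte items, as with 32-bit PUSH and POP, and uses a byte offset into a small array in place of a real address. The Stack type and the stack_* functions are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Hypothetical model: esp is a byte offset into mem, and the stack grows towards 0. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t esp;
} Stack;

/* An empty stack has esp just past the highest address. */
void stack_init(Stack *s) { s->esp = STACK_WORDS * 4; }

/* PUSH: decrement esp first, then write the item where esp now points. */
void stack_push(Stack *s, uint32_t value) {
    assert(s->esp >= 4); /* no room left otherwise */
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* POP: read the item esp points to, then increment esp past it. */
uint32_t stack_pop(Stack *s) {
    assert(s->esp < STACK_WORDS * 4); /* nothing to pop otherwise */
    uint32_t value = s->mem[s->esp / 4];
    s->esp += 4;
    return value;
}
```

Pushing 1 and then 2 lowers esp by 8 bytes in total, and popping returns 2 first and then 1, which is the LIFO order.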
Parameters
The simplest kind of subroutine performs an identical function each time it runs. To make it do something different on each call, we pass it parameters.
Value Parameters
The information you give to a subroutine is simply a value.
Reference Parameters
Consider another subroutine: “given two variables, exchange (swap) their values”. The situation is different here: having only the values of the variables is not enough.
In calling the subroutine we will need to tell it the addresses of the variables.
Such parameters are called reference parameters.
What you need is not the content but an address, a reference to where it is. Hence the term “pass by reference”.
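The swap example is the classic way to see the difference, so here is a quick C sketch of both versions. The function names are made up for illustration.

```c
/* Value parameters: the function gets copies, so the caller's variables do not change. */
void swap_by_value(int a, int b) {
    int tmp = a;
    a = b;
    b = tmp;
}

/* Reference parameters: the function gets addresses and writes through them. */
void swap_by_reference(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

Calling swap_by_value(x, y) leaves x and y as they were, while swap_by_reference(&x, &y) really exchanges them.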
Calling External Functions
We can call functions, especially C functions, in assembly. We can call a function using the call instruction like so:
call printf
When we call printf, it is free to overwrite some registers (under the C calling convention EAX, ECX and EDX are not preserved). Because of this, we need to save any register data we still need. We can save it on the stack like so:
mov ecx, 10 ; sets up loop counter
loop1:
push ecx ; save the loop counter on stack
lea eax, msg ; load the address of the message into eax
push eax ; put the parameter on top of the stack
call printf ; call the C function, which prints the string whose address is on top of the stack and can overwrite register data
pop eax ; remove parameter
pop ecx ; restores saved loop counter
loop loop1 ; goes back to top of loop
Calling Formatted Printf’s
We can insert data into a printf statement like so:
printf("Number is %d\n", n);
If we want to do this in assembly, we need to push the arguments in reverse order. So first we push:
n
and then we push the string
"Number is..."
This is how the stack works: items added always go to the top of the stack, so pushing in reverse order leaves the first argument (the format string) on top.
#include <stdio.h>
#include <stdlib.h>
int main (void)
{
    char msg[] = "Number is %d\n";
    int n = 157;
    _asm {
        push n ; push the int first
        lea eax, msg
        push eax ; now stack the string
        call printf
        add esp, 8 ; clean 8 bytes from stack
    }
    return 0;
}
To call scanf we need to give it 2 parameters: the format string and the address of num. scanf reads input from the terminal.
char fmt[] = "%d";
int num;
_asm {
    lea eax, num ; scanf needs the address of num, so load it into eax
    push eax
    lea eax, fmt ; now the format string
    push eax
    call scanf
    add esp, 8 ; clean stack
}
We need to pass the address of something and not the value.
Cleaning the stack means taking off what you put on. Always try to restore the stack to the state you found it in. It’s 8 bytes in this example because each item is 4 bytes and we’ve pushed 2 things, which is 2 * 4 = 8.
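To see why the arguments go on in reverse and why the cleanup is 8 bytes, here is a rough C model of the printf call sequence. It assumes 4-byte stack slots and that the call pushes a return address, which is represented by a dummy 0. CallStack, push32, peek32 and simulate_printf_call are hypothetical names for this sketch only.

```c
#include <stdint.h>

enum { WORDS = 16 };

typedef struct {
    uint32_t mem[WORDS];
    uint32_t esp; /* byte offset into mem; WORDS * 4 means the stack is empty */
} CallStack;

static void push32(CallStack *s, uint32_t v) {
    s->esp -= 4;
    s->mem[s->esp / 4] = v;
}

/* Reads the 4-byte item at [esp + offset]. */
static uint32_t peek32(const CallStack *s, uint32_t offset) {
    return s->mem[(s->esp + offset) / 4];
}

/* Walks through push n / push fmt / call / ret / add esp, 8.
   seen[0] and seen[1] receive what the callee finds as its first and second
   arguments; the return value is esp after the cleanup. */
uint32_t simulate_printf_call(uint32_t fmt_address, uint32_t n, uint32_t seen[2]) {
    CallStack s;
    s.esp = WORDS * 4;
    push32(&s, n);            /* push n: the last argument goes on first */
    push32(&s, fmt_address);  /* push eax: the address of the format string */
    push32(&s, 0);            /* call printf pushes the return address (dummy value) */
    seen[0] = peek32(&s, 4);  /* inside the callee the first argument is at [esp+4] */
    seen[1] = peek32(&s, 8);  /* and the second argument is at [esp+8] */
    s.esp += 4;               /* ret pops the return address */
    s.esp += 8;               /* add esp, 8 removes the two 4-byte arguments */
    return s.esp;
}
```

After the cleanup esp is back at its starting value, which is the "restore the stack to the state you found it in" rule in action.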