Introduction to the x86 architecture

Starting out in Reverse Engineering

Gaurav yadav
RESETHACKER
May 26, 2020

Image of a cpu

About Me

Hi! My name is Gaurav. I am a developer and reverse engineer, and I like to learn how things work at the ground level. I am not a prodigy or an expert, just a simple guy who follows his interests and wants to make a difference. I have a bad habit of procrastinating, so to overcome it I challenged myself to write blogs about the things I am interested in. This is my first blog; if you like it, please show your love by giving the post some claps.

Let’s Begin

One of the main aspects of reverse engineering and exploit development is understanding how low-level languages interact with the processor and memory, and I will focus on some of these topics in this blog. A little knowledge of assembly language will be very helpful for what comes next. Every processor architecture has its own assembly language, and the same instructions can even be written in more than one syntax; I will be focusing on the x86 architecture and Intel syntax.

My friend Registers

Structure of a 32bit register

Registers are small, very fast storage locations inside the processor, used for calculations and data manipulation. Almost every instruction in assembly language reads or writes these registers. Every processor has its own set of registers, which can be used somewhat like variables in other programming languages. The x86 processor has many types of registers, but I will focus on the commonly used ones. A 32-bit register can also be accessed as a 16-bit or 8-bit register. For example, the lower 16 bits of EAX (E stands for extended) are named AX, and AX itself splits into two 8-bit registers: AH (H stands for high) is its upper byte and AL (L stands for low) is its lower byte. Newer processors extend the older registers instead of replacing them, and the same backward compatibility is why a 32-bit program can execute on a 64-bit processor but the reverse is not possible. We will focus on the 2 types of registers given below.

General Purpose Registers

As the name suggests, these registers are used very frequently as operands in general instructions. The registers in this category are:

  1. EAX:- This register is known as the accumulator. It is mainly used for arithmetic and logical operations like addition, xor etc. and to store the values returned by functions.
  2. EBX:- This register is known as the base register and is used in indexed addressing. You can read about the different addressing modes here.
  3. ECX:- This register is known as the counter. As the name suggests, it is mainly used to keep track of a count, for example how many times a loop still has to run. Instructions such as loop and rep decrease ECX on every repetition and stop when it reaches zero.
  4. EDX:- This register is known as the data register. It is used together with the EAX register for multiplication or division involving large operands.
  5. ESP:- This register is known as the stack pointer. It is called a pointer because it points to data stored in memory by holding the 32-bit address of that data. It points to the top of the stack, which is the most recently pushed value.
  6. EBP:- This register is known as the base pointer and is used to reference function arguments and local variables.
  7. ESI/EDI:- These are known as the source index and destination index registers. They hold the source and destination addresses when data needs to be read from or written to memory.

Don’t worry if you are having a hard time understanding registers. I will explain them with the help of examples later.

Special Use Registers

These registers are not used as operands in general instructions; they have special purposes. The registers in this category are:

  1. EIP: It is known as the instruction pointer. You might wonder what makes it special, and this is where things become a little interesting: this register points to the instruction that is going to be executed next, like a child pointing at letters and reading them one by one. Think of it this way: if we somehow control this register and tell it what we want to execute instead of what it was going to execute, we can have a whole lot of fun.
  2. EFLAGS: This register consists of several one-bit flags that are used for checking conditions, similar to the conditions used in any loop or if-else statement. I will not be able to explain each and every flag, but you can read about them on your own here.

Deep Dive

I will explain the important registers with the help of examples. What the stack is and how it works will be covered in another blog.

C code and its compiled assembly

In the above image, the left side is the C code and the right side is the assembly compiled from it. If you look closely at the right side, there are mainly two columns: the first column is the address, which is the memory location of each instruction, and the second column is the instruction itself, which is what the C code became when I compiled it for the x86 architecture.

Remember EIP? Yes, the register that points to the instruction that is going to be executed next. Let’s talk about that first. Can you find the value of EIP on the right side? (Psst, you might want to look in the lower half of the assembly side, where the values of all the registers are printed.) You can see that the value of EIP is 0x56556199, and I have already highlighted the instruction at that address in the assembly. Can you guess what this instruction is going to do? If you know what the mov instruction does, you might have guessed it: it assigns the value 2 to something. In the C code we assign 2 to a variable named ‘counter’, and the assembly instruction does the same, but instead of the name of a variable it uses the corresponding memory address.

You might ask how I am so sure that the 2 is being assigned to ‘counter’ and not to some other address. The first answer is very simple: we do not assign 2 anywhere else in our C code ;). The other reason, which you should remember, is that whenever an address is formed by subtracting an offset from EBP, such as [ebp-0x4], chances are that a local variable is being accessed. In our case we have only one local variable, which is ‘counter’. After this instruction executes, EIP will point to the next instruction below it, and so on; this is how EIP tells the processor which instruction to execute next. We will definitely try manipulating the contents of EIP to execute instructions of our choice in some other blog, so please follow me ;).

Let us now talk about the accumulator (EAX). As I have already explained, this register is used for arithmetic and logical operations like addition, subtraction, xor etc. If you look at the assembly below, you will see the EAX register being used for addition and for storing the result of that addition.

Showing uses of registers

While explaining the EDX register, I told you that EDX is sometimes used together with EAX. If you look at the assembly from 0x565561a5 to 0x565561a9, you can see that a value is added to EAX two times, and EAX already contains a value. Here EAX and EDX both contain 2, so the result in EAX after these instructions is 2+2+2 = 2*3. This is exactly what we were doing in the C code, if you remember: we were multiplying the variable ‘counter’ by 3 (line 7 in the C code). You might be asking why the compiler is adding instead of multiplying. This is a perfect example of a compiler optimization: a multiplication by a small constant can be replaced with a couple of additions, which are typically cheaper than a multiply instruction, so the compiler changed the multiplication into additions.

Let me tell you about one more compiler optimization, one that led to a security flaw. A memory forensic investigator once found a lot of important credentials lying around in memory. A program had used those credentials, and the programmer then overwrote that memory with zeros to make sure the credentials would not remain there. They stayed in memory anyway, because the compiler saw that the zeroed memory was never read again, treated the write as useless work, and removed it. That saved time but exposed the credentials.

Now let’s talk about the last register in this blog, the flag register (EFLAGS). If you look carefully at the assembly in the above image, at address 0x565561ae there is a cmp instruction, which compares two operands and sets the flag register accordingly. Here it checks the condition that we use in the for loop in the C code: [ebp-0x4] is the local variable ‘counter’ and 0x13 is the hexadecimal representation of the decimal value 19. Hmm... but in C we are comparing ‘counter’ with 20, so why is the compiler comparing it with 19? The answer is in the next instruction, but let us first look at what the cmp instruction does. cmp is the same as a subtraction, except that instead of storing the result it only updates the flag register. If you don’t know about the flag register, please read the link I mentioned above in the explanation of EFLAGS and come back.

The instruction after ‘cmp’ is ‘jle’ (jump if less than or equal), which checks the flags updated by ‘cmp’. For integers, ‘counter’ < 20 is the same condition as ‘counter’ <= 19, which is why the compiler can compare with 19 and use ‘jle’. Strictly speaking, ‘jle’ jumps when the ZERO FLAG is set or when the SIGN FLAG differs from the OVERFLOW FLAG. With small values like ours the subtraction never overflows, so the OVERFLOW FLAG stays 0 and only the ZERO FLAG and SIGN FLAG matter. If ‘counter’ is equal to 19 (0x13), the result of the subtraction is zero, which sets the ZERO FLAG to 1. If ‘counter’ is smaller than 19 (0x13), the result of the subtraction is negative, which sets the SIGN FLAG to 1. In either case ‘jle’ makes EIP jump to the start of the for loop body (the next iteration, at 0x565561a2). If neither of these flags is set, EIP moves on towards the end instead of going into another iteration of the for loop.

2nd Iteration of for loop

If you check the above assembly, I have printed the value of edx, which is the value of the ‘counter’ variable so far. It is smaller than 19 (0x13), and before the ‘cmp’ instruction executes it will become 18 (6*3), so after ‘cmp’ executes the SIGN FLAG should be set. Let us see.

In the above image we can see that all the flags that are set are printed inside ‘[]’, and one of them is SF (SIGN FLAG). So ‘jle’ will see that SF is set and EIP will take the jump to the destination.

Last Iteration of for loop

Here you can see that the value of EAX is 54 (0x36), which is the current value of the ‘counter’ variable. When this value is compared with 19 (0x13), the SIGN FLAG will not be set this time, because 54 is greater than 19 and the result of ‘cmp’ will not be negative. Let’s check the value of the flag register after the comparison is over.

In the above image you can see that only IF (interrupt flag) is set. So this time the jump will not be taken and the program will end.

New Beginning

If you are still with me, I hope you understood how registers work. If not, give it another try and you will surely get it. Don’t worry, I did not forget about ESP and EBP. Sorry, but you will have to wait for another blog for those, because I will first have to explain how the stack works, and that will take up a whole blog of its own, so keep following. For further reading I suggest you read Beginner_RE and try debugging your own C code using gdb or any other debugger you like. These topics are a very important step for every reverse engineer to understand. This is just a beginning, but a very important first step toward knowledge. Huge shout-out to my big brother Abhinav Thakur for motivating me and solving my doubts.

Say hi to me on LinkedIn.

See you in the next blog which will be coming very soon ;).
