How To Reverse Engineer Executable Files

0xwan
7 min read · Apr 25, 2022

Hi readers! In this blog I will show you, step by step, how to reverse engineer an executable file written in C. Reverse engineering can be overwhelming for beginners, but I will do my best to explain it and share tips on how to understand the flow of a program just by reading its assembly code (static analysis).

Software used:

  • IDA Freeware
  • Git Bash

In this blog I will use a sample file. You can download it from here. The password for the zip file is crackmes.de. Let’s get started!

Reverse Engineering

Reverse engineering is the process of working out how an application works when we don’t have its source code. We do this by disassembling the application and looking at how each function is implemented.

Prerequisites

Before we jump into the main topic, there are a few things we need to know before we can reverse engineer an executable file:

  • Basic assembly instructions
  • Registers

There are other things worth understanding too, but the most important ones are assembly language and registers.

When we disassemble an application written in a language like C or C++, we don’t get the original source code back. We get assembly language instructions instead. This is because the compiler translates the C/C++ code into machine code, and a disassembler can only translate that machine code back into assembly, not into the original source.

Assembly language is important because we will be dealing with it most of the time, even though there are tools that can convert assembly instructions into more readable pseudo-code. You don’t have to be an expert, or be able to write programs in assembly, to do reverse engineering. You just need to understand enough basic instructions and keywords to know what the code does. I will provide a link to a resource for learning some basic instructions.

Next are registers. A register is one of a small set of storage locations that are part of the processor. A register may hold an instruction, a memory address, or any kind of data (such as a bit sequence or individual characters). Some instructions specify registers as part of the instruction. You can think of a register as a variable that stores data. Registers are also used to perform arithmetic. You can learn about common registers here. I won’t explain much more about registers in this blog.

Reverse engineer a sample program

Now we get to the main purpose of this blog: reverse engineering the sample file. Let’s get started.

PLEASE NOTE: When reverse engineering, it is safer to work inside an isolated environment such as a virtual machine, in case the file turns out to be malicious. The file used in this blog is not malicious.

First we need to know what kind of file we are dealing with. I will use the file command in Git Bash to find out the file type.

file command output

According to the command output, the file is a 32-bit PE executable. It is a Windows program, so I won’t use Linux to reverse engineer it.

Next, we can try running the file.
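To give an idea of what a tool like file looks at, here is a minimal sketch in C that checks whether a buffer starts like a PE image built for 32-bit x86. The function name is_pe_i386 is made up for this example, and the real file command performs many more checks than this. The sketch relies only on the documented PE layout: the "MZ" magic, the PE header offset stored at 0x3C, the "PE\0\0" signature, and the machine value 0x014C for i386.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of any tool): returns 1 if buf looks like
   a PE image whose COFF header targets 32-bit x86, 0 otherwise. */
int is_pe_i386(const uint8_t *buf, size_t len)
{
    uint32_t pe_off;
    uint16_t machine;

    /* The DOS header starts with "MZ" and stores the PE header offset
       (e_lfanew) as a little-endian 32-bit value at offset 0x3C. */
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;

    pe_off = (uint32_t)buf[0x3C]
           | (uint32_t)buf[0x3D] << 8
           | (uint32_t)buf[0x3E] << 16
           | (uint32_t)buf[0x3F] << 24;

    /* We need 4 signature bytes plus the 2-byte Machine field. */
    if (pe_off > len || len - pe_off < 6)
        return 0;

    if (memcmp(buf + pe_off, "PE\0\0", 4) != 0)
        return 0;

    /* Machine field follows the signature; 0x014C is IMAGE_FILE_MACHINE_I386. */
    machine = (uint16_t)(buf[pe_off + 4] | (buf[pe_off + 5] << 8));
    return machine == 0x014C;
}
```

To try it, read the first few hundred bytes of the executable into a buffer and pass the buffer and its length to the function.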

running the program

The program asks the user to enter a password and then displays a message saying whether the password is correct or wrong. From this behaviour, we can conclude that our goal is to find the correct password. So let’s disassemble the file and figure out how the password check works.

Open IDA, choose New and drag our file into IDA.

create new file
drag exe file

Then click ‘OK’, then ‘Yes’ to import the debugging file. After that it will look like this.

view in IDA

It does look overwhelming, but rest assured, I will walk you through each step of reverse engineering this file until we recover the correct password.

First let’s look at the top box.

top box

You will see a lot of instructions here, but it is easy to understand what is happening. The tip is to look at the most significant things first, which are the functions. Look at the call instructions. Each one calls a function. Then look at which functions are called: printf, gets and strcmp. When we see printf, the program is printing something to the screen. If we remember, the program displays a welcome message when it starts. So now we know which part of the program we are looking at.

printing welcome message

Move on to the next function, which is printf again. This time it displays "Password: " before the program asks for user input.

display “Password: ”

Next is the gets function. gets reads a line of user input, so these assembly instructions are what take the user’s input.

gets function called

The instructions above load the address of the szPassword variable into the eax register. That address is what gets receives, so when gets returns, the user input has been written to the memory that eax points to. In this case eax holds the address of szPassword, so the user input is stored in the szPassword variable.

The last function is strcmp. This function compares two strings and checks whether they are equal. But which strings are being compared?

strcmp function called

As you can see above, before strcmp is called, the instructions push two things onto the stack. The first is szPassword, which holds the user input, and the second is str1. Then strcmp is called. These two strings, szPassword and str1, are compared by strcmp. So basically our input is being compared to str1.

Next, let’s take a look at what happens after our input has been compared.

There are two paths here, a red path and a green path. In IDA’s graph view, the green arrow is followed when the conditional jump is taken and the red arrow when it is not. Here, if strcmp returns 0 (our input is equal to str1), we go down the red path, and if our input is not equal to str1 we go down the green path.

Now let’s look at the value of str1. It is LiL2281337.
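The branch after strcmp depends only on its return value, so it helps to be clear about what that value means. Below is a small sketch in C. The helper name describe_strcmp is made up for illustration. The C standard only guarantees the sign of the result (zero, negative or positive), not a specific number.

```c
#include <string.h>

/* Hypothetical helper: turns the strcmp result into a readable verdict.
   Only the sign of the result is meaningful. */
const char *describe_strcmp(const char *a, const char *b)
{
    int r = strcmp(a, b);

    if (r == 0)
        return "equal";            /* the case the password check is looking for */
    if (r < 0)
        return "a sorts before b"; /* first differing byte is smaller in a */
    return "a sorts after b";      /* first differing byte is larger in a */
}
```

In the disassembly, a check like r == 0 typically shows up as a test of eax (where the return value lives on 32-bit x86) followed by a conditional jump, which is what produces the two paths in the graph.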

str1 variable

Let’s check whether LiL2281337 is the correct password.

Success! It seems we found the correct password!
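Putting the pieces together, the original program probably looks something like the C sketch below. This is a guess reconstructed from the disassembly, not the author’s source. The names szPassword and str1 and the "Password: " prompt come from the binary, while the buffer size, the result messages and the function names check_password and run_crackme are placeholders. The original calls gets, but the sketch uses fgets instead because gets cannot limit how much it reads and was removed from the C standard.

```c
#include <stdio.h>
#include <string.h>

/* The string found in the binary with IDA. */
static const char str1[] = "LiL2281337";

/* Hypothetical helper isolating the comparison: 1 if the input matches. */
int check_password(const char *szPassword)
{
    return strcmp(str1, szPassword) == 0;
}

/* Hypothetical stand-in for the program's main logic. The streams are
   parameters so the function can be driven by stdin/stdout or by files. */
int run_crackme(FILE *in, FILE *out)
{
    char szPassword[64]; /* buffer size is an assumption */

    fputs("Password: ", out);

    /* fgets replaces the original gets call and respects the buffer size. */
    if (fgets(szPassword, sizeof szPassword, in) == NULL)
        return 0;

    /* Strip the trailing newline that fgets keeps. */
    szPassword[strcspn(szPassword, "\r\n")] = '\0';

    if (check_password(szPassword)) {
        fputs("Correct password\n", out); /* placeholder message */
        return 1;
    }
    fputs("Wrong password\n", out);       /* placeholder message */
    return 0;
}
```

Calling run_crackme(stdin, stdout) from main gives a program that behaves like the sample: it prompts for a password and reports whether it matched.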

Additional

You may ask, how do I know that szPassword and str1 are the strings being compared by strcmp? Let me explain in more depth how function calls work in assembly.

strcmp is a function that takes two parameters.

int strcmp (const char* str1, const char* str2);

In 32-bit assembly, before a function with parameters is called, the values of its parameters are pushed onto the stack. When the function is called, it reads its parameters from the stack, starting just above the return address that the call instruction pushed. Let’s take a look at these instructions again.

As you can see, there are two push instructions. push ecx pushes the address of our input onto the stack, and push offset str1 pushes the address of the correct password. Now our input and the correct password are both at the top of the stack.

the stack visualization

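Then strcmp is called, and it takes those two items from the stack as its two parameters. With the usual C calling convention on 32-bit x86 (cdecl), arguments are pushed from right to left, so the last item pushed, str1, becomes the first parameter and our input becomes the second. That is how a function with parameters is called in 32-bit assembly.

The same goes for printf. printf takes the string to display as its parameter, so the string must be pushed onto the stack before printf is called.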

printf called in assembly
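The push-then-call sequence can be modeled with a toy stack in C. This is only a simulation to make the order of events visible, and it assumes the cdecl convention (arguments pushed right to left, caller cleans up). ToyStack and every toy_ function are invented for this example, and the return address is represented by a NULL placeholder.

```c
#include <stddef.h>
#include <string.h>

#define STACK_SLOTS 16

/* Toy model of a downward-growing stack of pointer-sized slots. */
typedef struct {
    const void *slot[STACK_SLOTS];
    int sp; /* index of the top slot; STACK_SLOTS means empty */
} ToyStack;

static void toy_init(ToyStack *s) { s->sp = STACK_SLOTS; }

/* Models `push`: move the stack pointer down, then store the value. */
static void toy_push(ToyStack *s, const void *v) { s->slot[--s->sp] = v; }

/* Models `call strcmp` followed by the callee reading its parameters. */
int toy_call_strcmp(ToyStack *s)
{
    const char *first, *second;
    int result;

    toy_push(s, NULL);             /* call pushes the return address */
    first  = s->slot[s->sp + 1];   /* [esp+4]: first parameter  */
    second = s->slot[s->sp + 2];   /* [esp+8]: second parameter */
    result = strcmp(first, second);
    s->sp += 1;                    /* ret pops the return address */
    return result;
}

/* Models the caller: the two pushes seen in the disassembly, then the call. */
int toy_compare(const char *szPassword)
{
    ToyStack s;
    int result;

    toy_init(&s);
    toy_push(&s, szPassword);      /* push ecx: pushed first, second parameter */
    toy_push(&s, "LiL2281337");    /* push offset str1: pushed last, first parameter */
    result = toy_call_strcmp(&s);
    s.sp += 2;                     /* cdecl: the caller removes the arguments */
    return result;
}
```

toy_compare returns 0 only when the input equals LiL2281337, mirroring what the real strcmp call does with the two pushed pointers.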

Conclusion

Reverse engineering is quite hard for beginners, but with the right mindset and methodology, anyone can understand the flow of a program even with only a little knowledge of assembly language. The most important thing is to find an effective methodology and know what to look for. My recommendation is to read as many reverse engineering writeups as possible to expand your knowledge.

Reference

https://www.tutorialspoint.com/assembly_programming/assembly_basic_syntax.htm