How To Start Reverse Engineering — A Guide

Sep 30, 2023

As the name suggests, reverse engineering means finding out how something works from the bottom up, starting from the finished product. In computer science, it means finding out how a program works, typically without access to its source code. It comes in handy for many reasons, from recreating software with enhancements or simply satisfying curiosity about how something works, to professional uses like malware analysis.

The Ground Basics

Now that the idea of reverse engineering is covered, let’s briefly cover some concepts.

The first step in understanding reverse engineering is knowing how programs work. Much of the software we use daily is written in compiled languages (C, C++). Other languages are interpreted (Python, JavaScript) or use a slightly different mechanism (Java, for example, compiles to bytecode that runs on a virtual machine).

Here, we are going to focus only on reversing compiled programs.

Compilers

A compiler is a program that converts code written in a higher-level language into lower-level machine instructions. The resulting bundle of instructions (called an executable binary, an executable, or simply a binary) can then be executed by the computer.

With reverse engineering, our goal is to understand how the binary works.

For example, a very simple snippet of C code might look like the following:

int a = 12, b = 5;
int c = a + b;

This snippet might get translated to x86-64 assembly in the following way (a simplified translation for illustration; real compiler output will look different):

push 12
push 5

mov rax, [rsp + 0x8]
mov rbx, [rsp]
add rax, rbx

push rax

When this is further assembled into AMD64 machine instructions, it might look like the following (a sequence of bytes shown in hexadecimal):

0x6A 0x0C 0x6A 0x05 0x48 0x8B 0x44 0x24 0x08 0x48 0x8B 0x1C 0x24 0x48 0x01 0xD8 0x50

If you open any binary in a hex editor, you will see much the same thing: a long sequence of bytes shown in hexadecimal.
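To get a feel for what a hex editor is doing, here is a minimal sketch in C that formats bytes as rows of 16 with a leading offset. The function name format_hex_row and the row layout are my own choices for illustration, not the output format of any particular tool.

```c
#include <stdio.h>
#include <stddef.h>

/* Format one row of a hex dump: an 8-digit offset, then up to 16 bytes
 * as two-digit hex values. Returns the number of bytes in this row
 * (0 once offset reaches the end of the data). */
size_t format_hex_row(const unsigned char *data, size_t n, size_t offset,
                      char *out, size_t out_size)
{
    if (out_size == 0)
        return 0;
    out[0] = '\0';
    if (offset >= n)
        return 0;

    size_t count = (n - offset < 16) ? n - offset : 16;
    int written = snprintf(out, out_size, "%08zx:", offset);
    size_t pos = (written < 0) ? out_size : (size_t)written;

    for (size_t i = 0; i < count && pos < out_size; i++) {
        written = snprintf(out + pos, out_size - pos, " %02x",
                           (unsigned)data[offset + i]);
        if (written < 0)
            break;
        pos += (size_t)written;
    }
    return count;
}
```

Called on the 17 bytes above with offset 0 and then offset 16, it produces one full row of 16 bytes followed by a short row holding only the final 0x50.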

Some Useful Tools

Now imagine that you are given an executable. Viewing the hex dump alone won’t get you anywhere.

A sample Hello World program in C

Glimpse of a hex dump of the above program

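As seen in the above pictures, even a simple Hello World program compiles to thousands of bytes, and much of that is not instructions at all but headers and other metadata the operating system needs to load the program. For instance, this simple 7-line program had 997 lines in its hex dump!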

Disassemblers

To make things more readable, we use disassemblers. Just as assemblers convert assembly code to machine instructions, disassemblers do the opposite.
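To make the idea concrete, here is a toy disassembler in C. It is only a sketch: it recognizes nothing except the five byte patterns used in the example earlier, whereas a real disassembler decodes the full x86-64 instruction format. The names toy_disassemble and struct pattern are made up for this illustration.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

/* One known encoding: fixed leading bytes, optionally followed by a
 * one-byte operand that is printed between `before` and `after`. */
struct pattern {
    unsigned char bytes[4];
    size_t len;          /* number of fixed bytes */
    int has_operand;     /* 1 if a single operand byte follows */
    const char *before;
    const char *after;
};

/* Only the encodings that appear in the example above. */
static const struct pattern PATTERNS[] = {
    { {0x6A},                   1, 1, "push ",            ""  },
    { {0x48, 0x8B, 0x44, 0x24}, 4, 1, "mov rax, [rsp + ", "]" },
    { {0x48, 0x8B, 0x1C, 0x24}, 4, 0, "mov rbx, [rsp]",   ""  },
    { {0x48, 0x01, 0xD8},       3, 0, "add rax, rbx",     ""  },
    { {0x50},                   1, 0, "push rax",         ""  },
};

/* Decode the instruction at the start of code[0..n). Writes its text to
 * out and returns the number of bytes consumed, or 0 if the bytes are
 * not one of the known patterns. The operand is printed as an unsigned
 * value, which is enough for the small positive numbers used here. */
size_t toy_disassemble(const unsigned char *code, size_t n,
                       char *out, size_t out_size)
{
    for (size_t i = 0; i < sizeof PATTERNS / sizeof PATTERNS[0]; i++) {
        const struct pattern *p = &PATTERNS[i];
        size_t total = p->len + (p->has_operand ? 1 : 0);

        if (n < total || memcmp(code, p->bytes, p->len) != 0)
            continue;

        if (p->has_operand)
            snprintf(out, out_size, "%s0x%x%s",
                     p->before, (unsigned)code[p->len], p->after);
        else
            snprintf(out, out_size, "%s", p->before);
        return total;
    }
    return 0;
}
```

Calling toy_disassemble repeatedly on the 17 example bytes, advancing by the returned length each time, recovers the six instructions of the earlier listing (with 12 and 5 printed as 0xc and 0x5).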

GDB is one of the oldest and most powerful tools that can be used as a disassembler. However, GDB is primarily a terminal-based program, and without a thorough knowledge of its commands it can be hard to use. Tools like Ghidra, IDA, or Binary Ninja make working with binaries more interactive and easier.

Disassembly of main() of the Hello World program in GDB

Now, this is certainly better than trying to read raw bytes. However, reading assembly code directly can still be challenging and confusing, especially for more complicated programs.

Decompilers

Enter decompilers. Again, going by the name, you might’ve guessed what they do: they take the assembly generated by disassemblers and output readable (well, most of the time) code in a higher-level language.

Tools like Ghidra and IDA come with their own decompilers, which are very capable. One important thing to note, however: unlike disassembly, which is close to a one-to-one translation of machine code to assembly, decompiled code may not be a faithful translation of the assembly.

Decompiling is a much harder process, especially as the original code gets more complex (code involving classes and objects, for instance). The resulting code should be treated only as pseudo-code.

View of Ghidra interface (with the sample Hello World program loaded)

A Review of a CTF Rev Challenge

CTF (Capture The Flag) competitions are a great way to ethically practice your cybersecurity skills. The competition includes challenges from various categories of cybersecurity. Each challenge involves unearthing a “flag”, which is usually a string in a particular format, for example, flag{...}. The contents inside the curly brackets are of course unique to the challenge.

Reverse engineering is a very popular category and a part of most CTFs.

CSAW CTF 2023 / Rebug 2

CSAW CTF, which was held around mid-September, had some beginner-friendly challenges. Rebug 2 is one such challenge from the reverse engineering category.

The challenge description was the following:

Screenshot of the CSAW CTF 2023 / Rebug 2 challenge description

After downloading the file, the first thing I did (as is usual) was to run the file command on it.

This gives us some very basic but useful information about the file. The main takeaway is that the file is a 64-bit ELF executable, along with a few other details.
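Part of this information comes straight from the first few bytes of the binary. As a rough sketch (this is not how the file command itself is implemented), the C function below checks the ELF magic bytes and reads the class byte that distinguishes 32-bit from 64-bit files. The name elf_class_bits is made up for this example; the byte values come from the ELF format.

```c
#include <stddef.h>

/* Inspect the start of a file. Returns 64 or 32 for an ELF file of that
 * class, or 0 if the bytes do not look like a valid ELF header. */
int elf_class_bits(const unsigned char *header, size_t n)
{
    if (n < 5)
        return 0;

    /* Every ELF file starts with the magic bytes 0x7F 'E' 'L' 'F'. */
    if (header[0] != 0x7F || header[1] != 'E' ||
        header[2] != 'L'  || header[3] != 'F')
        return 0;

    /* The fifth byte (EI_CLASS) encodes the word size. */
    switch (header[4]) {
    case 1:  return 32;   /* ELFCLASS32 */
    case 2:  return 64;   /* ELFCLASS64 */
    default: return 0;
    }
}
```

To try it on a real binary, read the first five bytes of the file into a buffer and pass them in.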

The next step is to run the strings command, which prints the sequences of printable characters found in a file.

Running it on this binary doesn’t give us anything useful, but sometimes you can find crucial leads this way.
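At its core, the idea behind strings is simple: scan the bytes and report every run of printable characters that is long enough. The C sketch below is a simplified take on that idea, not the real tool. It only handles plain ASCII as judged by isprint(), and the function name print_strings is my own. Passing 4 as the minimum length mirrors the usual default of strings.

```c
#include <ctype.h>
#include <stdio.h>
#include <stddef.h>

/* Print every run of at least min_len printable characters found in
 * data[0..n), one per line, and return how many runs were printed. */
size_t print_strings(const unsigned char *data, size_t n, size_t min_len,
                     FILE *out)
{
    size_t found = 0;
    size_t start = 0;   /* index where the current run began */

    /* Going one step past the end flushes a run that reaches the last byte. */
    for (size_t i = 0; i <= n; i++) {
        if (i < n && isprint(data[i]))
            continue;   /* still inside a printable run */

        if (i - start >= min_len) {
            fwrite(data + start, 1, i - start, out);
            fputc('\n', out);
            found++;
        }
        start = i + 1;  /* the next run can begin after this byte */
    }
    return found;
}
```

Shorter runs are skipped because random bytes frequently happen to be printable, so very short matches are mostly noise.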

There are some other basic Linux commands you can run as well, like strace (which traces system calls) and ltrace (which traces library calls).

Decompilation

I opened the binary using Ghidra. After running the initial analysis that Ghidra does, I started looking at the decompilation of the main function.

Decompiled main() function of the challenge

The code might not look like regular C, but upon close inspection, it’s quite the same.

The main function contains a single function call, which is obviously of interest. Double-clicking on the function name takes us to the decompilation of that function.

Decompiled printbinchar() from the challenge

Again, there’s a bunch of code that looks almost like gibberish. But yet again, there’s a function call of interest, this time to xoring(). Navigating to xoring() gives the following.

What immediately caught my eye was the array called flag. Remember that the challenge description also mentions that we need to find the flag in the binary.

It’s pretty clear from the code around the variable that this function is “creating” the flag.

Our goal at this point is to obtain the contents of flag. Note that within the main function, printbinchar() is called inside a for loop. Hence, to get the full value of the flag, we need to read the contents of the array after control leaves the loop.
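To illustrate why the timing matters, here is a purely hypothetical C sketch of the general pattern: a global buffer that gets one more byte on every call made from a loop. This is not the challenge's decompiled code. The names append_decoded and build_flag, the single-byte XOR key, and the buffer length are all invented for the example.

```c
#include <stddef.h>

#define FLAG_LEN 8   /* hypothetical length, not taken from the challenge */

static char flag[FLAG_LEN + 1];   /* global buffer, filled one byte per call */
static size_t filled;

/* Hypothetical helper: decode one byte and append it to the buffer. */
void append_decoded(unsigned char encoded, unsigned char key)
{
    if (filled < FLAG_LEN)
        flag[filled++] = (char)(encoded ^ key);
}

/* Hypothetical stand-in for the loop in main(). */
const char *build_flag(const unsigned char *encoded, size_t n,
                       unsigned char key)
{
    filled = 0;
    for (size_t i = 0; i < n; i++)
        append_decoded(encoded[i], key);   /* buffer is only partly filled here */

    flag[filled] = '\0';
    return flag;                           /* complete only after the loop ends */
}
```

Inspecting the buffer during any iteration would show only part of the result, which is why the memory has to be read once the loop has finished.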

Dynamic Analysis

Everything we have done up to this point is static analysis: examining the program without running it. Static analysis is mainly used to figure out the logical flow of a program.

Now that we know how the code works, we need to find out the contents of its memory at run time, so we switch to dynamic analysis.

There are various tools for dynamic analysis, the most basic albeit powerful one again being GDB. However, because of the aforementioned reasons, GDB alone can prove to be a difficult tool to work with.

Extensions like pwndbg, PEDA, or GEF can be added to GDB to make it friendlier. Another alternative is Radare2.

We launch Radare2 using the following command

The two arguments used are -d for debugging mode and -AA for a complete analysis of the binary.

Final Bit

To figure out the contents of the flag variable, we need to follow three simple steps:

1. Find the memory address of flag.
2. Halt the program right after it exits the loop in main().
3. View the contents of flag in memory using the previously found address.

1. Memory Address

Fortunately, in the case of this binary, we don’t need to track down the memory address of flag ourselves. To see why, look at the assembly instructions that correspond to the array operations.

We can see that the memory address of flag is loaded into the RDX register. If we also look at all the code following this, RDX isn’t touched again. This means that instead of keeping track of the address of flag, we can get it at any later point by checking the RDX register.

2. Halt Execution

One of the best features of dynamic analysis is setting breakpoints. Breakpoints allow you to halt the execution of code at almost any point and then step through the following instructions manually.

To set the breakpoint, you first need to find the main() function in Radare2. You can do this by typing s main to seek to main(), followed by pdf to disassemble it. The main function is usually just named ‘main’ in Radare2.

Radare2 command to disassemble a function, with a glimpse of the disassembly of the main function

Since the for loop is the last piece of code in the main function, I set a breakpoint right before the program exits.

Instruction for reference to set breakpoint

db is used to set a breakpoint. It requires the address of the instruction where the breakpoint is to be placed.

3. Obtain the Flag

Next, we need to run the binary. We can run it by using dc.

The program executes and halts at the breakpoint. Now we need to look at the memory where flag is located. We can do this with the pxw command: p stands for print, x for hex, and w for word, so it prints the memory contents in hexadecimal as 32-bit words. The address of the memory we want is held in RDX, so we can use the dr command to get the contents of RDX.
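One thing to keep in mind when reading a word dump is byte order: x86-64 is little-endian, so the four bytes of each word appear reversed compared with their order in memory. The C sketch below shows the effect. The names read_word_le and format_words are made up for this example, and the output layout is only an approximation of what a debugger prints.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* Combine four consecutive bytes into a 32-bit word the way a
 * little-endian CPU reads them: the first byte is the least significant. */
uint32_t read_word_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Write n_words consecutive words from mem as space-separated hex values.
 * Returns the number of characters the full text needs. */
size_t format_words(const unsigned char *mem, size_t n_words,
                    char *out, size_t out_size)
{
    size_t pos = 0;

    if (out_size > 0)
        out[0] = '\0';

    for (size_t i = 0; i < n_words && pos < out_size; i++) {
        int written = snprintf(out + pos, out_size - pos, "%s0x%08" PRIx32,
                               i ? " " : "", read_word_le(mem + 4 * i));
        if (written < 0)
            break;
        pos += (size_t)written;
    }
    return pos;
}
```

For example, the four bytes 'f', 'l', 'a', 'g' (0x66 0x6c 0x61 0x67) come out as the single word 0x67616c66.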

The final command becomes: