Reverse-engineering: Using Linux GDB

At Holberton School, we have had a couple rounds of a ‘#forfun’ project called crackme. For these projects, we are given an executable that accepts a password. Our assignment is to crack the program through reverse engineering. For round one we were given four Linux tools to use, and we had to demonstrate how to find the answer with each tool. It was quickly apparent that checking a password with a standard library string comparison is a bad idea, as is hardcoding passwords into the executable in plain text. Another round demonstrated that the ltrace tool could reveal not only the string used in a comparison but also how it was protected (an MD5 hash in that case), which was enough to recover the password.

This week we were given another crack at hacking. I went to my go-to tool for reverse-engineering, the GNU Project Debugger (aka GDB), to find the password. If you would like to take a shot at cracking the executable, you can find it at Holberton School’s GitHub. The file relevant to this post is crackme3.

Program Checks

Before I dig too deep into the executable, I check what information I can get from it. First, I do a test run of the file with no arguments to see what usage or error information is provided.

Usage: ./crackme3 password

For this executable the password is expected to be provided on the command line. The next check I run is ltrace just to see if the password will appear. In addition, it can provide some other useful information about how the program works.

$ ltrace ./crackme3 password
__libc_start_main(0x40068c, 2, 0x7ffcdd754bd8, 0x400710 <unfinished ...>
strlen("password") = 8
) = 3
+++ exited (status 1) +++

The trace shows that the program checks the length of the string, but there is no clear indication yet of whether that check is a roadblock. The password itself does not appear. Time to break down the program.

The GNU Project Debugger

GDB is a debugger that runs on Linux and many other Unix-like systems, developed with the goal of helping developers identify sources of bugs in their programs. In their own words, from the website:

GDB, the GNU Project debugger, allows you to see what is going on `inside' another program while it executes -- or what another program was doing at the moment it crashed.

When reverse engineering a program, the tool is used to review the disassembled code, in either AT&T or Intel syntax, to see step-by-step what is happening. Breakpoints are added to stop the program midstream and review the data in registers and memory to identify how it is being manipulated. I will cover these steps in more detail below.

The Anatomy of Assembly

To get started, I entered the command to launch the crackme3 file with GDB followed by the disass command and the function name. The output is a list of Assembly instructions that direct each action of the executable.

$ gdb ./crackme3
(gdb) disass main

The AT&T and Intel syntaxes are displayed above side-by-side. However, GDB actually displays only one of the two at a time: AT&T by default, or Intel after entering set disassembly-flavor intel. I prefer to use the AT&T format because the flow makes more sense to me. The first column provides the address of the instruction. The next column is the instruction itself, followed by the data source and then the destination. Jumps and function calls are followed by the jump location or function name. Intel syntax reverses the order and lists the destination before the source. There are additional differences in the instruction names and operand notation, but this is common when comparing two notations that describe the same thing. If I were writing Assembly, my syntax preference might be different and would be based on more than just flow of information.

The Logic Flow

Every program depends highly on logic flow. Depending on the compiler and the options selected at compile time, the flow of the Assembly code could be straightforward or very complex. Optimization can make the flow harder to follow, and some tools intentionally obfuscate it to disrupt attempts to reverse engineer the executable. Below is the output of the disass main command in AT&T syntax.

The portions of the output not highlighted are jumps and the closing steps before exit. There are four types of jumps in the output of main and check_password: je, jmp, jne, and jbe. The jmp instruction jumps regardless of condition. The other three are conditional jumps. Two of them, je and jne, are straightforward. They mean jump if equal and jump if not equal. The last one, jbe, is used here in a loop and means jump if below or equal, which is the unsigned counterpart of less than or equal.
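The difference between "below or equal" and "less than or equal" only shows up when a value has its top bit set. The sketch below is a small Python model of that distinction, not code taken from crackme3. The helper names (to_signed32, jbe_taken, jle_taken) and the 32-bit operand width are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumed operand width: 32 bits


def to_signed32(value):
    """Reinterpret a 32-bit pattern as a signed two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def jbe_taken(left, right):
    """Model of jbe: taken when left <= right, both read as unsigned."""
    return (left & MASK32) <= (right & MASK32)


def jle_taken(left, right):
    """Model of jle: taken when left <= right, both read as signed."""
    return to_signed32(left) <= to_signed32(right)


# Small positive values: both jumps agree.
print(jbe_taken(3, 4), jle_taken(3, 4))
# 0xFFFFFFFF is a huge unsigned number but -1 when signed, so they disagree.
print(jbe_taken(0xFFFFFFFF, 1), jle_taken(0xFFFFFFFF, 1))
```

For a loop counter that runs over a short password the two behave the same, which is why the distinction is easy to miss when reading the disassembly.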

The Heart of the Question

Ultimately we are looking for the password. Based on the information from the main output, it is primarily depending on the check_password function to determine whether to exit or provide access. To analyze the process happening in that function, I entered disass check_password.

The first thing I confirm is that the length of the password entered is important. The program looks for a password that is four characters long. The instruction at 0x400632 actually shows the password in integer form, but I did not recognize it immediately. That value is stored in memory four bytes below the address held in the RBP register. GDB's examine command prints it, for example x/xw $rbp - 0x4. The ‘x’ format letter selects hexadecimal and ‘w’ selects a four-byte word (in this command ‘h’ is a size letter meaning a two-byte halfword, not hexadecimal). Hexadecimal is the easiest format to see how the password is stored. From the instruction set, a comparison of two registers, rax and rdx, occurs at 0x40066a. This is where the next step of my investigation leads.

(gdb) b *0x40066a
(gdb) run test
Starting program: /home/vagrant/reverse_engineering/crackme3/crackme3 test
Breakpoint 1, 0x000000000040066a in check_password ()
(gdb) info registers

Registered and Certified

I set a breakpoint to analyze the data in process. Breakpoints do exactly what they say: they interrupt the process at the given instruction address. Once the breakpoint is set, I start the executable with the command run test. The value ‘test’ is the four-character password I used to get past the length test and into the password comparison. Once the breakpoint is triggered, I enter info registers to view the data in the registers at the point the program was interrupted.

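[Figure: info registers output, with the user input highlighted in blue and the stored password highlighted in green]
Register data on each loop:
1: RDX = 0x41 (A)
2: RDX = 0x42 (B)
3: RDX = 0x43 (C)
4: RDX = 0x4 (^D, the ASCII End of Transmission character)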
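The per-loop RDX values listed above are consistent with a byte-by-byte comparison against a stored four-byte value. The following Python sketch is a rough model of such a check, not a reconstruction of the actual check_password code. The function name compare_trace and the way it records each pair are assumptions for illustration; only the four stored byte values come from the register data.

```python
# Stored bytes as observed in RDX on each loop iteration.
STORED = bytes([0x41, 0x42, 0x43, 0x04])


def compare_trace(candidate, stored=STORED):
    """Return the (input byte, stored byte) pairs a byte-wise check would
    compare, plus whether every pair matched.

    The length test comes first, mirroring the four-character requirement.
    """
    if len(candidate) != len(stored):
        return [], False
    trace = list(zip(candidate, stored))
    return trace, all(c == s for c, s in trace)


# With the input "test", the first pair is 0x74 ('t') against 0x41 ('A').
trace, matched = compare_trace(b"test")
for index, (input_byte, stored_byte) in enumerate(trace, start=1):
    print(f"{index}: input {input_byte:#x}  stored {stored_byte:#x}")
print("match:", matched)
```

Stopping at the comparison instruction exposes one such pair per iteration, which is why a single breakpoint is enough to read out the whole stored value.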

From the register information, I find two integers are stored: 0x74 and 0x41. The ASCII value of the letter ‘t’ is 0x74. The printable letter for 0x41 is ‘A’. I also noticed that RCX holds an integer value of 0x4434241. Because x86-64 is little-endian, the bytes are stored starting from the least significant one, so read in reverse they are 41, 42, 43 and 04. Converted to character values, that is A, B, C and ^D, the ASCII End of Transmission character that a terminal uses to signal end of file.

Inputting the password is tricky. Typed at the prompt, Ctrl + D is treated as an end-of-input signal, so it isn’t passed to the executable. In fact, it is commonly used to exit programs that read from the terminal. I tried to store it in a file, but emacs treats Ctrl + D as an editing command instead of inserting the character. My workaround is to use an online ASCII to text converter and paste into a file through Atom. It adds a new line character, which I remove with emacs. To get the password past Bash and into the executable, I use the command line below. Bash reads the contents of the file and passes them as the argument, so the 0x04 byte never goes through terminal input where it would be interpreted as end of file.

./crackme3 $(< 0-password)
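As an alternative to the converter-and-editor workaround, the password file can be produced programmatically. The sketch below decodes the RCX value as little-endian bytes and writes them to a file with no trailing newline. The function names decode_little_endian and write_password_file are hypothetical helpers for this post; the value 0x4434241 and the file name 0-password come from the steps above.

```python
RCX_VALUE = 0x4434241  # integer observed in RCX at the breakpoint


def decode_little_endian(value, width=4):
    """Return the bytes of value in little-endian order (lowest byte first)."""
    return value.to_bytes(width, byteorder="little")


def write_password_file(path, value=RCX_VALUE):
    """Write the decoded bytes to path in binary mode, with no newline."""
    with open(path, "wb") as handle:
        handle.write(decode_little_endian(value))


print(decode_little_endian(RCX_VALUE))  # b'ABC\x04'
```

Calling write_password_file("0-password") in the directory that holds crackme3 should produce a file that can be passed to the executable with the command line shown above.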

GDB is a powerful tool that is useful for much more than I have covered in this post. Take the time to read the documentation from GNU to learn more. I am confident there are many other tools that can be used as well. Share your go-to tool for reverse-engineering or debugging in the comments below.