Delving into Binary Exploitation: A Beginner’s Guide

Published in MII Cyber Security Consulting Services, Mar 27, 2024

Binary Exploitation

Binary exploitation involves finding clever ways to trick the computer into breaking its own rules. It’s like finding a crack in the wall, slipping through undetected, and gaining access to places you’re not supposed to be. By understanding how the computer processes information and spotting weaknesses in its defenses, you can manipulate it to do your bidding, whether it’s granting access to restricted areas or making it perform tasks it wasn’t designed for.

In real-world cases and CTF challenges, many binary exploitation techniques rely on memory corruption vulnerabilities such as buffer overflows, format string vulnerabilities, and integer overflows. By corrupting memory contents, attackers can manipulate program behavior, escalate privileges, or execute arbitrary code. This article is a beginner’s guide for anyone who wants to learn binary exploitation, starting with the tools we need and a short GDB tutorial.

Tools

In most cases we only need three main tools: pwntools, GDB, and a decompiler such as IDA or Ghidra. For the OS we usually use Ubuntu Linux. If you are more comfortable with Windows, you can use WSL, because the setup below assumes a Linux terminal.

First, install GDB by running this command:

$ sudo apt-get install gdb

After that, you can install a GDB plugin such as Pwndbg to make GDB’s output easier to read. Once everything is done, type gdb in your terminal and GDB will start up.

Next, install pwntools by following the installation instructions in its documentation. To check whether it is already installed, run python and import pwntools (the package is imported under the name pwn).
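The same check can also be done from a small script. This is only a sketch using the Python standard library; the helper name pwntools_installed is made up for this example and is not part of pwntools.

```python
import importlib.util


def pwntools_installed() -> bool:
    """Return True if the pwntools package can be found by Python.

    pwntools is imported under the package name "pwn".
    The function name is hypothetical, chosen for this example.
    """
    return importlib.util.find_spec("pwn") is not None


print("pwntools installed:", pwntools_installed())
```

If this prints False, go back to the pwntools installation instructions before continuing.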

Now everything is installed and we are ready to exploit.

Memory and Registers: Exploring x86–64 C Programs with GDB

Deep Dive into Memory

Now we will talk about memory and registers on 64-bit (x86–64) and how to read them in GDB. We will use a C program as the example because it makes it easy to understand how memory works.

#include <stdio.h>

void hello(){
 printf("Hello World!\n");
}

int main(int argc, char const *argv[])
{
 hello();
 return 0;
}

Suppose we have a basic C program like the one above and compile it with gcc. By default, a C program compiled nowadays has several protections enabled, which you can check with pwntools by running this command:

$ checksec hello

RELRO, Stack, NX, and PIE are protections applied to the program. For now we will focus on Stack, NX, and PIE.

Stack: Canary found → A stack canary is in place to detect stack buffer overflows before a function returns.
NX: NX enabled → The stack is mapped rw-p (read and write) instead of rwxp (read, write, and execute), so data placed on it cannot be executed as code.
PIE: PIE enabled → The program’s executable code can be loaded at a different base address on each run, rather than being fixed to a specific address.
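To get a feeling for where the PIE line comes from, here is a simplified sketch that reads the e_type field of an ELF header: ET_EXEC means a fixed load address, ET_DYN means position independent. This is not how checksec is implemented (the real tool looks at more than this one field), and the function names are made up for this example.

```python
import struct
import sys

# e_type values from the ELF specification
ET_NAMES = {2: "ET_EXEC", 3: "ET_DYN"}


def elf_type(header: bytes) -> str:
    """Return the e_type name from the first 18 bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] is EI_DATA: 1 = little endian, 2 = big endian
    endian = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field right after the 16-byte e_ident array
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return ET_NAMES.get(e_type, f"other ({e_type})")


def elf_type_of_file(path: str) -> str:
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return elf_type(f.read(18))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python elf_type.py ./hello
    print(elf_type_of_file(sys.argv[1]))
```

A binary built with PIE enabled should report ET_DYN here, while a non-PIE binary reports ET_EXEC. Shared libraries also report ET_DYN, which is one reason the real check is more involved.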

To get a better idea of what this looks like, open the program in gdb by running this command:

$ gdb ./hello

Inside gdb, type b*main and press Enter, then type run and press Enter to start the program. The program will stop at the main() function because of the b*main breakpoint.

To check the permissions of the stack, we can run the vmmap command (provided by Pwndbg) inside gdb.

Now we can see the permissions of every mapped region. As you can see, the stack only has rw-p (read and write) permission because NX is enabled. 0x7ffffffde000 is the address where the stack mapping starts. With ASLR, this address is different on every normal run of the program; inside GDB it stays the same between runs because GDB disables address randomization by default.
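On Linux, the information vmmap shows comes from the same data the kernel exposes in /proc/<pid>/maps. As a rough sketch, we can parse one of those lines ourselves. The sample line below is illustrative: only the start address 0x7ffffffde000 comes from this article, the end address is an assumption, and the function name is made up for this example.

```python
import os


def parse_maps_line(line: str) -> dict:
    """Split one /proc/<pid>/maps line into its main fields."""
    fields = line.split()
    start_s, end_s = fields[0].split("-")
    return {
        "start": int(start_s, 16),
        "end": int(end_s, 16),
        "perms": fields[1],               # e.g. "rw-p" or "r-xp"
        "name": " ".join(fields[5:]),     # e.g. "[stack]", may be empty
    }


# Illustrative line; the end address is an assumption
sample = "7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0    [stack]"
region = parse_maps_line(sample)
print(hex(region["start"]), region["perms"], region["name"])

if __name__ == "__main__" and os.path.exists("/proc/self/maps"):
    # Show the stack mapping of this Python process itself
    with open("/proc/self/maps") as f:
        for raw in f:
            if "[stack]" in raw:
                print(parse_maps_line(raw))
```

Running the script several times outside GDB should show the stack of the Python process starting at a different address each time, which is ASLR at work.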

Now let’s talk about the regions marked Number 1 and Number 2 in the image above. Number 1 is what we call the Text address, and Number 2 is the Stack address.

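If we declare a global variable in a C program, it is stored inside the binary’s own mapping (the region this article calls the Text address; strictly speaking, an initialized global lives in the .data section next to the code), and we can easily track it by name. A variable declared inside a function is stored on the stack, which makes it harder to track. For example, take this C program: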

#include <stdio.h>

char string_hello[] = "Hello World!";

void hello(){
 printf("%s\n", string_hello);
}

int main(int argc, char const *argv[])
{
 char string_inside[0x50];
 hello();
 return 0;
}

Compile it and run it in gdb as before.

Run the command x/gx <address>, then press Enter repeatedly to step through the values inside the Text address. As we can see from the image above, there is an address labeled string_hello, matching the global variable we declared earlier.

When we run x/s &string_hello to print the string inside that variable, it shows the string Hello World!. But if we run x/s &string_inside, it shows a No symbol error. This happens because the variable is declared inside a function: it lives on the stack and, without debug information, has no entry in the symbol table.

Registers on x86–64

Now let’s talk about registers on 64-bit (x86–64). In computer architecture, registers are small, fast storage locations within the CPU (Central Processing Unit) that hold data that is being processed or manipulated. Each register has a specific purpose and is used for different operations. Here are a few common x86–64 registers:

RSP (Stack Pointer): Points to the top of the stack in memory. It is used to manage the function call stack, which stores local variables, function parameters, return addresses, and other data during program execution.
RIP (Instruction Pointer): Points to the memory address of the next instruction to be executed. It keeps track of the current position within the program’s code.
RDI, RSI, RDX, RCX, R8, R9 (General-Purpose Registers): These registers are used for general data manipulation and for passing function arguments. In the System V AMD64 calling convention used on Linux, they carry the first six integer or pointer arguments of a function call, in that order.
RAX (Accumulator Register): Often used as the primary register for arithmetic and logical operations. It also stores the return value of a function.
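The argument order is easy to mix up, so here is a tiny lookup sketch of the convention described above. It only covers integer and pointer arguments (floating-point arguments use different registers), and the function name is made up for this example.

```python
# Integer/pointer argument registers on x86-64 Linux (System V AMD64 ABI)
ARG_REGISTERS = ["RDI", "RSI", "RDX", "RCX", "R8", "R9"]


def argument_locations(arg_count: int) -> list:
    """Return where each integer/pointer argument is passed.

    The first six go in registers; any further ones go on the stack.
    """
    return [
        ARG_REGISTERS[i] if i < len(ARG_REGISTERS) else "stack"
        for i in range(arg_count)
    ]


# A call with one argument, such as puts(string), only uses RDI
print(argument_locations(1))   # ['RDI']
print(argument_locations(3))   # ['RDI', 'RSI', 'RDX']
```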

As we can see from the image before, when we are in the main function, RIP shows the address of main. You can then run the “ni” (next instruction) command inside GDB and repeat it until you reach call hello.

If you run the “si” (step instruction) command, you will move inside the hello function. From the image above you can also see that the RDI register holds the 1st argument, RSI the 2nd argument, RDX the 3rd argument, and so on.

Because our hello function doesn’t take any arguments, these registers are not used for it. But inside the hello function we call printf.

As you can see inside GDB, the call was converted to puts instead of printf. This is a compiler optimization: gcc replaces a printf call whose format string is a plain string ending in a newline with a call to puts. We can still clearly see RDI holding the 1st argument, and RIP has changed to hello+18.

If you keep stepping with next instruction you will reach a ret instruction. It means the program will return to the main function, because the value RSP points to at that moment is the return address inside main.
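What ret does can be modeled in a couple of lines: it pops the 8-byte value at the top of the stack into RIP and moves RSP up by 8. The sketch below is a toy model, not an emulator, and both addresses in it are made-up placeholders.

```python
def ret(memory: dict, rsp: int) -> tuple:
    """Toy model of the x86-64 `ret` instruction.

    `memory` maps an address to the 64-bit value stored there.
    Returns the new (rip, rsp) pair.
    """
    rip = memory[rsp]      # pop the return address from the top of the stack
    return rip, rsp + 8    # the stack shrinks by 8 bytes


# Hypothetical values: a stack slot holding a return address inside main
rsp = 0x7FFFFFFFE0D8
memory = {rsp: 0x40116A}

rip, rsp_after = ret(memory, rsp)
print(hex(rip))   # 0x40116a
```

Whatever value sits at the top of the stack when ret executes becomes the next RIP, whether or not it is a valid address.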

Now, what happens if we somehow have a vulnerability that lets us change the value RSP points to, from the return address in main to any value we want?

Here I re-ran the program with a breakpoint at hello+24 (b*hello+24), then manually changed the value at RSP, which held the return address into main, to 0x4141414141414141, which is the string AAAAAAAA.
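If it is not obvious why 0x4141414141414141 is the string AAAAAAAA: 0x41 is the ASCII code for A, and a 64-bit value is 8 bytes. A quick sketch with the standard struct module shows the conversion in both directions (pwntools offers p64 and u64 for the same job).

```python
import struct

value = 0x4141414141414141

# Pack as a little-endian unsigned 64-bit integer, as x86-64 stores it in memory
packed = struct.pack("<Q", value)
print(packed)   # b'AAAAAAAA'

# And back from bytes to an integer
(unpacked,) = struct.unpack("<Q", b"AAAAAAAA")
print(hex(unpacked))   # 0x4141414141414141
```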

Then run the next instruction and boom!

We get a SIGSEGV error, and GDB shows a return to 0x4141414141414141, which is not a valid address in our program. That’s usually what happens when a buffer overflow overwrites a return address.
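In GDB we changed the return address by hand. In a real buffer overflow, the same thing happens because a copy into a stack buffer writes past its end. The sketch below models that with a bytearray. It assumes a simplified frame layout (a 0x50-byte buffer, then 8 bytes of saved RBP, then the return address) with no stack canary; the real layout depends on the compiler, and the original return address used here is a made-up placeholder.

```python
import struct

BUF_SIZE = 0x50                 # same size as string_inside in the C example
RET_OFFSET = BUF_SIZE + 8       # assumed: buffer, then saved RBP, then return address


def make_frame(return_address: int) -> bytearray:
    """Build a toy stack frame: buffer + saved RBP + return address."""
    return bytearray(BUF_SIZE) + bytearray(8) + bytearray(struct.pack("<Q", return_address))


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy data into the buffer without checking BUF_SIZE (the bug)."""
    for i, byte in enumerate(data):
        frame[i] = byte


def saved_return_address(frame: bytearray) -> int:
    (address,) = struct.unpack_from("<Q", frame, RET_OFFSET)
    return address


frame = make_frame(0x40116A)                    # hypothetical address inside main
unchecked_copy(frame, b"A" * (RET_OFFSET + 8))  # 0x50 + 8 + 8 bytes of 'A'
print(hex(saved_return_address(frame)))         # 0x4141414141414141
```

When the function in this model returns, ret would load 0x4141414141414141 into RIP, which is the same crash we triggered manually in GDB.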