Understanding Buffer Overflow Attacks (Part 1)

Rajinish Aneel Bhatia
Published in CodeX · 6 min read · Jun 30, 2024

Buffer overflow attacks are fairly rare these days, and even running one intentionally takes some work because of the protections that have been added. They are still fun to explore, because you come away with a deeper understanding of machine code and stack memory. The motivation for this post comes from the Phrack article on the same topic, so I recommend checking that out too.

I’m going to assume some basic knowledge of assembly, and we’ll dig right into how stack memory works for functions. In case things are still a bit unclear, I recommend checking these notes out. Let’s say some function f is called. What happens at the machine level is that the instruction pointer is redirected from the place where f() was called to the actual code for f. So, let’s look at a few important registers that we’re going to use.

RIP: the instruction pointer; it holds the address of the next instruction to execute

RSP: the stack pointer; it holds the address of the topmost element of the stack

RBP: the frame pointer, used as the reference point for accessing local variables

When f(a, b, c) is called, the first thing that happens is that the arguments are pushed onto the stack in reverse order: c, b, a. Actually, this differs between machines: some pass the first few arguments in registers and push only the rest onto the stack (on 64-bit Linux, for example, the first six integer arguments go in rdi, rsi, rdx, rcx, r8 and r9). Next, the address of the instruction following the call is pushed onto the stack, because once f is done we want to return to that instruction. Then the address of f is loaded into RIP so that f can start executing.

If you look at the assembly code for functions, nearly all of them begin by pushing RBP onto the stack. This is the frame pointer used for accessing local variables, so pushing it preserves the calling function’s RBP. The next instruction is `mov %rsp, %rbp`, which copies the current stack pointer into RBP; this becomes the reference point for calculating the addresses of the local variables. Memory for the local variables is then allocated by subtracting some number of bytes from RSP, since the stack grows toward lower addresses.

When the function returns, RBP is restored and the saved return address is popped off the stack into RIP, so the machine can go back to executing the instructions that follow the call to f(a, b, c). What a buffer overflow attack attempts to do is overwrite this saved RIP on the stack so that the machine executes arbitrary code of our liking.

This is a rough picture of the stack to keep in mind for a function like f(a, b, c):

[local var memory] [saved-rbp] [saved-rip] [a] [b] [c]

Remember that the stack grows downward, or, in the way I have written it, from right to left. Our goal is to attack [saved-rip]. It is essential to know the size of these registers on your machine before you start digging into it. In general, 64-bit processors use 8-byte registers (rbp, rip, rsp, ...), while on 32-bit ones you will see 4-byte registers (ebp, eip, esp).

Let’s look at some example code for starting out:

When `function` is called in main, the address of the instruction for `x = 1` is stored on the stack as [saved-rip] above. What we want to do is change that address through the `long *ret` pointer in `function`, so that `x = 1` is never executed. First, let’s modify the program a bit so that we can see in the assembly where things are stored on the stack. Storing ‘a’ in buffer1, ‘b’ in buffer2 and 1 through ret gives each local a recognizable marker in the disassembly.

To compile, we must pass the `-fno-stack-protector` flag to gcc (modern programs are mostly safe from these basic buffer overflow attacks, but for the sake of learning we can bypass these safety measures). Also pass the `-g` flag, which tells gcc to include debugging information in the compiled file so that gdb can show the source code alongside the assembly. I saved the compiled file as `e1`; to open it in gdb, type `gdb ./e1`. Then, in gdb, type `layout split`.

You should see gdb’s split view, with the source code in one window and the disassembly in another.

Let’s look at the assembly code of `function` (you can scroll with the up and down arrow keys).

First, the three arguments are copied from the edi, esi and edx registers to somewhere on the stack. Then 0x61 is moved to the address 0xd bytes below rbp, or in decimal, 97 is moved to rbp - 13. Since 97 is just the ASCII code for the character ‘a’, rbp - 13 must be where buffer1 is stored. Next, 98 (0x62, the character ‘b’) is moved to rbp - 0x17 (rbp - 23), so this must be where buffer2 is stored. Finally, whatever is at rbp - 8 is loaded into rax, and 1 is moved to the address that rax points to. So rbp - 8 must be where our ret (the long* pointer) is stored; pointers on 64-bit processors are 8 bytes in size.

Here’s what the layout looks like:

[buffer2: 10 bytes] [buffer1: 5 bytes] [ret: 8 bytes] [saved rbp: 8 bytes] [saved rip]

So, starting from buffer1 we need to move 5 + 8 + 8 = 21 bytes up to reach the saved rip. But what should we change it to?

Look at the disassembly of main again. The instruction at main+39, immediately after the call, is the one whose address is stored as [saved-rip] on the stack, and it is the one that moves 1 into x. The instruction that follows it is at main+46, which means this instruction takes 46 - 39 = 7 bytes in total. So, if we increment [saved-rip] by 7, we should skip it entirely. Let’s try it out: in `function`, we point ret 21 bytes past the start of buffer1 and add 7 to the value stored there.

Compile it the same way (with the same flags) and run the program. We get 0, which means `x = 1` was skipped.

This is where I’ll stop for this part. In the next one we’ll go through how to write assembly code for launching a shell, and then finally how to use it to attack a program that naively copies a string from input into a buffer without checking bounds. You can find the full code here: https://github.com/Rajinish0/simple-bufferOverflow-attack
