Binary Exploitation ELI5 — Part 3

“To err is human… to really foul up requires the root password.”


This article is part 3 of the Binary Exploitation ELI5 article series. For Parts 1 and 2 please follow the below links:

Binary Exploitation ELI5 — Part 1
Binary Exploitation ELI5 — Part 2

Throughout this article, we will be covering:

0x01 — Prerequisite Knowledge: Registers
0x02 — Prerequisite Knowledge: The Heap and Memory Allocation
0x03 — Attack: Return Oriented Programming (ROP)
0x04 — Defense: Address Space Layout Randomization
0x05 — Attack: Heap Spraying

Let’s get started!

0x01 — Prerequisite Knowledge: Registers

In earlier articles we discussed concepts such as the stack and assembly calling conventions. In this section, I’d like to expand upon a program’s ability to store data as well as give you a more detailed overview of some important assembly commands.

As mentioned in the previous articles, the stack is a region of memory within every program’s address space that is used to store data (i.e. variables, memory locations, etc.). While the stack can be seen as a program’s backpack, each program has it’s own set of pockets as well, known as registers. Registers are small regions of memory that, depending on the register, store structured or arbitrary information. By structured or arbitrary I mean that some registers store specific types of information, such as the ESP register which is known as the Stack Pointer and exclusively stores the memory address of the top of the stack at any given moment, while others, such as the EBX register can store arbitrary information. When storing data in registers, you have to move the data into the register with the MOV assembly command:

Set the EAX register to 3

In addition to simply storing data, registers can be used to perform general operations as well. Let’s take a look at an example:

Register Operations

For those of you who may still not fully understand registers just yet let’s take a look at a concept I touched on earlier in this section about registers being a program’s pockets.

Imagine you’re getting ready to leave for the day. You’re wearing jeans, a t-shirt with a frocket, and carrying a backpack. Before you go, you need to gather up your stuff. You put your wallet in one pocket, your keys in another, your phone and a pocket knife in another, your sunglasses in the frocket, and everything you can’t fit into your pockets in your backpack.

Programs work very similarly to this, While they can store any data on the stack (their backpack), the registers (their pockets) can only store data that fits within them. However, programs can more efficiently retrieve data from the register than the stack (though in modern computers, the speed difference between the stack and registers is negligible at best).

A list of all registers and their sizes
Registers portrayed as pockets on cargo shorts

0x02 — Prerequisite Knowledge: The Heap and Memory Allocation

So, we’ve now discussed two methods of storing data, the stack and registers. Both of these memory regions are perfect for storing ordinary variables, memory addresses, etc. But what if you need to store something really big? Or what if your program needs a lot more memory than was allocated for the stack? Well, that’s where the Heap comes in. The heap is simply a big chunk of memory that is allocated for the program by the kernel. If the stack is our backpack and the registers are our pockets, the heap can be seen as the room we put our stuff in at night, you may only need a small corner of the room to put your stuff, but just in case you need more you can always allocate more space (more memory). To allocate memory on the heap, we use the malloc function. Malloc is a C function that allocates a desired amount of memory. For example, if you called malloc(8) the program would allocate 8 bytes on the heap. The most common implementation of malloc is the Doug Lea implementation which also stores the block size of each allocation chunk in the bytes preceding the chunk.

Let’s take a look at memory allocation with a simple example:

Bob works at a factory. Bob was asked to set aside 3 screws for Part A, and 4 screws for Part B. Bob writes “3-A” and “4-B” on a piece of paper then puts 7 screws down on the paper. Now, Bob has allocated the correct amount of screws for both pieces in the same area.

See? Pretty simple! If I want to allocate 8 bytes of memory on the heap, I use malloc(8) which places 0x08 onto the heap in the space directly preceding the memory location of the allocated area. This way, if we later need to allocate more memory, the program simply takes the base address of the heap and adds 0x08 to it to accommodate for the previously allocated space! Easy-Peasy!

0x03 — Attack: Return Oriented Programming (ROP)

Earlier in the article series we talked about the Return-to-libc (Ret2libc) attack in which an attacker overwrites a return variable and forces the program to execute a system command through the libc library. In this section, we’ll be talking about Return Oriented Programming which can be seen as a more advanced version of ret2libc (though it usually doesn’t involve libc at all). In ROP attacks, the attacker manipulates return calls to return to different segments within the assembly code (called gadgets) that allow the attacker to essentially take full control over the program’s execution. For example, an attacker could continuously navigate to different gadgets in order to create a backdoor into the program without actually manipulating any code or memory areas (no direct execution on the stack or heap)!

Before we talk further about how this attack is done, we need to dive a bit deeper into what gadgets actually are. Gadgets are segments of assembly code found within the application that end with a return statement.

Different ROP Gadgets

Since gadgets are non-malicious, compiler built, pieces of code that are part of the program you’re trying to exploit, all an attacker has to do is locate them (which can be done by either simply disassembling the program or through gadget-finding programs like Ropper) and then place their addresses onto the stack. By building a chain of gadgets, and carefully manipulating the data on the stack, an attacker could write a perfect backdoor into a program or do literally anything they want.

Let’s take a look at an example:

Very basic ROP attack

As you can see, if an attacker is able to manipulate data on the stack, he/she can simply push variables and overwrite return addresses to use the program anyway he/she wants! In the above example all we did was make the program add 3 and 4 so that the eax register would equal 7, but what if instead of that simple addition, you used a whole chain of gadgets to push different pieces of data onto the stack and eventually land on gadgets with network calls that could extract data from the computer and send it to an attacker-held server.

0x04 — Defense: Address Space Layout Randomization (ASLR)

Earlier in this article series we spoke about DEP/NX as a viable defense against code execution in memory areas such as the stack and the heap. However, with attacks like ret2libc and Return Oriented Programming, we were quickly able to defeat this defense mechanism. So now, if we can’t defend against code being executed because of ROP attacks, and we can’t ensure the defensive capabilities of stack canaries because of format string exploits, how can we defend against binary exploitation attacks? Well, what if we simply randomized where memory segments are each time we open the program? That’s exactly what Address Space Layout Randomization is. Every time the program is executed, the stack, heap, .text section (where the assembly code is stored), and other memory sections are placed into memory in random offsets so that, while the program itself can run flawlessly, the attacker can’t just get the memory address of an important piece of data on the stack or find the same gadgets in the same place every time they try to exploit the executable. In addition, with a further implementation of Kernel Address Randomized Linking, attackers wouldn’t even be able to confidently locate linked libraries like libc, as each time the link is initialized the library would be loaded into memory at a random offset.

This sounds like a perfect solution, right? Well, yes and no. While yes, ASLR makes developing exploits for vulnerabilities FAR more difficult, malicious agents are intelligent and tenacious. In further sections we’ll discuss attacks that can beat ASLR.

0x05 — Attack: Heap Spraying

So, before we get into what a Heap Spray is, I need to preface by saying I’m actually skipping an attack in this article series — Heap Overflows. This is because they’re EXTREMELY similar to stack buffer overflows, in a Heap Overflow, the attacker gains the ability to write arbitrary data to the heap and, so long as DEP/NX is not enabled, the attacker can then write malicious code into the heap. However, as mentioned in the above ASLR section, even if the attacker could write malicious code to the heap, he/she wouldn’t be able to execute it as the memory location of the code would vary depending on where the program placed the malicious code within the heap (which would be randomized through the implementation of ASLR). So, how would we go about finding and executing this code?

Well, what if instead of just writing the malicious code once to a specific region in the heap, we just copy it all over the heap and then try to execute anywhere within the massive heap section? Well, that’s exactly what heap spraying is. It’s pretty much the law of averages — Throw enough sh*t at the wall and eventually somethings gotta stick!

Let’s break it down into a more basic example:

Bob is a blind dart player. Obviously, if Bob simply threw one dart, he’d have almost no chance of hitting the bullseye (since he can’t even see the board!) However, if Bob throws 1000 darts, at least one of them is bound to hit the center of the target.

If we place our malicious code in the heap once, it’s very unlikely that we’ll find it when pointing to single memory addresses, however, if we completely flood the heap our malicious code, we’re bound to hit it eventually!

Please note, on 64 bit systems heap spraying is far less effective than on 32 bit systems as, on 32 bit systems, you could completely fill the heap with your malicious code while, on 64 bit systems, the heap is so massive that even if you tried to spray the heap you may still have trouble finding your malicious code.

0x06 — Conclusion

In this article we covered:

0x01. Registers
0x02. The Heap
0x03. Malloc
0x04. Address Space Layout Randomization
0x05. Kernel Address Randomized Linking
0x06. Return Oriented Programming
0x07. Heap Spraying

I hope this article was helpful. Now that we’ve covered most of the important prerequisite knowledge and some good basic attacks/defenses the next parts of this article series are going to get a whole lot juicier with more advanced and exciting attack/defense sections!

push “Thanks”
push “For”
pop eax
pop ebx
mov ecx, “reading”