Intro to ROP: ROP Emporium — Split

11 min readSep 25, 2017

I recently discovered the excellent ROP Emporium website, A page filled with return-oriented programming challenges of varying difficulty. I must admit that while i felt i understood ROP in principle, I didn’t really understand it in practice until i started solving these challenges. In the following article i’ll share the thought process and method i used in solving one of the easier challenges, Split.

Reconnaissance

Although this is very obviously a return-oriented programming problem, it’s always good practice to check out the binary first.

I use rabin2 -I split, which lists architecture and security features of the binary (among many, many other things). You can also run this command from within radare2 with the I command.

The features that are of immediate interest to us are PIC, NX, and RELRO. We can see that partial relro is enabled (meaning the ELF sections are slightly rearranged to prevent overflows in the writable segments from rewriting ELF headers, and the plt@got is read-only), and position independent code is disabled (meaning relative addresses of shared libraries in the binary will stay the same across executions). Most importantly (if completely expected) the NX bit is enabled, meaning we can’t execute code we put on the stack. Tragically, it isn’t 1999 anymore.

As we shall soon see, these mitigations don’t prevent us from puppeteering the process, however.

Next, I would run strings or something similar on the binary, just to see if there’s any low hanging fruit. Oftentimes when you’re reversing a real binary there will be strings to point you in the right direction. While extremely useful, it’s a crutch, and you should never rely it. Depending on your paranoia level, you might not want to use strings at all.

In this case i’m using radare2’s izz command, which gives the relative address next to each string in the binary.

lucky for us, there’s a very interesting string already present (it’s also labeled, for some reason).

Unfortunately, when we see it in context (with radare’s axt find cross-reference command in this case), we see that it isn’t referenced at all in the program; There isn’t a magic “I win” function we can return to.

Finally, we turn to the actual work: Reversing the binary in your disassembler of choice. Just for variety, i’ll use Binary Ninja.

There’s nothing special in the main function, so we’ll go straight to the function that main() calls, pwnme().

Here we can see the vulnerability we’ll be exploiting: The program allocates 0x20 (32) bytes on the stack for a buffer (0x4007b9, var_28). But later the function reads into the buffer with fgets(), and passes 0x60 (92) as the number of bytes to read. This is clearly a problem: we can write any arbitrary 60 bytes to the stack beyond the buffer itself. Now how do we exploit this?

Developing an Attack

Now that we see a vulnerability, the next step is to exploit it. We’ll quickly review what the stack looks like as a refresher.

*source:* *http://www.installsetupconfig.com/win32programming/processtoolhelpapis12_1.html*

Here is the best image i could find describing the layout of a stack frame in memory. Note that strings grow upwards, while the stack grows downwards. This is extremely useful for us. Notice that the local variables of the function are allocated below the return address of the function. Because stack frames and stack variables grow in opposite directions, we can overflow a local buffer and overwrite the return address on the stack. This is means that when the function is finished, instead of returning the function that called it, we can get to return anywhere we want. This is effectively the same thing as calling arbitrary functions, and allows us to completely bypass a non-executable stack. This is the basis of Return Oriented Programming (ROP).

Instead of executing code on the stack, we push a chain of addresses to the stack. Each of these addresses points to a gadget, which is a simple unit of computation that we can use as building blocks to write nearly any program we want.

For example, if we want to call system() with some arbitrary argument, all we have to do is find a reference to system() in the binary (either a call or in the global offset table) and some gadget that allows us to either push a value to the stack (for x86 binaries) or move a value to the RDI register (for x64 binaries).

We then chain these two gadgets together by pushing their addresses to the stack. First, we call the gadget that sets up the arguments to system(), then that gadget’s ret instruction pops the address of system() and jumps to it. This is effectively exactly the same thing as calling system() directly, if a little more involved.

There are of course some caveats to this idea:

We can only run code that already exists in the program. This includes any libraries loaded by the binary, which leads to a whole ‘nother class of ROP (see: ret2libc)
We can only fit so many ROP gadgets into a given buffer overflow. Fortunately, we have 92 bytes to play with in this challenge, which is plenty. If we needed more space we would have to get more creative.

Building a ROP Chain

Now that we understand what a ROP chain is, how do we build one?

The first step is to find some useful function to call in the binary. Remember: this includes all libraries loaded by the binary!

Fortunately for us, we have the obviously named usefulFunction(), which calls system(‘/bin/ls’). Unfortunately, it doesn’t take any arguments, and it doesn’t do something we want. Before we can call system we have to set up the stack/registers so that the arguments to system actually do something we want.

As a short experiment, can we prove that we can call this function?

let’s quickly calculate an offset to the return address (using the De Bruijn sequence method outlined below) and write the address 0x400807 there.

Then, we’ll use latrace to see if the previously not used system() function is called.

A Brief Note on Calling Conventions

In case you aren’t familiar with x86 or x64, Here is a short summary of the different ways they call functions. Both have multiple conventions, but for brevity i’ll only cover the two that are the most common.

x86 - cdecl: Arguments to a function are pushed to the stack in reverse order (right to left). The function then pops it’s arguments off the stack and does its work. It is irrelevant for this challenge, but it’s worth noting that cdecl is a caller save convention, meaning that the caller has to clean up after the function, ie the function does not clean up its own stack.

x64 — pass by register: In x64 there are many more registers available to use. It is also much faster to load a value to a register than to push a value to the stack. For this reason, arguments to functions in x64 are passed in registers RDI, RSI, RDX, R10, R8, and R9, With the first argument in RDI, the second in RSI, and so on.

Since this article is exploring the x64 split binary, we can see that we need to pass our single argument to system() in the RDI register.

What are the calling conventions for UNIX & Linux system calls on x86-64

I verified these using GNU Assembler (gas) on Linux. x86-32 Linux System Call convention: In x86-32 parameters for…

stackoverflow.com

64-bit Linux Return-Oriented Programming

Executable space protection is one such defence. Unfortunately, it is an ineffective defence. In this guide, we show…

crypto.stanford.edu

Setting up the Arguments

System() is expecting a single char pointer to a command. We need to find an address to a command we actually want to run. Fortunately, because we cheated and used radare2 izz, we already know that the string ‘/bin/cat flag.txt’ exists in the binary. Now that we have an address to the command we want to run, how to we get it into the RDI register?

I’ll be using ropper an excellent python-based utility for finding ROP gadgets in a binary. There are many ways we can get our string’s address into RDI, but by far the most convenient is something called a pop/ret gadget. the pop instruction just puts the element on top of the stack into a register, and return, well, returns to the calling function. If we can find a pop rdi; ret; gadget, we can just put the address we want in RDI on the stack.

We ask ropper to find us this perfect gadget in the binary, and we’re lucky enough that it found one! Interestingly though, this isn’t even an actual instruction!

Here’s what that address disassembles to in Binary Ninja. Notice that there isn’t a pop rdi instruction in there? This is because x86 and x64 are complex instruction set computers (CISC). This means they have very, very large instruction sets, and many of their instructions can have variable length. This means that any given chunk of assembly can be interpreted differently, depending on where you start looking at it. Here, if we start disassembling at 0x400882, we get pop r15, but if we instead shift one byte over to 0x400883, the exact same instruction now decodes to pop rdi! This property vastly expands the gadgets we have available to us, and fortunately ropper is smart enough to look for these “invisible gadgets” for us.

Calling an uncalled function

Now that we have all the pieces we need, lets use our buffer overflow to write the address of usefulFuncion() to the return address of our stack frame:

Using r2’s afl command to find our function’s address

output from “latrace -A -s system python exploit.py”

We can clearly see that we called the uncalled function through our simple ROP.

Putting it All Together

Finally, we’re ready to actually exploit our findings. Our plan of attack is now:

Find an offset into the fgets() buffer that lets us overwrite the return address of the pwnme() function. We’ll cover this in the next section.
Set up a ROP Chain that puts the address of the ‘/bin/cat flag.txt’ string we found during the reconnaissance phase into the RDI register, Which will become the argument to system().
Finally, call system() by returning into it.

Exploitation

I’m going to use the excellent pwntools library for python. Pwntools is amazing.

First, we’ll set up the environment:

All we do here is setup the environment a bit. We store the addresses we already found to ‘/bin/cat flag.txt’ and system() in variables, and turn on logging.

Next, we need to find the offset into the buffer that will overwrite the return address. You can brute force this by trying different lengths until it works, but a more elegant solution is to use a De Bruijn Sequence. A De Bruijn sequence (often called a ‘cyclic’) has the useful property that, for a given length m, no consecutive m characters are duplicated anywhere else in the pattern.

This segment of code just starts the program, sends a cyclic string that we know will overflow the buffer, then loads the crash dump when it inevitably segfaults. Because the ret instruction jumps to a value from the stack, we can look at what the stack pointer (RSP) was pointing at when it crashed, and that will give us a 4 byte subsequence from our cyclic. We can find the exact offset of this 4 byte sequence using the cyclic_find() function. This is the exact offset of the return address, relative to the start of the buffer we control, and we didn’t have to brute force anything.

Then we build the ROP Chain using pwnlib’s most excellent packing functions:

Finally, we start a new process and send our payload.

We run our exploit and check the results:

Here we see the tail end of the first process we ran and crashed, and the pattern from our cyclic that tells us the offset. Then we build a short ROP chain, padded with the appropriate number of bytes from our offset value, and the resulting output. We can see from the last line that we succeeded, and we’ve printed the flag.

Automated ROP with Pwntools

In the previous code we found our gadgets and built a ROP chain by hand. This can easily become tedious with any reasonably large program. Fortunately for us, There is a vibrant ecosystem of utilities for return oriented programming. Here’s how we can build our ROP chain using Pwntool’s built-in ROP functionalities:

This is exactly the same as our string concatenated ROP chain from the first program, but it uses the ROP subsystem of Pwntools. Here, we build a ROP object from our binary, then find a chain that will invoke system(“/bin/cat flag.txt”). Here’s what the ROP chain looks like (printed with the rop.dump() function):

Note that we segfault because we never returned to valid code. After doing our cat command, the program attempts to return to garbage data (gaaa is taken as a return address in this case). There are of course ways around this, but we will not be covering them here.

Conclusion

Return oriented programming is a useful — if initially obtuse — tool at your disposal. Many otherwise unexploitable buffer overflows can be easily exploited with ROP, assuming you have some way read the binary itself. If you don’t have the binary, there are even ways to do ROP “blind”, Which I hope to cover in another article.

I hope you found this article helpful. If you have any questions or comments, i encourage you to get in contact with me!