Understanding Program Memory from an Exploitation Point of View


To understand how binaries are organized in memory, and the assembly code generated for those binaries, one first needs to understand the concept of ELF. So I will start with ELF and then move on to assembly.


ELF

In computing, the Executable and Linkable Format (ELF, formerly named Extensible Linking Format) is a common standard file format for executable files, object code, shared libraries, and core dumps.

Did that go over your head? Don’t worry. Let me simplify it.

ELF is basically a format that specifies how a binary (either executable or linkable) is laid out in the file and how it should be loaded into memory. The ELF header contains a lot of information about the contents of the ELF file.

You can use the following command to view the ELF header of the compiled binary (here named function_call).
 readelf -h ./function_call

ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x8048310
Start of program headers: 52 (bytes into file)
Start of section headers: 6860 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 9
Size of section headers: 40 (bytes)
Number of section headers: 36
Section header string table index: 33

Notice two things for now, as they will be important for exploitation as well:

  1. Type: EXEC, meaning this particular binary is an executable file, not a linkable (relocatable) one.
  2. Data: 2’s complement, little endian, meaning this file is compiled for a machine that stores multi-byte values, including addresses, with the least significant byte first.

What does an ELF file look like in memory?

+---------------+ Highest Address 0xffffffff
| cmd line args |
| env Variable  |
+---------------+
|     STACK     |
+--+------------+
|  |            |
|  |            |
|  v         ^  |
|            |  |
|            |  |
+------------+--+
|     HEAP      |
+---------------+
| Uninitialized |
|   Data(BSS)   |
+---------------+
|  Initialized  |
|     Data      |
+---------------+
|   Read Only   |
|     data      |
|       +       |
|     code      |
+---------------+ Lowest Address 0x00000000

The above diagram shows what the 4GB virtual address space of any 32-bit binary looks like when it is loaded into memory. For ease of understanding, I have drawn the memory in top-down order, i.e. the highest address at the top and the lowest at the bottom. Let’s understand the different sections (from the top):

  1. The very first section stores the command line arguments and environment variables that are passed to the program when it is executed.
  2. Stack: This stores the variables declared inside functions (local variables). Don’t confuse these with the dynamic memory returned by the *alloc family of functions. Local variables are dynamic in the sense that they are assigned memory only at run time, when the function is called. The stack grows in reverse order, i.e. from the highest address toward the lowest.
  3. Heap: This stores the memory that is allocated dynamically by the *alloc family of functions. It grows toward higher addresses.
  4. BSS: This section stores the uninitialized global and static variables. They are automatically initialized to 0.
  5. Initialized data: This section stores the global and static variables that are initialized to some value.
  6. The last section stores the read-only data and the code of the program in machine language.

Organization of the stack during a function call

The stack is used during a function call to save the state of the caller, so that when control returns from the called function the caller can continue executing normally. This is what the stack looks like after a function call is made.

+   Previous function  |
|     Stack frame      |
|                      |
+----------------------+ <--- previous stack
|Space for return value|
+----------------------+
|Arguments for function|
+----------------------+
|    return address    |
+----------------------+
|      saved $ebp      |
+----------------------+
|                      | <--- padding
+----------------------+
|   local variables    |
|                      |
|                      |
|                      |
|                      |
+----------------------+
|                      |
|                      |
|     unused space     |
+                      +

How the stack grows:

Now let’s try to understand what $ebp and $esp are used for, and how function calls grow the stack.

  1. esp: As you can see in the diagram, the stack pointer (esp) changes after every push operation. It always points to the top of the stack, and the top moves down as the stack grows.
  2. ebp: At run time, variables no longer exist as names. They are accessed as offsets from the base of the current stack frame, and the ebp register points to that base. That’s why, when a function calls another function, the value of the ebp register is saved onto the stack, so that the register becomes available to hold the base of the new stack frame.

Assembly code for function calls

Assembly language, as you can imagine, sits just above machine language. Programs in high-level languages such as C are first compiled to assembly language, which an assembler then translates to machine language.

Let’s take an example of what the code for a function call looks like. You can use the following command to see the assembly code of the compiled C program.
 objdump -d ./function_call

0804840b <foo>:
804840b: 55 push %ebp
804840c: 89 e5 mov %esp,%ebp
804840e: 83 ec 08 sub $0x8,%esp
8048411: 83 ec 0c sub $0xc,%esp
8048414: 68 d0 84 04 08 push $0x80484d0
8048419: e8 c2 fe ff ff call 80482e0 <printf@plt>
804841e: 83 c4 10 add $0x10,%esp
8048421: 90 nop
8048422: c9 leave
8048423: c3 ret
08048424 <main>:
8048424: 8d 4c 24 04 lea 0x4(%esp),%ecx
8048428: 83 e4 f0 and $0xfffffff0,%esp
804842b: ff 71 fc pushl -0x4(%ecx)
804842e: 55 push %ebp
804842f: 89 e5 mov %esp,%ebp
8048431: 51 push %ecx
8048432: 83 ec 04 sub $0x4,%esp
8048435: e8 d1 ff ff ff call 804840b <foo>
804843a: b8 00 00 00 00 mov $0x0,%eax
804843f: 83 c4 04 add $0x4,%esp
8048442: 59 pop %ecx
8048443: 5d pop %ebp
8048444: 8d 61 fc lea -0x4(%ecx),%esp
8048447: c3 ret
8048448: 66 90 xchg %ax,%ax
804844a: 66 90 xchg %ax,%ax
804844c: 66 90 xchg %ax,%ax
804844e: 66 90 xchg %ax,%ax

I have copied only the code of the main and foo functions here. Observe the call to foo from main.

  1. main pushes no arguments before the call. That means foo does not take any arguments.
  2. The call instruction makes the CPU save the return address (the address of the instruction right after the call, here 0x804843a) onto the stack. This is done by the call instruction itself, so it is not visible as a separate instruction in the code.
  3. The first instruction of foo pushes $ebp onto the stack.
  4. The next instruction makes $ebp point to where $esp points.
     These instructions can be divided into three parts: the call, which saves the return address; the prologue (push %ebp, mov %esp,%ebp, sub), which sets up the new frame; and the epilogue (leave, ret), which tears the frame down and returns to the caller.

Assembly code for buffers

In the previous section, I showed what the code for a simple function call looks like. In this section I will show what buffers look like in asm code.

The corresponding foo function in assembly:

0804840b <foo>:
804840b: 55 push %ebp
804840c: 89 e5 mov %esp,%ebp
804840e: 83 ec 18 sub $0x18,%esp
8048411: 83 ec 0c sub $0xc,%esp
8048414: 68 d0 84 04 08 push $0x80484d0
8048419: e8 c2 fe ff ff call 80482e0 <printf@plt>
804841e: 83 c4 10 add $0x10,%esp
8048421: 90 nop
8048422: c9 leave
8048423: c3 ret

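Notice the two sub instructions.

804840e: 83 ec 18 sub $0x18,%esp
8048411: 83 ec 0c sub $0xc,%esp

Compare them with the previous listing. The second sub ($0xc) is unchanged: together with the 4-byte push of the printf argument it moves esp by 0x10, which the add $0x10,%esp after the call undoes, so for now we can ignore it. The first sub is the important one. It changed from $0x8 to $0x18, and it is the instruction that lowers esp to allocate the space for foo’s local variables, which now include the ch buffer. Notice, as I mentioned, that names mean nothing in asm. They are just offsets from $esp or $ebp.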