Understanding Program Memories — from exploitation point of view

Published in Walmart Global Tech Blog, Jul 30, 2018

Before looking at how binaries are organized in memory, and at the assembly code for those binaries, one needs to understand the concept of ELF. So I will start with ELF and then move on to assembly.

ELF

In computing, the Executable and Linkable Format (ELF, formerly named Extensible Linking Format), is a common standard file format for executable files, object code, shared libraries, and core dumps.

Did that go over your head? Don’t worry. Let me simplify it.

ELF is basically a format that specifies how binary code (either executable or linkable) is laid out in the file and how it is loaded into memory. The ELF header contains a lot of information about the contents of the ELF file.

One can use the following command to view the header of an ELF file, here the compiled example binary function_call.
readelf -h ./function_call

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8048310
  Start of program headers:          52 (bytes into file)
  Start of section headers:          6860 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         9
  Size of section headers:           40 (bytes)
  Number of section headers:         36
  Section header string table index: 33

Notice two things for now; they will be important for exploitation as well:

Type: EXEC, meaning this particular binary is an executable file, not a linkable one.
Data: 2’s complement, little endian, meaning this file is compiled for a machine that stores multi-byte values, addresses included, in little-endian order: the least significant byte comes first, at the lowest address.

How does an ELF file look in memory?

                        +---------------+ Highest Address 0xffffffff
                        | cmd line args |
                        | env Variable  |
                        +---------------+
                        |     STACK     |
                        +--+------------+
                        |  |            |
                        |  |            |
                        |  v         ^  |
                        |            |  |
                        |            |  |
                        +------------+--+
                        |     HEAP      |
                        +---------------+
                        | Uninitialized |
                        |   Data(BSS)   |
                        +---------------+
                        |  Initialized  |
                        |     Data      |
                        +---------------+
                        |   Read Only   |
                        |     data      |
                        |       +       |
                        |     code      |
                        +---------------+ Lowest Address 0x00000000

The above diagram shows how the 4 GB virtual address space of any 32-bit binary looks when it is loaded into memory. For clarity, I have drawn the address space top down, i.e. the highest address at the top and the lowest at the bottom. Let’s go through the different sections (from the top):

The very first section stores the command line arguments and environment variables that are passed to the program when it is executed.
Stack: This stores the local variables created inside functions. Don’t confuse these with the dynamic variables allocated by the *alloc family of functions. Every variable is dynamic in the sense that it is assigned memory only at run time, but the normally declared (non-static) variables inside functions are stored on the stack. The stack grows in reverse order, i.e. from the highest address towards the lowest.
Heap: Dynamic variables that are created by the *alloc family of functions. The heap grows in the opposite direction, towards higher addresses.
BSS: This section stores the uninitialized (global + static) variables. They are automatically initialized to 0.
Initialized data: This section stores the (global + static) variables that are initialized to some value.
Read-only data and code: The last section stores all the read-only data and the machine code of the program.

Organization of the stack during a function call

The stack is used during a function call to save the state of the calling function, so that when control returns from the called function the caller can continue to execute normally. This is how the stack looks after a function call is made.

                        +   Previous function  |
                        |     Stack frame      |
                        |                      |
                        +----------------------+ <--- previous stack
                        |Space for return value|
                        +----------------------+
                        |Arguments for function|
                        +----------------------+
                        |    return address    |
                        +----------------------+
                        |     saved $ebp       |
                        +----------------------+
                        |                      | <---  padding 
                        +----------------------+
                        |    local variables   |
                        |                      |
                        |                      |
                        |                      |
                        |                      |
                        +----------------------+
                        |                      |
                        |                      |
                        |     unused space     |
                        +                      +

How the stack grows:

Now let’s try to understand what $ebp and $esp are used for, and how function calls grow the stack.

esp: As you can see in the diagram, the stack pointer (esp) changes with every push and pop. It always points to the top of the stack (and the top moves down as the stack grows).
ebp: At run time, variables have no names. They are referenced by their offset from the base of the current stack frame, and this base is held in the ebp register. That’s why, when a function calls another function, the value of the ebp register is saved onto the stack: the register then becomes available to hold the base of the new stack frame.

Assembly code for function calls

Assembly language, as you can imagine, sits just above machine language. High-level languages such as C are first compiled to assembly language, which an assembler then translates to machine language.

Let’s take an example of what the code for a function call looks like. You can use the following command to see the assembly code of the compiled C program.
objdump -d ./function_call

0804840b <foo>:
 804840b:	55                   	push   %ebp
 804840c:	89 e5                	mov    %esp,%ebp
 804840e:	83 ec 08             	sub    $0x8,%esp
 8048411:	83 ec 0c             	sub    $0xc,%esp
 8048414:	68 d0 84 04 08       	push   $0x80484d0
 8048419:	e8 c2 fe ff ff       	call   80482e0 <printf@plt>
 804841e:	83 c4 10             	add    $0x10,%esp
 8048421:	90                   	nop
 8048422:	c9                   	leave  
 8048423:	c3                   	ret    

08048424 <main>:
 8048424:	8d 4c 24 04          	lea    0x4(%esp),%ecx
 8048428:	83 e4 f0             	and    $0xfffffff0,%esp
 804842b:	ff 71 fc             	pushl  -0x4(%ecx)
 804842e:	55                   	push   %ebp
 804842f:	89 e5                	mov    %esp,%ebp
 8048431:	51                   	push   %ecx
 8048432:	83 ec 04             	sub    $0x4,%esp
 8048435:	e8 d1 ff ff ff       	call   804840b <foo>
 804843a:	b8 00 00 00 00       	mov    $0x0,%eax
 804843f:	83 c4 04             	add    $0x4,%esp
 8048442:	59                   	pop    %ecx
 8048443:	5d                   	pop    %ebp
 8048444:	8d 61 fc             	lea    -0x4(%ecx),%esp
 8048447:	c3                   	ret    
 8048448:	66 90                	xchg   %ax,%ax
 804844a:	66 90                	xchg   %ax,%ax
 804844c:	66 90                	xchg   %ax,%ax
 804844e:	66 90                	xchg   %ax,%ax

I have only copied the code of the main and foo functions here. Observe the call to foo from main.

main pushes nothing before the call, which means foo does not take any arguments.
The call instruction makes the CPU save the return address (the address of the instruction right after the call, here 0x804843a) onto the stack. The CPU does this as part of call itself, so there is no separate push for it in the listing.
The first instruction of foo pushes $ebp onto the stack.
The next instruction copies $esp into $ebp, so that $ebp points to the base of foo’s new stack frame.
These instructions can be divided into three parts, which are explained in the flow below.

Assembly code for buffers

In the previous section on assembly code, I showed what the code for a simple function call looks like. In this section I will show how buffers look in asm code.

Corresponding foo function in assembly.

0804840b <foo>:
 804840b:	55                   	push   %ebp
 804840c:	89 e5                	mov    %esp,%ebp
 804840e:	83 ec 18             	sub    $0x18,%esp
 8048411:	83 ec 0c             	sub    $0xc,%esp
 8048414:	68 d0 84 04 08       	push   $0x80484d0
 8048419:	e8 c2 fe ff ff       	call   80482e0 <printf@plt>
 804841e:	83 c4 10             	add    $0x10,%esp
 8048421:	90                   	nop
 8048422:	c9                   	leave  
 8048423:	c3                   	ret

Notice the two sub instructions.

804840e:	83 ec 18             	sub    $0x18,%esp
8048411:	83 ec 0c             	sub    $0xc,%esp

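Compare them with the earlier listing of foo. The second sub ($0xc) is unchanged: together with the push that follows it, it makes room for printf’s argument, and add $0x10 undoes both after the call, so we can ignore it for now. The first one is the important one. It has grown from $0x8 to $0x18, and it is this sub from esp that reserves the stack space for the ch buffer. As I mentioned, names mean nothing in asm; variables are just offsets from $esp or $ebp.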