Understanding Memory Layout

Shohei Yokoyama
Nov 10, 2018


Memory refers to the computer hardware (integrated circuits) that stores information for immediate use in a computer. Computer memory is built to store bit patterns. Instructions are bit patterns just as data is, so both can be stored in memory. In systems software, they are stored in separate segments of memory, and those segments are further divided according to the kind of data or program content they hold.

A multitasking OS runs each program in its own virtual address space. On a 64-bit system a memory address is 8 bytes wide. It is 4 bytes on a 32-bit system and 2 bytes on a 16-bit system. This value is called the address size. The smallest unit the CPU can address is 1 byte (8 bits).

When a program runs, its processing takes place in two spaces on the system: kernel space and user space. The program's processing moves between the two spaces as it runs, for example when it asks the kernel to perform I/O.

  • Kernel Space

User processes can access kernel space only through system calls. In a Unix-like operating system, these are requests such as input/output (I/O) or process creation.

  • User Space

User space is the memory allocated to user processes, and the running program can access it directly. This space is divided into several segments.

Memory Layout in a Program

The following picture shows the virtual memory layout of kernel space and user space. The user-space part of the virtual address space is divided into the Stack, Heap, BSS, Data, and Text segments.

Stack

The stack is located just below kernel space, at the opposite end of user space from the heap. It grows downward toward lower addresses, although on some other architectures it may grow in the opposite direction.

The stack is a LIFO (last-in, first-out) data structure. In computer science, a stack is an abstract data type that serves as a collection of elements, with two principal operations:

  • push, which adds an element to the collection, and
  • pop, which removes the most recently added element that was not yet removed.

This area stores all the data needed by the function calls in a program. Calling a function pushes that function's execution context onto the top of the stack. When the function completes, its result is returned and the function is popped off the stack. The data pushed for a function call is called a stack frame, and it contains the following:

  • the arguments (parameter values) passed to the routine
  • the return address back to the routine’s caller
  • space for the local variables of the routine

The following is an example C program, along with a picture of its stack memory allocation.

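#include <stdio.h>

/* Prototypes, so main can call functions defined below it. */
int getResult(void);
int getNum1(void);
int getNum2(void);

int main(void) {
    int result = getResult();  /* pushes a frame for getResult */
    printf("%d\n", result);    /* prints 30 */
    return 0;
}

int getResult(void) {
    int num1 = getNum1();      /* frame for getNum1 pushed, then popped */
    int num2 = getNum2();      /* frame for getNum2 pushed, then popped */
    return num1 + num2;
}

int getNum1(void) {
    return 10;
}

int getNum2(void) {
    return 20;
}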

When a function is called, its stack frame is pushed onto the top of the stack. When the function finishes executing and returns, its stack frame is popped off the top.

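As described above, the stack can only hold data whose lifetime is limited to the function call that created it. In exchange, stack memory management is very fast: allocating or releasing a frame only requires moving the stack pointer register, which tracks the top of the stack.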

Heap

The heap is the segment where dynamic memory allocation usually takes place. This area commonly begins at the end of the BSS segment and grows upward toward higher memory addresses. In C it is managed with malloc and free (new and delete in C++), which typically adjust its size through the brk and sbrk calls. Many allocators also use mmap for large blocks.

Memory is allocated on the heap in the following cases:

  • the memory size is only known at run time
  • the data must not be limited to one function's scope (e.g., data referenced from several places)
  • the memory size is very large

In C, it is the programmer's responsibility to free heap memory, and objects on the heap that are never freed cause memory leaks. In garbage-collected languages, the garbage collector frees heap objects that are no longer reachable, which prevents this kind of leak.

Repeated allocation and release can leave unused gaps on the heap. The state in which in-use and unused blocks are mixed, so that the free space is split into small scattered pieces, is called fragmentation. In this state the allocator spends more time searching for free space, and the locality of reference of the data gets worse, so performance drops.

BSS ( Block Started by Symbol )

The uninitialized data segment is often called the BSS segment. The kernel initializes the data in this segment to zero before the program starts executing. For instance, a variable declared as static int i; is allocated in the BSS segment.

Data

The data segment contains initialized global and static variables, that is, variables with a predefined value. It is divided into a read-only area and a read-write area.

For example, the following global variables, declared outside main, are stored in the data segment:

int val = 3;
char string[] = "Hello World";

Text

The text segment stores the program's machine-language instructions. It is a read-only space, so a program cannot accidentally overwrite its own code.

Stack vs Heap

The stack is faster because its free memory is always contiguous. Unlike the heap, it does not need to keep a list of all the free memory, only a single pointer to the current top of the stack. Each byte of the stack also tends to be reused very frequently, so it is likely to stay in the processor's cache, which makes access very fast. Therefore, I recommend using the stack unless you actually need the heap.


【横山 祥平 / @shoheiyokoyama 】iOS Engineer at SmartNews, Inc. EX-CyberAgent, Inc, Github: https://github.com/shoheiyokoyama