Memory Allocator Design : malloc()

6 min readDec 30, 2021

malloc( )

malloc() and free() are part of the APIs offered by system memory allocator or memory manager. User programs use these APIs for dynamic memory allocation. malloc() internally calls upon operating systems to allocate memory resource — raw, uninitialized memory, as needed, from a memory segment called “Heap”.

But, allocating memory in this region of memory could get expensive due to external fragmentation — wasted space between allocated blocks. The performance of STL (standard template library in C++) vectors could be affected by this fragmentation issue as the memory is allocated using “new” operator. Further, malloc() can also result in “internal fragmentation” (wasted bytes are within the allocated block, example : user requested 20 bytes, and we returned 32 bytes <aligned>, 12 bytes are unused), if aligned malloc() operation is used.

In this blog, let’s take a look at implementation of a custom memory allocator and corresponding malloc() API design.

API required

void* malloc(size_t n_bytes);

Inputs : API shall take the size of memory to be allocated and return a pointer to the memory allocated. malloc( )allocates contiguous block of memory per request. However, its non-contiguous across different allocation requests.
Returns : It then return NULL if memory can’t be allocated, else valid pointer to start address of memory chunk. The return type is marked void*, that can be then cast to the type required by the user.

Data Structure

To design a custom memory allocator, we would require a way to keep a tab on allocated blocks of memory (to free later) and also maintain list of free or available blocks — so we can allocate, when new requests come in. There are multiple data structure options to perform this bookkeeping. Operations on this data structure must be quick — as both malloc() and free() shall operate on this.

In this blog, we will explore one of the ways that is explored in KnR. We will use a circular linked list data structure for bookkeeping only the free or available blocks of the memory — essentially to maintain a “Free list”.

Block A, B, C, D, F are owned by the memory allocator and block E is the block not owned by memory allocator. Blocks with free memory are linked together.

As the name suggests, the memory allocator’s “free-list” keeps track of blocks of free memory. Once the block is used or allocated, it will be removed from this list.

typedef struct Header{
Header* next;
size_t size;
} Header;static Header base;/* empty list to get started */ 
(not allocated on heap)
static Header* pFree= NULL;/* start of free list */

We declared a static member of type Header because we always need a header (base). Initially base will not point to anything, it wont have any memory to return and size will be zero.
Once malloc() receives requests for memory allocation, base pointer will point to allocated block.

Design considerations

What should be the strategy for finding available chunk of memory, for a given request using free list :

First Fit : Go through circular list (free list) from start to end and check first available big enough chunk that satisfies request and return that. Pro , Con : causes too many small blocks at the beginning of the free list.
Best Fit : Go through entire list (O(n) for each malloc) and find the one that is the best fit for the request (smallest one) and return that. Pro : optimal allocation, reduces fragmentation, Con : leaves smaller blocks, slower — for every malloc() traverse entire free list.
Next Fit : Go through list like first-fit but remember where we finished searching and resume searching from there — New block taken from tail of next sufficiently large block. Pro: does not concentrate small blocks at beginning- faster , Con: slower than first fit as in theory it takes longer in steady state to find a match.

Concerns

Fragmentation • Allocation and free latency • Implementation complexity
Allocators try to optimize for multiple variables: — Fragmentation, speed, simplicity, etc.

Time complexity :

To perform allocation, we will need to examine the list maintained, worst-case O(n) per malloc( ) request.

Space complexity:

As system memory gets tiny, the bookkeeping overheads become a large percent of total system memory
For very large (thousands of CPU) systems, complex allocator bookkeeping gets out of hand

Alternate approaches

Instead of circular list based allocator described above, we can also use two alternate approaches :

Slab allocator (GNU libc) :

Divide blocks into small and large. Only block > threshold size managed using free list. Else, allocate power of two and use “bitmap” for each range of blocks of same size, for bookkeeping.
Fast for small blocks, slower for large blocks. Minimal space overhead. No external fragmentation (as it’s a predefined block for small block allocations, but has wasted space)

2. Buddy system (Linux kernel)

Similar to slab allocator but only allocate blocks in sizes that are power of 2. Keep separate free lists for 16 byte, 32 byte and 64 byte blocks etc.
If no free block of size n available, find block of size 2n and split it into two blocks of size n.

Implementation with circular free list

#define MEM_SIZE_REQ 1024 // from system call, request a fixed size instead of smaller number of bytes.void* malloc(size_t nBytes){Header* getMemory(size_t bytes); // to request more mem from systemsize_t bytes_as_hdr_blocks = (nBytes +sizeof(Header) - 1)/ sizeof(Header) + 1;Header* prev, curr;cases to handle:(i) No free list - create one (base is a global var of type Header (not allocated on Heap), freep is an initial pointer, points to &base.if((prev= pFree) == NULL){     base.next = prev = pFree = &base; base.size = 0;
}(ii) if free list is createdprev = pFree;// Loop through free List and find block available for allocationfor(curr = prev->next;;prev = curr, curr=curr->next){(a.) found a block with >= nUnits size
if (curr->size >= bytes_as_hdr_blocks){      1) exactly nUnits : update pPrev Pointer next field and return    pCurr block addr to user     if(curr->size == bytes_as_hdr_blocks{
            prev->next = curr->next;
      } else{     2) bigger than nUnits: allocate from tail end
curr->size = curr->size - bytes_as_hdr_blocks; 
curr = curr + curr->size; 
curr->size = bytes_as_hdr_blocks;      } // else block end     pFree = prev;
     return (void*) (curr+1); // start of user block , skip header.
} // >= block(b.) wrapped around back to free list start - // make system call; continue
if(curr == pFree){
// looped through fully : 1) make system call and allocate , if this returns NULL - DONE. MEM_SIZE_REQ is used to get system memory inside getMemory() implementation.(curr = getMemory(bytes_as_hdr_blocks)) == NULL){
return NULL;
}2) Else : go through infinite loop for allocation
}

Conclusion

There is no single best approach for custom malloc design— it really depends on the application.
Tip : Study the allocation/ deallocation demands of given application to make the right selection.

Don’t forget to drop some claps, if you enjoyed learning about designing malloc() operation (new[] in CPP) using custom memory allocator. In the next blog, we will take a look at corresponding free() method.

References

https://kremlin.cc/k&r.pdf

Memory Allocator for embedded system (K & R Ritchie book)

Memory allocator given in K & R book This memory allocator use single linked list, the following picture represent the…

gnuchops.wordpress.com

https://docs.google.com/viewerng/viewer?url=https://www-inst.eecs.berkeley.edu//~cs61c/fa07/lecture/L05.pdf

https://www.cs.unc.edu/~porter/courses/comp530/f16/slides/malloc.pdf