Designing and implementing a pool allocator data structure for memory management in games
In game programming, we are constantly creating objects at runtime, such as meshes, sprites, vectors, textures and so on. Since we usually can not know how many objects we are going to need at runtime, the necessary memory for those objects have to be allocated dynamically.
Dynamic memory allocation, also known as heap allocation, it’s typically done in C++ via malloc() and free(), or with C++’s global new and delete operators. The problem with using the operating system’s heap allocator to allocate and deallocate memory for the objects in our games, is that of performance. The operating system’s heap allocator is very slow, mainly for two reasons. First, a heap allocator is a general purpose facility, so it needs to accept allocations of any size. This requires a lot of management overhead, making malloc() and free() inherently slow. Second, most operating systems do a context-switch from user mode to kernel mode to execute a call to malloc() or free(). These context switches can be extraordinarily expensive. So, one thing is certain, doing a lot of heap allocations at runtime, especially within a loop, will definitely degrade the performance of the game.
This situation leave us with a problem, we need to dynamically allocate memory for all those objects and assets that are created at runtime, but the default heap allocators are extremely slow. What most game engines do, is to implement one or more custom allocators. A custom allocator can have better performance characteristics than the operating system’s heap allocator for two reasons. First, a custom allocator can preallocate a memory block to answer the requests, allowing it to run in user mode and completely avoid the cost to context-switch to kernel mode. The memory block itself can be allocated using malloc() or new, or declared as a global variable. Second, by designing the allocator to meet specific usage patterns, it can avoid the management overhead of a general purpose allocator, making it much more efficient.
One very common memory usage pattern in game programming, is the need to allocate lots of small blocks of memory of the same size. This happens, for example, when we need to create multiple objects of the same type, such as matrices, meshes, AABBs, particles and so on. For this type of allocation pattern, a pool allocator is often the best choice.
The way pool allocators work, is by preallocating a large memory block whose size is an exact multiple of the size of each element that will be allocated. For example, suppose we want a pool allocator for creating 3D vectors, its size would be an exact multiple of 12 bytes — that’s 3 elements per vector times four (for 32-bit floats) or eight bytes (for 64-bit floats) per element. Each element of the pool is added to a linked list of free elements; on initialization, this free list contains all the elements of the pool. When a allocation request is made, we simply take the first element of the free list, and return it. Likewise, when a element is freed, we simply store it back on the beginning of the list. Because both allocations and frees require only a few pointer manipulations, those operations have a complexity of O(1).
The linked list of free elements, can be a singly-linked list, that is, we only need to store a single pointer for each free element of the list. One way of doing this, would be to preallocate another memory block for those pointers, a block of size (sizeof(void*) * num_pool_elements) bytes. But a better solution is to realize that those elements on the free list are, by definition, free memory elements, so instead of preallocating another memory block, we can store the ‘next’ pointers of the free list on the pool’s elements that are not being used. This approach, which does not waste any memory, works as long as the pool_element_size >= sizeof(void*).
As example, let’s see how a pool allocator of 5 elements, each with a size of 8 bytes, would manage its memory block and its free list.
As we can see from figure 1, on initialization we have all the elements of the pool inside the free list, see how we store each ‘next’ pointer inside the free memory blocks(elements). Also notice that we need to keep a pointer to the first element in the free list.
In figure 2 we can see how the allocator satisfies a memory request. See how we simply return the first element in the free list, and update the head of the list to point to the new first element — the element that was pointed by the previous first element of the list. Figure 3 shows the pool after two more allocations.
In figure 4 we can see the pool after freeing the second memory block. As we can see, we move the freed memory block to the start of the list, for this, we need to do two updates. First, we set the head of the list to point to newly freed element. Second, we set the newly freed element’s next pointer, to point to the previous first element of the list.
The pool allocator data structure was implemented in C++ on the files Pool_allocator.hpp and Poll_allocator.cpp. In the next sections we’ll go into some implementation details on the initialization, allocation and freeing operations.
In listing 01 we have the class declaration. We keep a double pointer to void for storing the address of the first element in the list (m_ppfree_memory_list) . We also store a pointer to the first memory address of the poll’s memory block(m_pmemory), this is used to deallocate the entire pool. The initialization is done in the function alloc_pool, we can request a pool’s element by calling get_element, and for freeing a pool’s element we use the free_element function.
The initialization function have as parameters the size in bytes of each element, the requested number of elements in the pool, and the alignment requirement. The first “part” of the function, basically checks the parameters for some error conditions, for example, it guarantees that the size of each element is at least as big as the size of (void*), verifies that the element’s size is a multiple of the alignment requirement and so on. The key part of the function is the creation of the pool’s free list, shown in listing 2.
To create the free list, we first initialize the head of the list, setting its value to the starting address of the pool’s memory block. Next, using the element’s size in bytes and the number of elements in the pool, we calculate the address of the last element. Finally, knowing the number of elements and the last element’s address, we iterate through each element in the pool, writing the address of the next element in its memory block, with the exception of the last element, whose next pointer value is set to null.
The get_element function, shown in listing 3, is used to request a pool’s element. This function contains some code to check for an uninitialized pool, and for an out of memory pool, in both cases it returns a null pointer. The actual code to get a pool’s element is only two lines. First, it stores in a local variable, the address of the first element in the list(the return value), and second, sets the head of the list to point to the next element in the list, i.e, the element pointed by the previous first element.
The free_element function, shown in listing 4, implements the deallocation operation. As we can see, the process of moving the element back to the beginning of the list, is implemented in only a few lines broken into two cases. Case one, if the list was previously empty, i.e, the pool was out of memory, we simply set the head of the list to point to this newly added element, and set the new element’s next pointer to null. Case two, if the list was not empty, we set the head of the list to point to the newly added element, and set the new element’s next pointer to the address of the previous first element of the list, i.e, the element that was pointed by the head of the list.
As we can see, the pool allocator data structure is a great tool to have in our toolbox as game developers, it’s fast, the allocation and freeing operations are both O(1) functions, easy to implement, and it bypasses a very common problem in memory management, that of memory fragmentation. In a nutshell, memory fragmentation happens when, after lots of allocations and deallocations of different sizes, the memory becomes fragmented, i.e, free and used blocks of different sizes, interspersed throughout the memory. The problem with a fragmented memory, is that sometimes the allocator is unable to satisfy a request, because there is no free contiguous block of the requested size, only in fragmented smaller chunks, so we basically run out of memory before being actually out of memory. Because the pool allocator only allocates and frees elements of the same size, the allocator can never fail due to a lack of a large enough contiguous free block.
For more information on memory management i recommend  .