Designing aligned Memory Allocator

Published in

HowsOfCoding

5 min readJan 16, 2022

In the last two blogs on memory management (malloc(), free()), we discussed how we could implement a custom allocator using a circular Free list as data structure for bookkeeping. In this blog, we will take a look at modifying the scheme to allocate blocks that start at an address that is strictly a multiple of some factor “alignment”.

Background

As outlined in this article, the computer’s processor does not read from and write to memory in byte-sized chunks. Instead, it accesses memory in two-, four-, eight- 16- or even 32-byte chunks. We’ll call the size in which a processor accesses memory its memory access granularity.

If you don’t understand and address alignment issues in your software, the following scenarios, in increasing order of severity, are all possible:

Your software will run slower.
Your application will lock up.
Your operating system will crash.
Your software will silently fail, yielding incorrect results.

Aligned malloc

For this case study, let’s assume, aligned aligned_malloc() is a function that supports allocating memory such that the memory address returned is divisible by a specific power of two. It allows the user to specify the alignment, which is required for some kinds of operations. For example, DMA (I/O via direct memory access) might require a buffer with 512 byte alignment, aligned to cache lines etc.

Requirements

Address returned by aligned_malloc() needs to be a multiple of alignment specified by the user. This means we will need to start that block at a multiple of alignment.
We could use malloc() under the hood, to achieve this “aligned” allocation.

Design

For example, you wanted to allocate 10 bytes with alignment of 16 (so the desired behavior is to allocate the 10 bytes starting in an address that is divisible with 16). So aligned malloc() will give you 10+16-1=25 bytes. Not necessarily starting at the right address in terms of being divisible with 16). But, then this 16-1 is 0x000F and its negation (~) is 0xFFF0. And, now we apply the bitwise AND like this: p + 15 & 0xFFF0 which will cause every pointer p to be a multiple of 16.
Problem with this : if we change the size of what is requested to malloc()— when a corresponding free()is called, extra bytes we added for alignment purpose during call to malloc() needs to be known.

Implementation

aligned_malloc( )

Alignment: To perform alignment using malloc() API, we need an additional of utmost (alignment-1) bytes to force it to be a multiple of alignment + x bytes < storage to hold info to get back to malloc’d address when corresponding free() needs to be performed.
Bookkeeping to perform correct free(): As shown in Fig1. one could store the offset (16-bit or 2 bytes) to computemalloc() returned address at aligned_addr - 1, (it could store malloc address directly — but this would need 32 bit or 64 bit space, depending on the platform).

Implementing alignment

There are two ways shown here :

a. Use bit-shift operation — This operation assumes alignment to be a power of two. Although bit operations help with speed, in this case, we are performing memory allocation that would use a system call within. So, the benefits of speed achieved using bit-shift may not even be noticeable. In this case, we could store the address allocated by malloc (32 bits / 64 bits ) to be used to perform free().

b.Use modulo operator — use modulo to compute the offset to move the malloc’d address to get required alignment. Also, this value can be stored using just 16-bits (instead of 32/64 bit storing address as in option a.) to retrieve malloc’d address when free() needs to be performed (as shown in code example below).

// aligned_malloc(1000, 128) will return a memory address// that is a multiple of 128 and that points to memory of size 1000 bytes.//offset_t is a uint16_t, supports up to 64KB alignment, a size which is already unlikely to be used for alignment.typedef uint16_t offset_t;#define PTR_OFFSET_SZ sizeof(offset_t)
void* aligned_malloc(size_t required_bytes, size_t alignment){int offset = alignment - 1;// we also need about 2 bytes for storing offset to get to orig malloc address during aligned_free().uint32_t hdr_size = PTR_OFFSET_SZ + (alignment - 1);if((p_addr = (void * ) malloc(required_bytes + hdr_size)) == NULL){return NULL;
}// a) bit-shift to move address to aligned_addr
//    Note that this operates on powers of twovoid* aligned_addr = (void * ) (((size_t)(p_addr) + offset) & ~(offset));//b) OR use modulo operator to get how much to move forwardint move_forward = (alignment - ((size_t)p_addr % alignment));void* aligned_addr= (size_t)p_addr + move_forward;// store 16-bit offset instead of a 32bit or 64 bit platform address.*((size_t *) aligned_addr - 1) = (size_t)(aligned_addr - p_addr);return aligned_addr;}

aligned_free( )

We should take care to not mix-upfree() and aligned_free().

If you call free() on an aligned pointer, free() will not recognize the allocation and you may crash or experience other strange effects.
Calling aligned_free() on an unaligned pointer will likely result in you reading an invalid offset value and calling free() with random data

To perform aligned_free() use the offset stored to get to malloc’d address and perform a corresponding free( ).

void aligned_free(void *aligned_addr ){/* Find the address stored by aligned_malloc() ,"size_t" bytes above the current pointer then free it using free() API.*/size_t offset = *((size_t *) aligned_addr - 1);// get to p_addr using offset and aligned_addr value and call 
// free() on it.free((void *)(*((size_t*) aligned_addr) - offset);
}

As always, if you found this blog useful, don’t forget to drop some claps (did you know? — you can add more than one clap :)) — so it can reach more folks. Thank you! Also, don’t forget to follow @howsofcoding and come along.