Aligned and Unaligned Memory Access

May 19, 2024

Unaligned memory access is an access of N bytes of data at an address that is not evenly divisible by N. If the address is evenly divisible by N, the access is aligned.

We can check this by computing Address/N, where Address is the memory address and N is the number of bytes accessed. If the result is a whole number, the access is aligned. Here are some examples:

Two byte access from address 4: Address/N = 4/2 = 2 (aligned access)
Two byte access from address 3: Address/N = 3/2 = 1.5 (unaligned access)
Four byte access from address 24: Address/N = 24/4 = 6 (aligned access)

As a practical note, for the common access sizes of 2, 4, 8, and 16 bytes, we have aligned memory access if the rightmost digit of the address (written in hexadecimal) is divisible by the number of bytes.
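The Address/N rule can be sketched in C as shown below. The function names is_aligned and is_aligned_pow2 are made up for this illustration. The second variant assumes that N is a power of two, which holds for the usual access sizes, and tests the low address bits instead of dividing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if an access of n bytes at address addr is aligned,
 * i.e. addr is evenly divisible by n. Works for any n > 0. */
static bool is_aligned(uintptr_t addr, size_t n)
{
    assert(n != 0);
    return (addr % n) == 0;
}

/* Equivalent check for power-of-two sizes: the low bits must be zero. */
static bool is_aligned_pow2(uintptr_t addr, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0); /* n must be a power of two */
    return (addr & (n - 1)) == 0;
}
```

With these helpers, is_aligned(4, 2) and is_aligned(24, 4) are true, while is_aligned(3, 2) is false, matching the examples above.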

Fig. 1 Aligned and unaligned memory access based on address and access size

There are microprocessors that allow unaligned memory access and those that don’t. Unaligned access usually negatively impacts performance, as more operations (instructions) are required to perform it. If the microprocessor does not support unaligned access, an exception can be triggered (e.g., a bus error exception) when such access is attempted.

Software Point of View

From the software's point of view, a memory access is just an instruction that reads bytes of data from memory or writes them to it.

Let’s look at practical situations where unaligned access may occur when using the C programming language. First, we start with the structure shown below:

struct Example {
   uint16_t data_1;
   uint32_t data_2;
   uint8_t data_3;
};

Let’s say the structure shown in the code above is mapped starting from address 0x00001000 and its members are placed back to back, with no gaps between them. This means that data_1 occupies addresses 0x00001000 and 0x00001001 (note: each address stores a single byte). Variable data_2 occupies addresses 0x00001002 to 0x00001005, and data_3 is at address 0x00001006. The variable data_1 has a size of two bytes, and using the simple calculation Address/N, we can see that it is properly aligned. Variable data_2 is not aligned (0x1002/4 is not a whole number), and variable data_3 is aligned. Here, we should mention that single-byte variables are always aligned because all addresses are evenly divisible by one. If we want all members of the structure properly aligned, the compiler can do that job for us, and by default it does: it inserts so-called “padding” bytes after data_1 so that data_2 starts at an aligned address. In the same way, the compiler places variables and function arguments in an aligned manner, complying with the alignment requirements of the target CPU architecture.
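To see the padding for ourselves, we can ask the compiler where it placed each member. The sketch below uses the standard offsetof macro and the C11 _Alignof operator. The helper padding_before_data_2 is a made-up name for this illustration. On a typical target where uint32_t requires 4-byte alignment, we would expect 2 padding bytes after data_1, but the exact layout depends on the target ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct Example {
    uint16_t data_1;
    uint32_t data_2;
    uint8_t  data_3;
};

/* In an unpacked structure, every member starts at an offset that is
 * a multiple of its own alignment requirement. */
static_assert(offsetof(struct Example, data_2) % _Alignof(uint32_t) == 0,
              "data_2 must be aligned");

/* Number of padding bytes the compiler inserted between data_1 and data_2. */
static size_t padding_before_data_2(void)
{
    return offsetof(struct Example, data_2)
         - (offsetof(struct Example, data_1) + sizeof(uint16_t));
}
```

Printing sizeof(struct Example) alongside these offsets also shows any padding added after data_3, which keeps every element aligned when the structure is used in an array.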

The next situation where we can encounter unaligned memory access is casting pointers from one type to another. Although the C language allows such casts, they may result in undefined behavior:

“A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.” (The C Standard, ISO/IEC 9899:2011)

void test_func(uint8_t *data) {
   /* The rest of the code removed for clarity */
   uint32_t value = *((uint32_t *) data);
}

As we can see from the code above, we have a read access of 4 bytes from the memory address that is passed as the function parameter uint8_t *data. This memory access can be aligned or unaligned, depending on the address held in the data pointer. For example, if we pass a variable with address 0x0004 as an argument to the function, we will end up with aligned access; if the address, however, is 0x0005, then the access will be unaligned. This is a situation that the compiler can’t resolve for us, as it does not generate code for run-time alignment checks.
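One common way to avoid the problem is to copy the bytes with memcpy instead of dereferencing a cast pointer. This is a minimal sketch and not the only option. The function name read_u32 is made up for this illustration, and the result follows the byte order of the machine the code runs on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from a possibly unaligned address.
 * memcpy has no alignment requirement, so this is well defined
 * as long as data points to at least 4 readable bytes. */
static uint32_t read_u32(const uint8_t *data)
{
    uint32_t value;
    assert(data != NULL);
    memcpy(&value, data, sizeof value);
    return value;
}
```

Compilers usually recognize this pattern. On architectures that support unaligned loads, they tend to emit a single load instruction. On others, they fall back to byte-wise accesses.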

Compiler Specifics

The C programming language classifies unaligned memory access as undefined behavior. The default behavior of the compiler when it comes to unaligned access depends on the target CPU architecture. If the architecture does not allow unaligned accesses, then the compiler will place all variables, functions, etc., in an aligned manner. If the CPU architecture allows unaligned access, then the compiler should have options where we can select whether it should take advantage of this or not. For example, GCC provides the following options for ARM processors: -munaligned-access and -mno-unaligned-access.

If we look at the example with the structure from the previous section and we don’t want to have padding bytes inserted by the compiler, we can explicitly instruct it by using a compiler-specific language extension. For GCC, if we want to “pack” (remove padding bytes from) a structure, the code will look like this:

struct __attribute__((packed)) Example {
   uint16_t data_1;
   uint32_t data_2;
   uint8_t data_3;
};

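The attribute packed specifies that a type must have the smallest possible alignment, so its members are placed without padding between them. In GCC, the attribute can be applied to structure, union, and enumeration types, as well as to individual structure members.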
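The effect of packing can be checked at compile time. The sketch below assumes GCC or a compatible compiler such as Clang, because __attribute__((packed)) is an extension and not standard C. The names PackedExample and get_data_2 are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct __attribute__((packed)) PackedExample {
    uint16_t data_1;
    uint32_t data_2;
    uint8_t  data_3;
};

/* Without padding, the size is simply 2 + 4 + 1 bytes. */
static_assert(sizeof(struct PackedExample) == 7, "no padding expected");
/* data_2 follows data_1 directly, at an offset that is not a multiple of 4. */
static_assert(offsetof(struct PackedExample, data_2) == 2, "data_2 at offset 2");

/* Accessing the member through the structure is safe: the compiler
 * knows the type is packed and emits code that works for any alignment. */
static uint32_t get_data_2(const struct PackedExample *p)
{
    return p->data_2;
}
```

Taking the address of a packed member and passing it around as a plain uint32_t pointer loses this information, which brings us back to the pointer casting problem described earlier.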

Hardware Point of View

Analyzing the hardware point of view will give us a better understanding of why unaligned memory access can happen in the first place. It is not a design flaw of the microprocessor. The limitations regarding unaligned memory access are related to the way memories are structured and integrated in CPU-based systems.

A memory can only access a limited number of bits in a single read or write cycle. For example, a memory with an 8-bit data bus limits a single read/write access to that size. A memory with a 32-bit data bus limits the single access to a maximum of 32 bits. The same logic applies to memories with other data bus sizes.

Fig. 2 Simplified pinout of memory units

In Fig.2, we can see two memory units with their pinout:

Memory 2k x 8 — This memory has a total of 2k (2048) addresses, selectable by bus A (lines A0 to A10). Each address can store a single byte accessible by bus D (lines D0 to D7).
Memory 2k x 32 — This memory has a total of 2k (2048) addresses, selectable by bus A (lines A0 to A10). Each address can store 4 bytes (32 bits) accessible by bus D (lines D0 to D31).

Looking at these memories as standalone units, there is no such thing as aligned and unaligned access. The available addresses start at 0 and go up to 2047. The issue with unaligned access comes into play when we integrate these memories into a larger system and map them into that system’s address space.

If we use memory with an 8-bit data bus in a 32-bit system, we cannot access the natural 32-bit data size using single-cycle access. We will have to access 4 consecutive addresses from memory (each 8-bit) so we can construct data with a 32-bit size. If we choose a 32-bit memory (e.g., the 2k x 32 memory shown in Fig.2), then we can have 32-bit single-cycle accesses. Smaller access sizes (e.g., 8-bit, 16-bit) are also possible. Individual byte enable (BW) lines can be used for write operations. For read access, the whole 32-bit word can be read, and the unnecessary bits can be discarded.
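The behavior of byte enables and of discarding bits on reads can be modeled in software. This is only an illustrative sketch. The function names are made up, and we assume that lane 0 is the least significant byte of the word, which may differ from the numbering of the BW lines on a real device.

```c
#include <assert.h>
#include <stdint.h>

/* Models a write to one 32-bit memory word: only the byte lanes whose
 * bit is set in enable_mask (bit i selects lane i) take the new data. */
static uint32_t write_with_byte_enables(uint32_t stored, uint32_t data,
                                        unsigned enable_mask)
{
    uint32_t bit_mask = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        if (enable_mask & (1u << lane)) {
            bit_mask |= UINT32_C(0xFF) << (8 * lane);
        }
    }
    return (stored & ~bit_mask) | (data & bit_mask);
}

/* Models a byte read: the whole word is read, then all bits except
 * the requested lane are discarded. */
static uint8_t read_byte_lane(uint32_t word, unsigned lane)
{
    assert(lane < 4);
    return (uint8_t)(word >> (8 * lane));
}
```

For example, writing 0xAABBCCDD over a stored 0x11223344 with only lane 1 enabled changes a single byte and leaves the other three untouched.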

Unaligned Memory Access Example

Fig. 3 Memory (2k x 32) integrated in a 32-bit system

In Fig. 3, we have a simplified example of 2k x 32 memory integration in a CPU-based system. The first thing we should note is that the memory is not directly connected to the 32-bit CPU bus. This is because memories have specific interfaces (pinout), and connecting them to a bus requires additional logic. This logic is usually implemented in a memory controller unit that takes care of all low-level timing requirements for read/write operations.

Another very important thing to consider is mapping the memory unit into the available memory map (system address space). This is implemented using decoder logic (e.g., the Interconnect unit shown in Fig.3) that takes an address from the system address space as input and decodes it to address values for the memory.

In Fig. 4 below, we can see examples of write accesses issued on the 32-bit memory bus that are decoded to write accesses for the 2k x 32 memory. In our example, the memory is mapped at address 0x00008000 of the system address space. We can see that four addresses from the system address space correspond to one address from the memory unit. This is because the system address space is byte addressable (each address is 1 byte), while the memory in our example holds 4 bytes at a single address. Byte access is made possible by the individual byte enable signals (BW) of the memory. For example, a write of a byte to address 0x00008005 will be interpreted as a write to the second byte (BW2) of memory address 0x001.

Fig. 4 Aligned memory access of a 2k x 32 memory in a 32-bit microprocessor system
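The decoding step described above comes down to a subtraction, a division, and a remainder. The sketch below uses the mapping address 0x00008000 from the example. The names MEM_BASE, MEM_WORDS, mem_location, and decode are made up for this illustration, and the byte lane is counted from zero, so lane 1 is the second byte of the word.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BASE  UINT32_C(0x00008000) /* mapping address used in the text */
#define MEM_WORDS 2048u                /* 2k x 32 memory */

struct mem_location {
    uint32_t word_address; /* address on the memory side (bus A) */
    unsigned byte_lane;    /* 0..3: byte within the 32-bit word */
};

/* Translates a byte address from the system address space into a
 * word address and byte lane of the 2k x 32 memory. */
static struct mem_location decode(uint32_t sys_addr)
{
    uint32_t offset = sys_addr - MEM_BASE;
    assert(sys_addr >= MEM_BASE && offset < MEM_WORDS * 4u);
    struct mem_location loc = { offset / 4u, (unsigned)(offset % 4u) };
    return loc;
}
```

Calling decode(0x00008005) gives word address 1 and byte lane 1, while all four system addresses from 0x00008004 to 0x00008007 map to word address 1.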

All examples in Fig. 4 show aligned memory access. Now, let’s look at situations where unaligned access can occur. We are again using the setup shown in Fig. 3. If the first and the last byte of the access (at system addresses address and address + transfer size - 1) are decoded to two different addresses on the side of the 2k x 32 memory, then we have unaligned memory access. For example, system addresses 0x00008004 to 0x00008007 are decoded to the single address 0x001 on the memory unit side. If, however, we issue a 4-byte write transfer at address 0x00008005 (as shown in the table below), then this will be decoded as two separate access operations on the 2k x 32 memory side: one access to address 0x001, where we will write 3 bytes, and one to address 0x002, where we will write one byte.

For the memory 2k x 32 that we used in the examples so far, we can say that it has a 32-bit word boundary. Reading or writing across this boundary is impossible in a single cycle, and such an attempt is classified as unaligned memory access. There are two approaches for handling unaligned accesses:

Break up the access into multiple accesses (as shown in Fig.5)—The memory controller can do this. The downside of this approach is the additional time required to perform the operation.
Restrict unaligned access—If such a request is made, the memory controller will return an error. This approach reduces the hardware's complexity.
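The first approach can be sketched as a small calculation that a memory controller would perform in hardware. The function names are made up for this illustration, and offset is the byte offset of the access from the start of the memory (system address minus the mapping address).

```c
#include <assert.h>
#include <stdint.h>

#define WORD_SIZE 4u /* bytes per address of the 2k x 32 memory */

/* Number of memory word accesses needed to transfer size bytes
 * starting at the given byte offset. */
static unsigned word_accesses_needed(uint32_t offset, uint32_t size)
{
    assert(size > 0);
    uint32_t first_word = offset / WORD_SIZE;
    uint32_t last_word  = (offset + size - 1u) / WORD_SIZE;
    return (unsigned)(last_word - first_word + 1u);
}

/* Number of bytes that land in the first word touched by the access;
 * the remaining bytes go to the following word(s). */
static uint32_t bytes_in_first_word(uint32_t offset, uint32_t size)
{
    uint32_t room = WORD_SIZE - (offset % WORD_SIZE);
    return size < room ? size : room;
}
```

For the 4-byte write at 0x00008005 (offset 5), this gives two accesses with 3 bytes in the first word and 1 byte in the second. A controller that takes the second approach would instead return an error whenever word_accesses_needed is greater than one.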

Conclusion

In conclusion, we should note that unaligned memory access is not necessarily a bad thing. Many CPU architectures support it. Although unaligned memory access usually consumes more time, it allows more efficient use of the available memory.

The most common situations where you can encounter unaligned memory access are:

Casting pointers to types of a larger size
Accessing multiple bytes of data using pointers (especially when casting is involved)