ARM64 System Memory

ARM AArch64: Shareability domains and Normal memory

Om Narasimhan
5 min read · Dec 19, 2019

This article explains the concept of Shareability for Normal memory and touches on its impact on overall coherency and cacheability in an AArch64 system design. Some of these points also hold for non-AArch64 system designs.

In most systems, Normal memory makes up the major portion of the addressable memory. The most important criterion for treating a region as Normal memory is that it is idempotent. That is, the memory should exhibit all of the following properties:

  • A read operation, including a repeated read, has no side effects.
  • A write operation, including a repeated write to the same address, has no side effects.
  • Multiple accesses of the same type (read or write) can be merged without side effects.
  • A read operation returns the last written value, no matter how many times the read is performed.
  • A read operation can fetch additional memory contents without side effects.
  • Unaligned accesses can be supported.
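The idempotency requirement can be illustrated with a toy model. The sketch below is only an illustration and not part of any real API: the `NormalMemory` and `FifoRegister` classes and the `is_read_idempotent` helper are hypothetical names. It contrasts storage whose reads never change state with a device-style register whose every read consumes an entry.

```python
class NormalMemory:
    """Toy model of idempotent storage: reads never change state."""

    def __init__(self):
        self._cells = {}

    def write(self, addr, value):
        # Writing the same value twice leaves the same final state.
        self._cells[addr] = value

    def read(self, addr):
        return self._cells.get(addr, 0)


class FifoRegister:
    """Toy model of a device register: each read pops one entry (a side effect)."""

    def __init__(self, items):
        self._items = list(items)

    def read(self):
        return self._items.pop(0) if self._items else None


def is_read_idempotent(read_fn, times=3):
    """Return True if `times` consecutive reads all return the same value."""
    first = read_fn()
    return all(read_fn() == first for _ in range(times - 1))


mem = NormalMemory()
mem.write(0x1000, 42)
mem.write(0x1000, 42)  # repeated write, no additional effect

fifo = FifoRegister([1, 2, 3])

normal_ok = is_read_idempotent(lambda: mem.read(0x1000))
device_ok = is_read_idempotent(fifo.read)
```

In this model `normal_ok` is true and `device_ok` is false, which is why a location behaving like the FIFO could not be treated as Normal memory: merging, repeating, or speculatively issuing reads would change what software observes.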

In addition to being idempotent, an implementation of Normal memory guarantees that a write completes in a finite amount of time.

  • This means that, within that memory’s shareability domain, an observer should see the write completed in a finite amount of time without needing explicit cache maintenance instructions or memory barriers.
  • If the Normal memory is accessed as non-cacheable memory, the completed write should be observed globally by all observers in a finite amount of time without needing any cache maintenance instructions.

Shareability domains for Normal memory

The Shareability attribute of a memory location defines the coherency requirement that the hardware enforces for that location.

A Normal memory location can have one of the following Shareability attributes:

  • Inner shareable
  • Outer shareable
  • Non shareable

Please note that these attributes define only the data coherency requirements. They do not define instruction fetch coherency requirements.
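For context, in AArch64 stage 1 block and page descriptors the attribute is carried in the two-bit SH field at bits [9:8], where 0b00 means Non-shareable, 0b10 Outer Shareable, 0b11 Inner Shareable, and 0b01 is reserved. The following is a minimal sketch of packing and unpacking that field. The helper names `set_shareability` and `get_shareability` are made up for illustration, and the descriptor is treated as a plain integer rather than a complete, valid table entry.

```python
from enum import IntEnum


class Shareability(IntEnum):
    """SH[1:0] encodings for stage 1 block/page descriptors (0b01 is reserved)."""
    NON_SHAREABLE = 0b00
    OUTER_SHAREABLE = 0b10
    INNER_SHAREABLE = 0b11


SH_SHIFT = 8                  # SH occupies descriptor bits [9:8]
SH_MASK = 0b11 << SH_SHIFT


def set_shareability(descriptor, sh):
    """Return the descriptor with its SH field replaced by `sh`."""
    return (descriptor & ~SH_MASK) | (int(sh) << SH_SHIFT)


def get_shareability(descriptor):
    """Decode the SH field; raises ValueError for the reserved encoding."""
    return Shareability((descriptor & SH_MASK) >> SH_SHIFT)


inner_desc = set_shareability(0, Shareability.INNER_SHAREABLE)
outer_desc = set_shareability(inner_desc, Shareability.OUTER_SHAREABLE)
```

Because the field holds a single encoding, a given mapping carries exactly one of the three attributes at a time.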

For a Normal memory location with the Inner Shareable attribute set, the Inner Shareable domain is defined as

  • One or more observers that are coherent in data accesses to this location.
  • Each of these observers is independently coherent in data accesses to this location.

For a Normal memory location with the Outer Shareable attribute set, the Outer Shareable domain is defined as

  • One or more observers that are coherent in data accesses to this location.
  • Each of these observers is independently coherent in data accesses to this location.

In addition to these definitions, some additional properties of the shareability domains are:

  • Each observer is only a member of a single Inner Shareability domain.
  • Each observer is only a member of a single Outer Shareability domain.
  • All observers that are members of an Inner shareability domain are always the members of the same Outer Shareability domain.

Also

  • If a location is Non-Cacheable, any data access to it is coherent for all observers. Hence Non-Cacheable locations are always considered Outer Shareable for all observers.

In a system design, if multiple PEs are controlled by the same instance of a hypervisor or operating system, all those PEs must be members of a single Inner Shareable domain.
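The membership rules above can be checked mechanically. The sketch below is a hypothetical validator, not an architectural interface: `validate_domains` and the domain labels are invented for illustration. Each observer maps to one Inner and one Outer domain name, so single membership is enforced by the dictionary itself, and the function verifies that all members of an Inner domain land in the same Outer domain.

```python
def validate_domains(inner_of, outer_of):
    """Check the nesting rule for shareability domains.

    inner_of / outer_of map each observer name to exactly one domain name.
    Returns True if every Inner domain sits inside a single Outer domain.
    """
    if set(inner_of) != set(outer_of):
        return False  # every observer needs both an Inner and an Outer domain

    outer_for_inner = {}
    for observer, inner in inner_of.items():
        outer = outer_of[observer]
        # The first member fixes the Outer domain for this Inner domain.
        if outer_for_inner.setdefault(inner, outer) != outer:
            return False
    return True


inner = {"PE0": "IS0", "PE1": "IS0", "PE4": "IS1"}
good_outer = {"PE0": "OS0", "PE1": "OS0", "PE4": "OS0"}
bad_outer = {"PE0": "OS0", "PE1": "OS1", "PE4": "OS0"}  # splits IS0 across two Outer domains

valid = validate_domains(inner, good_outer)
invalid = validate_domains(inner, bad_outer)
```

The second configuration is rejected because PE0 and PE1 share an Inner domain but were assigned different Outer domains.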

Example 1: A system design with Asymmetric Multi Processing

Asymmetric Multi Processing is where the different PEs run different Operating systems or Hypervisors.

For illustration, let us assume a fictitious AArch64 implementation with the following components

  • Multi socket/CPU system with 4 PEs per socket.
  • Each PE has a small amount of addressable on-chip memory (OCM) that is banked per PE, so each PE accesses its local OCM using the same address range.
  • Each A53 PE has independent L1 cache.
  • Each Socket has L2 cache shared among A53s.
  • An FPGA implementation of a kind of RAM is connected to the system. All PEs have access to this FPGARAM.

If this system is designed for AMP, where PEs 0–3 always run one instance of a hypervisor/OS and PEs 4–7 always run another instance of a hypervisor/OS, the shareability domains could be as follows:

Memory attributes of an AMP system

In the illustration,

  • Each PE’s OCM is Non-shareable (OCM is aliased and accessed at the same address range in all PEs), but cacheable.
  • Socket-0 and connected DDR0 are in a single inner shareable domain. Socket 0 runs a Hypervisor.
  • Socket-1 and connected DDR1 are in a single inner shareable domain. Socket 1 runs a bare-metal Linux instance.
  • The FPGARAM, connected through the PCIe RC, and both sockets are in the same Outer Shareable domain.

Illustration: Characteristics of the AMP system

What this configuration means is

For L1 and L2 caches,

  • Within each socket, the L1 and L2 caches fall into the same Inner Shareable domain. Hardware keeps both coherent for all cached accesses except those to OCM.

For OCM,

  • OCM addresses must be marked Non-shareable because they are not shared across PEs (even those of the same socket). This is acceptable because OCM is banked per PE. There is no hardware-enforced coherency for accesses to OCM, because OCM cannot be accessed from outside its PE anyway. OCM can be marked cached or non-cached.

For DDR0 and DDR1

  • If DDR is cached and marked Inner Shareable, hardware enforces coherency for local DDR accesses. No special cache instructions are required for PEs of the same socket to access the local DDR contents.
  • If DDR is cached and marked Outer Shareable, hardware enforces coherency for local and non-local DDR accesses. No special cache instructions are required for a PE to access the DDR contents, irrespective of the DDR locality.
  • If DDR is cached and marked both Inner and Outer Shareable, hardware enforces coherency for local and non-local DDR accesses. No special cache instructions are required for a PE to access the DDR contents, irrespective of the DDR locality.
  • DDR memory should not be marked Non-shareable; doing so may result in unexpected behavior.
  • If DDR is uncached, it is treated as Outer Shareable.

For FPGA RAM

  • If FPGARAM is cached and marked Outer Shareable, hardware enforces coherency for FPGARAM accesses. No special cache instructions are required for a PE to access the FPGARAM contents.
  • FPGARAM should not be marked Inner shareable as it does not belong to an Inner shareable domain.
  • If FPGARAM is uncached, it is treated as Outer Shareable.
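The rules for this AMP example can be condensed into a small lookup model. This is a hedged sketch of the fictitious system described above, not a description of real hardware behavior: the `REGIONS` table, the domain maps, and the functions `hardware_coherent` and `region_coherent` are all hypothetical names. It answers one question: are two PEs kept coherent by hardware for a given region, without explicit cache maintenance?

```python
# Hypothetical model of the example AMP system: PEs 0-3 on socket 0, PEs 4-7 on socket 1.
INNER_DOMAIN = {pe: pe // 4 for pe in range(8)}  # one Inner Shareable domain per socket
OUTER_DOMAIN = {pe: 0 for pe in range(8)}        # both sockets share one Outer Shareable domain

# Region name -> (shareability, cacheable), following the bullets above.
REGIONS = {
    "OCM": ("non", True),
    "DDR0": ("inner", True),
    "DDR1": ("inner", True),
    "FPGARAM": ("outer", True),
}


def hardware_coherent(pe_a, pe_b, shareability, cacheable=True):
    """Return True if hardware keeps the two PEs coherent for such a location."""
    if not cacheable:
        shareability = "outer"  # uncached locations are treated as Outer Shareable
    if shareability == "outer":
        return OUTER_DOMAIN[pe_a] == OUTER_DOMAIN[pe_b]
    if shareability == "inner":
        return INNER_DOMAIN[pe_a] == INNER_DOMAIN[pe_b]
    if shareability == "non":
        return pe_a == pe_b     # only the owning PE sees a consistent view
    raise ValueError(f"unknown shareability: {shareability}")


def region_coherent(pe_a, pe_b, region):
    """Look up a region's attributes and apply the coherency rule."""
    shareability, cacheable = REGIONS[region]
    return hardware_coherent(pe_a, pe_b, shareability, cacheable)
```

For instance, `region_coherent(0, 3, "DDR0")` holds because both PEs sit in socket 0, while `region_coherent(0, 4, "DDR0")` does not, since PE 4 is outside that Inner Shareable domain. `region_coherent(0, 4, "FPGARAM")` holds because the FPGARAM is Outer Shareable.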

Example 2: A system design with Symmetric Multi Processing

In Symmetric Multi Processing, one instance of the operating system or hypervisor runs on all the PEs. This requires that all PEs, and the memory they access, fall into the same Inner Shareable domain.

Memory attributes of an SMP system

Illustration: Characteristics of the SMP system

Here, the only difference from the AMP design with two Inner Shareable domains is that, when the DDR is cached, no PE needs cache maintenance instructions to access non-local DDR contents coherently, because local and non-local RAM fall into the same Inner Shareable domain.
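To make the contrast concrete, the sketch below counts the PE pairs that hardware would not keep coherent for cached, Inner Shareable DDR under each design. It is an illustrative model only. The function name `pairs_needing_maintenance` and the domain maps are assumptions, with eight PEs split four per socket as in the earlier example.

```python
from itertools import combinations


def pairs_needing_maintenance(inner_domain_of):
    """List PE pairs in different Inner Shareable domains.

    For cached memory marked Inner Shareable, these are the pairs that
    hardware does not keep coherent with each other.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(inner_domain_of), 2)
        if inner_domain_of[a] != inner_domain_of[b]
    ]


amp_domains = {pe: pe // 4 for pe in range(8)}  # AMP: one Inner domain per socket
smp_domains = {pe: 0 for pe in range(8)}        # SMP: a single Inner domain for all PEs

amp_pairs = pairs_needing_maintenance(amp_domains)
smp_pairs = pairs_needing_maintenance(smp_domains)
```

Under the AMP split, every cross-socket pair appears in `amp_pairs`. Under SMP the list is empty, which matches the statement that no cache maintenance is needed for non-local DDR.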

Note: This is just one possible design. For example, there is nothing that prevents the system architect from extending the Inner Shareable domain to include the FPGA Memory.

— End of article
