Introduction to CPU Microarchitecture

Ruban S
3 min read · Jul 13, 2024

In the previous article, I wrote about the Instruction Set Architecture (ISA) and its importance. Once you decide which ISA your CPU will support, the next big design challenge is the microarchitecture. A CPU microarchitect designs a particular microarchitecture based on where and how the CPU will be used. As a result, different CPUs that implement the same ISA can have very different microarchitectures, because their use cases differ.
For example, Qualcomm’s Kryo CPU (used in the Snapdragon SoC) implements the ARMv8.0-A architecture, and Fujitsu’s A64FX (used in the Fugaku supercomputer) implements the ARMv8.2-A architecture. Although the two architectures are very similar (with minor differences), the microarchitectures are poles apart.

While many different microarchitectures exist across a wide range of CPUs, most of them execute an instruction in the following steps:

  1. Fetch: Every instruction resides in memory at a particular address. The CPU keeps the address of the next instruction to execute in a register called the Program Counter (PC for short) and uses it to fetch that instruction from memory into the CPU.
  2. Decode: Once the instruction is fetched into the CPU, it needs to be decoded. Decoding an instruction tells the processor a lot about that instruction, such as:
    a. What exactly the instruction is: is it an add instruction, an addi instruction, or an ld instruction?
    b. What the source and destination operands are, and how many there are. Different instructions take different operands depending on their function. Operands are either architectural registers or values encoded directly in the instruction bits (called immediate values).
    c. Which execution unit the instruction should go to. Because there are different types of instructions and operations, modern CPUs have different execution units. Integer arithmetic and logic operations are typically issued to an ALU. Load/store instructions calculate an address and access memory, which is why they typically have their own execution unit. Floating-point (FP) instructions perform quite different and more complex operations, so there are separate execution units for them as well.
  3. Execute: This is when the processor actually executes the instruction by taking the source operands (if any) and passing them to the relevant execution unit.
  4. Memory Access: Load instructions will read data from memory, and store instructions will write data to memory.
  5. Write-back: Most instructions write their result to an architectural register. Stores and conditional branches, for example, do not.
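
To make the decode step a little more concrete, here is a minimal Python sketch that pulls the fields out of a 32-bit RISC-V instruction word. The bit positions follow the base RISC-V instruction formats; the function names `decode` and `sign_extend`, the returned dictionary layout, and the unit labels "ALU" and "load/store" are my own assumptions for illustration, and only the three instructions mentioned above (add, addi, ld) are handled.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode(word):
    """Split a 32-bit RISC-V instruction word into its fields (illustrative subset)."""
    opcode = word & 0x7F            # bits 6..0
    rd = (word >> 7) & 0x1F         # bits 11..7, destination register
    funct3 = (word >> 12) & 0x7     # bits 14..12
    rs1 = (word >> 15) & 0x1F       # bits 19..15, first source register
    rs2 = (word >> 20) & 0x1F       # bits 24..20, second source register
    funct7 = (word >> 25) & 0x7F    # bits 31..25
    imm = sign_extend(word >> 20, 12)  # I-type immediate, bits 31..20

    # (a) which instruction, (b) which operands, (c) which execution unit
    if opcode == 0x33 and funct3 == 0 and funct7 == 0:
        return {"name": "add", "unit": "ALU", "rd": rd, "rs1": rs1, "rs2": rs2}
    if opcode == 0x13 and funct3 == 0:
        return {"name": "addi", "unit": "ALU", "rd": rd, "rs1": rs1, "imm": imm}
    if opcode == 0x03 and funct3 == 3:
        return {"name": "ld", "unit": "load/store", "rd": rd, "rs1": rs1, "imm": imm}
    raise ValueError(f"unsupported instruction word: {word:#010x}")


add_fields = decode(0x002081B3)   # add  x3, x1, x2
addi_fields = decode(0x00510093)  # addi x1, x2, 5
ld_fields = decode(0x00813283)    # ld   x5, 8(x2)
```

Note how add has two register sources while addi and ld have one register source plus an immediate, which is the "how many operands" question from point b.
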
At a high level, the CPU performs these five operations in a loop.
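
The five steps can be sketched as a small software loop. This is only a toy model under stated assumptions, not how real hardware is built: instructions are written as already-parsed tuples instead of encoded bits, data memory is a Python dictionary, values do not wrap at 64 bits, and the function name `run` and the tuple layout are hypothetical. The store instruction `sd` is included alongside add, addi and ld so that the memory access step has something to write.

```python
def run(program, data_memory, num_regs=32):
    """Execute a list of (op, a, b, c) tuples one instruction at a time."""
    regs = [0] * num_regs
    pc = 0  # program counter: byte address of the next instruction

    while pc // 4 < len(program):
        # 1. Fetch: read the instruction the PC points at (4 bytes each).
        instr = program[pc // 4]

        # 2. Decode: work out the operation and its operands.
        op, a, b, c = instr

        # 3. Execute: compute a result or an effective address.
        if op == "add":            # a = rd, b = rs1, c = rs2
            result = regs[b] + regs[c]
        elif op == "addi":         # a = rd, b = rs1, c = immediate
            result = regs[b] + c
        elif op in ("ld", "sd"):   # b = base register, c = offset
            result = regs[b] + c
        else:
            raise ValueError(f"unsupported operation: {op}")

        # 4. Memory access: only loads and stores touch data memory.
        if op == "ld":
            result = data_memory.get(result, 0)
        elif op == "sd":           # a = rs2, the register holding the data
            data_memory[result] = regs[a]

        # 5. Write-back: stores write no register; x0 always stays zero.
        if op != "sd" and a != 0:
            regs[a] = result

        pc += 4  # move on to the next instruction

    return regs


memory = {}
final_regs = run(
    [
        ("addi", 1, 0, 5),   # x1 = 5
        ("addi", 2, 0, 7),   # x2 = 7
        ("add", 3, 1, 2),    # x3 = x1 + x2
        ("sd", 3, 0, 16),    # memory[16] = x3
        ("ld", 4, 0, 16),    # x4 = memory[16]
    ],
    memory,
)
```

After this program runs, `final_regs[3]` and `final_regs[4]` both hold 12, and so does `memory[16]`.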

To implement a processor at the logic gate level, we need a clock input. Because the processor has many wires, signals and registers, we must ensure that a piece of state is not read and written at the same time, as this would cause ambiguity and an inconsistent internal state. For this, we’ll use edge-triggered clocking: state elements such as registers update only on a clock edge. The logic that computes the new state then has one full clock cycle to finish before its result is written into the relevant parts of the state.
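
One way to picture edge-triggered state in software is to give each register a visible output and a separate pending input. This is a rough sketch under my own assumptions (the class name `Register` and the methods `set_next` and `clock_edge` are hypothetical, and real hardware is not written this way): during a cycle, logic only reads the visible outputs, and every register takes its new value together at the clock edge.

```python
class Register:
    """A toy edge-triggered register: output q changes only on clock_edge()."""

    def __init__(self, value=0):
        self.q = value  # value visible to the rest of the circuit
        self.d = value  # value waiting to be captured at the next edge

    def set_next(self, value):
        # Driven by logic during the cycle; does not change q yet.
        self.d = value

    def clock_edge(self):
        # On the clock edge, the pending value becomes the visible one.
        self.q = self.d


def tick(registers):
    """Deliver one clock edge to every register."""
    for reg in registers:
        reg.clock_edge()


# Two registers that swap contents every cycle. Each reads the other's
# old output, so there is no ambiguity about which value is seen.
a = Register(1)
b = Register(2)

a.set_next(b.q)  # computed during the cycle from current outputs
b.set_next(a.q)
tick([a, b])     # clock edge: both update together

swapped = (a.q, b.q)
```

Here `swapped` is `(2, 1)`. If `a` had been overwritten immediately instead of at the edge, `b` would have read the new value of `a` and both registers would end up holding 2, which is exactly the kind of inconsistency the clock edge prevents.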

In the next few posts, we’ll look at designing a very simple CPU that supports basic RISC-V instructions using logic gates and blocks.
