Instruction Set Architecture

Ruban S
May 26, 2024

Instruction Set Architecture (ISA) is probably one of the most important aspects when it comes to designing a CPU.

The ISA is the layer of abstraction that software uses to access the underlying hardware.

Formally, the ISA defines the instructions, registers and memory addressing modes that the processor supports at the hardware level. Software written for an ISA uses that ISA’s instructions, registers and memory addressing modes to carry out its tasks.

Personally, I think of an ISA as a “language” used to communicate with the CPU. Here are a few points that highlight the similarity:

  • Just as there are multiple human languages, there are multiple ISAs.
  • Just as each human language is spoken by many people, you’ll usually find many CPUs based on the same ISA.
  • Just as commands to humans are given in a language humans understand, commands to a CPU (instructions) are given in a language the CPU understands (its ISA).
  • When two people who speak different languages want to communicate, they use a translator as a middle-man. Similarly, a CPU that executes instructions of one ISA can run programs built for a different ISA with the help of a middle-man: an emulator or binary translator (for example, QEMU or Apple’s Rosetta 2). This is an advanced topic that deserves a blog post of its own.

Designing an ISA involves more than deciding which instructions, registers and memory addressing modes to support. Sure, these are the end products of an ISA, the part exposed to software that runs on it, but much more goes into the design:

  1. When coming up with the set of instructions (and the corresponding machine codes for each instruction), one must consider how the hardware would be designed to support these instructions.
    Instructions are ultimately represented in machine code, which is a string of 0s and 1s. The hardware must read these bit strings, work out what type of instruction each one is and what its operands are, and route it to the block of the CPU that should handle it. This process is known as “decoding”, and it is a fundamental part of every processor. Here’s an example:
    Say we have a RISC-V CPU that reads the instruction 00000000001100010000000010110011, which is 0x003100b3 in hexadecimal. The CPU’s “decoder” should be able to read this 32-bit instruction and figure out that it is an ‘add’ instruction whose destination register is x1 and whose source operands are registers x2 and x3. Thus, the instruction is: add x1, x2, x3.
    The RISC-V ISA is designed so that the destination register, in every instruction format that has one, is encoded in the same bit positions (bits 11 to 7). The source registers are likewise always encoded in the same positions (rs1 in bits 19 to 15, rs2 in bits 24 to 20). As a result, the decoder always extracts register numbers from the same place, and no extra logic is needed to work out which part of the instruction holds them. Less hardware logic generally means less area and lower power consumption.
  2. Continuing with the previous example: if you look up the RISC-V definition of the add instruction, you’ll see that 5 bits are used to encode each register (source or destination). This implies a maximum of 2⁵ = 32 registers (commonly known as architectural registers) in the ISA. Increasing or decreasing the number of architectural registers would require more or fewer bits per register field, which would increase or decrease the size of the instruction.
  3. Likewise, increasing or decreasing the number of instructions in the ISA changes how many bits are needed to encode the opcode, and therefore also affects the instruction size.
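To make the decoding step concrete, here is a minimal Python sketch that pulls the fields out of 0x003100b3 using the RV32I R-type bit positions. The helper names (bits, decode_rtype, encode_rtype) are hypothetical, and the sketch only handles the R-type format rather than being a full RISC-V decoder. Because the register fields never move, extracting them is just a fixed shift and mask, and the 5-bit field width is what caps the ISA at 32 architectural registers.

```python
# Hypothetical helpers; bit positions follow the RV32I R-type format:
#   funct7 [31:25] | rs2 [24:20] | rs1 [19:15] | funct3 [14:12] | rd [11:7] | opcode [6:0]

def bits(word: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_rtype(word: int) -> dict:
    """Split an R-type instruction into its fields using fixed shifts and masks."""
    return {
        "opcode": bits(word, 6, 0),
        "rd": bits(word, 11, 7),
        "funct3": bits(word, 14, 12),
        "rs1": bits(word, 19, 15),
        "rs2": bits(word, 24, 20),
        "funct7": bits(word, 31, 25),
    }


def encode_rtype(funct7: int, rs2: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """Pack R-type fields into a 32-bit word."""
    for reg in (rd, rs1, rs2):
        # A 5-bit register field can only name x0..x31 (2**5 = 32 registers).
        if not 0 <= reg < 2**5:
            raise ValueError(f"x{reg} does not fit in a 5-bit register field")
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


OP = 0b0110011  # major opcode shared by register-register ALU instructions

fields = decode_rtype(0x003100B3)
# In RV32I, opcode OP with funct3 = 0 and funct7 = 0 means 'add'.
is_add = fields["opcode"] == OP and fields["funct3"] == 0 and fields["funct7"] == 0
print(f"add={is_add} rd=x{fields['rd']} rs1=x{fields['rs1']} rs2=x{fields['rs2']}")
# prints: add=True rd=x1 rs1=x2 rs2=x3

# Round trip: re-encoding add x1, x2, x3 gives back the original word.
assert encode_rtype(0, 3, 2, 0, 1, OP) == 0x003100B3
```

Trying to encode a register such as x32 raises an error, which is the software analogue of the hardware limit described in point 2.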

ISAs are broadly divided into two categories:

Complex Instruction Set Computer (CISC)

From a technical standpoint, CISC describes ISAs in which a single instruction can perform more than one underlying operation in the CPU (these operations are typically called micro-operations, or micro-ops, and in many CISC CPUs the sequences of micro-ops are stored as microcode).

A good example is the REP MOVSB instruction in x86. This single instruction repeatedly copies one byte of data from the source address (DS:SI, that is, the SI register within the DS segment) to the destination address (ES:DI). This involves multiple operations: reading the byte from the source address, writing it to the destination address, incrementing (or, if the direction flag is set, decrementing) the source and destination addresses, decrementing the CX register, and repeating as long as CX is non-zero. On the 8086 each of these steps is a separate internal operation, yet all of them are carried out by executing this one x86 instruction.
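The behavior of REP MOVSB can be modeled in a few lines of Python. This is a simplified sketch, not a description of how the 8086 microcode actually works: the rep_movsb function is hypothetical, memory is a single flat bytearray (the DS and ES segments are ignored), and each statement in the loop body stands in for one of the operations listed above.

```python
def rep_movsb(mem: bytearray, si: int, di: int, cx: int, df: int = 0) -> tuple[int, int, int]:
    """Simplified model of x86 REP MOVSB.

    Assumptions: memory is one flat bytearray (DS/ES segments are ignored),
    and SI, DI and CX are 16-bit registers that wrap around.
    Returns the final (SI, DI, CX).
    """
    step = -1 if df else 1         # direction flag: 0 = increment, 1 = decrement
    while cx != 0:                 # repeat while CX is non-zero
        mem[di] = mem[si]          # read a byte from the source, write it to the destination
        si = (si + step) & 0xFFFF  # advance the source pointer
        di = (di + step) & 0xFFFF  # advance the destination pointer
        cx = (cx - 1) & 0xFFFF     # one fewer byte left to copy
    return si, di, cx


mem = bytearray(b"HELLO" + bytes(11))
print(rep_movsb(mem, si=0, di=8, cx=5))  # (5, 13, 0)
print(mem[8:13])                         # bytearray(b'HELLO')
```

A RISC ISA would express the same copy as an explicit loop of separate load, store, add and branch instructions.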

Given the complexity of CISC instructions, different CISC instructions have different lengths. Taking x86 as an example again, the register form of INC (for example, INC AX) is only 1 byte long (it just increments a register by 1), whereas MOV [BX+DI+1234H], 5678H takes up 6 bytes (it computes an address by adding the contents of BX and DI plus 0x1234, then writes the value 0x5678 to that address). Variable-length instructions allow for more compact code, since simple instructions can be short and a single complex instruction can replace several simpler ones. The downside is that the decoder for such an ISA is rather complex: it must work out where each instruction ends before it can even locate the next one, which is easier said than done.
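To see why finding instruction boundaries is hard, here is a minimal Python sketch of a length decoder for 16-bit (8086-style) code. It recognizes only the two instructions mentioned above: the one-byte register form of INC (opcodes 0x40 to 0x47) and MOV r/m16, imm16 (opcode 0xC7), whose length depends on the ModRM byte that follows. The encoding used for MOV [BX+DI+1234H], 5678H is C7 81 34 12 78 56 (opcode, ModRM, little-endian 16-bit displacement, little-endian 16-bit immediate). The function names are hypothetical, and a real x86 decoder must also handle prefixes, hundreds of other opcodes, and 32/64-bit modes (in 64-bit mode, for instance, bytes 0x40 to 0x4F are REX prefixes rather than INC).

```python
def insn_length(code: bytes, i: int) -> int:
    """Length of the 16-bit x86 instruction starting at code[i] (tiny subset only)."""
    op = code[i]
    if 0x40 <= op <= 0x47:               # INC r16: a single byte
        return 1
    if op == 0xC7:                       # MOV r/m16, imm16: length depends on ModRM
        modrm = code[i + 1]
        mod, rm = modrm >> 6, modrm & 0b111
        if mod == 0b00:
            disp = 2 if rm == 0b110 else 0  # rm=110 with mod=00 is a direct [disp16] address
        elif mod == 0b01:
            disp = 1                        # 8-bit displacement
        elif mod == 0b10:
            disp = 2                        # 16-bit displacement, e.g. [BX+DI+1234H]
        else:
            disp = 0                        # mod=11: register operand, no displacement
        return 1 + 1 + disp + 2             # opcode + ModRM + displacement + imm16
    raise NotImplementedError(f"opcode {op:#04x} is not modeled in this sketch")


def split_instructions(code: bytes) -> list[bytes]:
    """Walk the byte stream serially: each boundary depends on the previous length."""
    out, i = [], 0
    while i < len(code):
        n = insn_length(code, i)
        out.append(code[i:i + n])
        i += n
    return out


# INC AX ; MOV WORD PTR [BX+DI+1234H], 5678H ; INC AX
stream = bytes([0x40, 0xC7, 0x81, 0x34, 0x12, 0x78, 0x56, 0x40])
print([insn.hex(" ") for insn in split_instructions(stream)])
# prints: ['40', 'c7 81 34 12 78 56', '40']
```

The loop cannot look at byte i + n until it has finished decoding the length of the instruction at i. With a fixed 4-byte instruction length, as in RISC-V or MIPS, every boundary is known in advance.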

Reduced Instruction Set Computer (RISC)

Given the complexity of CISC, there was a push to design simpler ISAs, and many RISC ISAs emerged. RISC instructions are designed to execute as few micro-ops as possible (ideally just one). This makes it practical to give every instruction the same length, which keeps the decode logic much simpler. On the flip side, a program needs more instructions to perform the same work.

The MIPS ISA is one of the best-known RISC ISAs and is widely taught in university Computer Architecture courses, since it is regarded as a blueprint for designing efficient RISC ISAs. MIPS instructions are 32 bits (4 bytes) long and come in three formats:

  1. R-type instructions: R-type instructions (R for register) are instructions where all the source operand(s) are registers.
  2. I-type instructions: I-type instructions (I for immediate) are instructions where one of the operands is a 16-bit constant encoded in the instruction itself (an immediate value).
  3. J-type instructions: J-type instructions (J for jump) are unconditional jumps, such as j and jal, that transfer control to a target address encoded in the instruction. Conditional logic is instead implemented with conditional branches such as beq and bne, which use the I-type format.
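The three formats can be illustrated with a minimal Python sketch that packs fields into 32-bit words. The function names are hypothetical; the opcode and funct values (add: op 0, funct 0x20; addi: op 0x08; beq: op 0x04; j: op 0x02) and the register numbers ($t0 = 8, $t1 = 9, $t2 = 10) follow the standard MIPS32 encoding. Note that the conditional branch beq is encoded with the I-type format, its immediate holding the branch offset.

```python
# Hypothetical encoders for the three MIPS formats (field widths in bits):
#   R-type: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)
#   I-type: op(6) rs(5) rt(5) immediate(16)
#   J-type: op(6) target(26)

def r_type(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def i_type(op: int, rs: int, rt: int, imm: int) -> int:
    # Negative immediates are stored as 16-bit two's complement.
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def j_type(op: int, target_addr: int) -> int:
    # Instructions are word-aligned, so the low 2 address bits are dropped;
    # the upper 4 bits come from the PC at run time and are not encoded.
    return (op << 26) | ((target_addr >> 2) & 0x3FFFFFF)


T0, T1, T2 = 8, 9, 10  # register numbers of $t0, $t1, $t2

add = r_type(0x00, T1, T2, T0, 0, 0x20)  # add  $t0, $t1, $t2  -> 0x012a4020
addi = i_type(0x08, T1, T0, 5)           # addi $t0, $t1, 5    -> 0x21280005
beq = i_type(0x04, T0, T1, -2)           # beq  $t0, $t1, -2   -> 0x1109fffe (offset in words, from PC + 4)
j = j_type(0x02, 0x00400000)             # j    0x00400000     -> 0x08100000

for name, word in [("add", add), ("addi", addi), ("beq", beq), ("j", j)]:
    print(f"{name:<5}0x{word:08x}")
```

Because the opcode always occupies the top 6 bits, the decoder can tell which of the three formats it is looking at before interpreting any other field.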

MIPS uses a single, simple addressing mode for memory accesses: base plus displacement, where the address is computed by adding a 16-bit signed offset to a base register (for example, lw $t0, 8($sp)). Only load and store instructions access memory: loads read values from memory into registers, and stores write register values to memory. The simplicity of this addressing mode also contributes to the simpler hardware implementation of the MIPS ISA.
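In MIPS, the address used by a load or store is formed by adding a sign-extended 16-bit offset to a base register, as in lw $t0, 8($sp). The Python sketch below computes that effective address. The helper names and the $sp value are hypothetical, chosen only for illustration; the sign extension is what lets a single 16-bit field express both positive and negative offsets.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a two's-complement number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base: int, offset16: int) -> int:
    """Base plus sign-extended 16-bit offset, wrapped to 32 bits."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


sp = 0x7FFFEFF8  # hypothetical value of $sp
print(hex(effective_address(sp, 8)))       # lw $t0, 8($sp)  -> 0x7ffff000
print(hex(effective_address(sp, 0xFFFC)))  # lw $t0, -4($sp) -> 0x7fffeff4 (offset field holds 0xFFFC)
```

The hardware needed is just a sign extender and the same adder the ALU already has, which is part of why this addressing mode keeps the implementation simple.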

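RISC vs CISC has long been debated. However, the line between the two has blurred over time: RISC ISAs such as ARM and RISC-V have gained extensions that add more specialized instructions, while modern x86 CPUs internally break their CISC instructions down into simpler, RISC-like micro-ops.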

Common and well-known ISAs

  1. x86: x86 was introduced with Intel’s 8086 microprocessor, was greatly expanded in later products such as the 80386 and the Pentium series, and is now used in almost all modern Intel CPUs, from consumer CPUs like the Core i7 series to workstation- and server-grade CPUs like the Xeon series. AMD’s CPUs implement x86 as well.
  2. ARM: The ARM ISA is well known as an extremely power-efficient RISC ISA and dominates the phone and tablet market, where power efficiency takes precedence over raw compute. ARM CPUs can also deliver high performance: some of the world’s most powerful supercomputers are built on ARM.
  3. MIPS: MIPS is historically significant and powered well-known products like the PlayStation, PlayStation 2, PlayStation Portable (PSP) and Nintendo 64. MIPS has also been used in many commercial routers and set-top boxes. Although the MIPS ISA is no longer being developed, it is still taught in many universities.
  4. PowerPC: PowerPC is a RISC ISA developed by IBM as part of the Apple-IBM-Motorola (AIM) alliance, and has been used in products like the Xbox 360, PlayStation 3 and Nintendo Wii.
  5. RISC-V: RISC-V began as an academic project at the University of California, Berkeley, with the goal of creating an open RISC ISA that anyone can use, customize and contribute to. RISC-V is steadily gaining popularity.
  6. SPARC: SPARC is an open ISA developed by Sun Microsystems that saw many innovations in computer architecture. After Oracle acquired Sun Microsystems, SPARC development continued for several years before being wound down.

Designing an ISA is a complex task, but designing a microarchitecture that implements the ISA is equally challenging, in its own way. The next article will be a primer to CPU microarchitecture, and will take a deep dive from there. Given that my current work is based on the RISC-V ISA, I intend to explain the concept of microarchitecture using RISC-V.
