Primer to RISC-V

Ruban S
6 min read · Jul 6, 2024

RISC-V is an open-source RISC-based ISA that was started in 2010 at the University of California, Berkeley.

The motivation for developing an open-source ISA was to provide a royalty-free ISA that encourages contributions (mainly from academic institutions), as well as a simpler ISA that does not require overengineering the microarchitecture. Over time, there have been many contributions to the RISC-V project in ISA development, toolchain development and the software ecosystem. RISC-V is now managed by the non-profit organization RISC-V International.

Given that RISC-V is a RISC-based ISA, and that one of the collaborators was David Patterson (co-author of the popular book ‘Computer Architecture: A Quantitative Approach’), you’ll find lots of similarities between MIPS and RISC-V. Like MIPS, RISC-V is a load/store architecture: arithmetic and logical instructions operate on registers, and only load and store instructions access memory. It also has broadly the same types of instructions.

Registers and Calling Convention of RISC-V

Like every other ISA, RISC-V has a set of architectural registers. Architectural registers are the registers that are visible to the programmer and define the architectural state of the CPU. This is in contrast to physical registers, which are hidden from the programmer and are solely a microarchitecture feature. More on that in an upcoming blog.

RISC-V has evolved a lot over the years, and so have the types of registers used. The following are the register types in RISC-V:

  1. GPRs (General-Purpose Registers): These registers (more commonly known as Integer Registers) store data in signed/unsigned integer format. Much of the data that computers process is in integer format, hence the use of integer registers.
  2. FPRs (Floating-Point Registers): These registers store data in floating-point form, and conform to the IEEE-754 standard. A lot of scientific, financial and graphical calculations use floating-point arithmetic. IEEE-754 in itself is a huge and very interesting topic, and it deserves an entire series to itself, something I plan to do in the future 🙂
  3. CSRs (Control and Status Registers): These registers are used for various system-level operations such as mode-switching, privileged access, performance counters, dealing with exceptions and so on.

Integer Registers

RISC-V has 32 integer registers, named x0 to x31. x0 is the only read-only integer register: it is hardwired to the value 0, and writes to it are ignored. This simplifies many instructions as well as the hardware implementation.
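To make the role of x0 concrete, here is a minimal Python sketch of an integer register file. The class and function names (RegisterFile, addi) are hypothetical and only model the behaviour described above, assuming 32-bit registers. It also shows how a hardwired zero lets pseudo-instructions such as li and mv be expressed as a plain addi.

```python
class RegisterFile:
    """Toy model of the 32 RISC-V integer registers (32-bit values assumed)."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, n):
        return self.regs[n]

    def write(self, n, value):
        if n != 0:  # writes to x0 are silently discarded
            self.regs[n] = value & 0xFFFFFFFF


def addi(rf, rd, rs1, imm):
    """Model of addi rd, rs1, imm: rd = rs1 + imm."""
    rf.write(rd, rf.read(rs1) + imm)


rf = RegisterFile()
addi(rf, 5, 0, 42)  # li x5, 42 can be written as addi x5, x0, 42
addi(rf, 6, 5, 0)   # mv x6, x5 can be written as addi x6, x5, 0
addi(rf, 0, 5, 1)   # rd is x0, so the result is thrown away

print(rf.read(5), rf.read(6), rf.read(0))  # 42 42 0
```

Because x0 always reads as 0, the hardware does not need separate "load constant" or "copy register" instructions for these cases.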

While nothing forces you to use certain registers for certain operations, there is a convention followed in RISC-V programs (and by compilers when generating RISC-V executables):

RISC-V Register Calling Convention

The encoding of integer registers is straightforward: register x{n} is encoded as n in binary, in a 5-bit field.
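As a rough illustration of both the convention and the encoding, the sketch below lists the ABI names as I understand them from the standard RISC-V calling convention (zero, ra, sp, gp, tp, the temporaries t0 to t6, the saved registers s0 to s11 and the argument registers a0 to a7). The names ABI_NAMES and register_encoding are my own, not part of any toolchain.

```python
# ABI names for x0..x31, assumed from the standard RISC-V calling convention.
ABI_NAMES = (
    ["zero", "ra", "sp", "gp", "tp"]      # x0..x4
    + ["t0", "t1", "t2"]                  # x5..x7   temporaries
    + ["s0", "s1"]                        # x8..x9   saved registers
    + [f"a{i}" for i in range(8)]         # x10..x17 arguments / return values
    + [f"s{i}" for i in range(2, 12)]     # x18..x27 saved registers
    + [f"t{i}" for i in range(3, 7)]      # x28..x31 temporaries
)


def register_encoding(n):
    """Return the 5-bit binary encoding of register x{n} as a string."""
    if not 0 <= n <= 31:
        raise ValueError("integer registers are x0 to x31")
    return format(n, "05b")


for n in (0, 2, 11, 31):
    print(f"x{n:<2} {ABI_NAMES[n]:<4} {register_encoding(n)}")
```

This is why disassembly often shows names such as a1 and a2 rather than x11 and x12: they refer to the same registers.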

Floating-Point Registers

There are 32 FPRs (f0 to f31), where each register is 32 bits wide (if using the F extension) or 64 bits wide (if using the D extension). Each register stores values that conform to the IEEE-754 standard.
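To see what a 32-bit versus a 64-bit floating-point register would hold, this small sketch uses Python's struct module to print the IEEE-754 bit patterns of a value in single and double precision. The helper names f32_bits and f64_bits are hypothetical, and the sketch says nothing about how RISC-V hardware stores the values beyond the register widths mentioned above.

```python
import struct


def f32_bits(x):
    """IEEE-754 single-precision bit pattern of x, as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def f64_bits(x):
    """IEEE-754 double-precision bit pattern of x, as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


print(hex(f32_bits(1.0)))   # 0x3f800000
print(hex(f64_bits(1.0)))   # 0x3ff0000000000000
print(hex(f32_bits(-2.0)))  # 0xc0000000
```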

Control and Status Registers

RISC-V has a lot of CSRs, categorized by privilege level. Privilege level is a slightly advanced topic that requires a fair understanding of the basics of Operating Systems, and I plan to write a post on this sometime in the future. Each CSR in RISC-V has an encoding (its address) associated with it, in addition to a privilege level. Here’s an example of how some CSRs are defined:

Currently allocated RISC-V machine-level CSR addresses.
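As a sketch of how the encoding and the privilege level relate: to my understanding of the RISC-V privileged specification, a CSR address is 12 bits wide, bits 11:10 indicate whether the CSR is read-only (11) or read/write, and bits 9:8 give the lowest privilege level that may access it. The function name and the example addresses (mstatus, mhartid, cycle) are taken from my reading of the spec, so treat them as assumptions and check them against the table.

```python
PRIVILEGE = {
    0b00: "user",
    0b01: "supervisor",
    0b10: "hypervisor/reserved",
    0b11: "machine",
}


def decode_csr_address(addr):
    """Split a 12-bit CSR address into its access and privilege fields."""
    if not 0 <= addr <= 0xFFF:
        raise ValueError("CSR addresses are 12 bits wide")
    access = (addr >> 10) & 0b11    # bits 11:10
    privilege = (addr >> 8) & 0b11  # bits 9:8
    return {
        "read_only": access == 0b11,
        "lowest_privilege": PRIVILEGE[privilege],
    }


# Addresses assumed from the privileged spec: mstatus, mhartid, cycle.
for name, addr in (("mstatus", 0x300), ("mhartid", 0xF14), ("cycle", 0xC00)):
    print(name, hex(addr), decode_csr_address(addr))
```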

Types of RISC-V Instructions

Prominent types of RISC-V Instructions
  • Register/Register instructions: These types of instructions have only registers as source and destination operands. Every instruction of this type has at most two source registers and one destination register.
    A lot of arithmetic and logical instructions are register/register type, since a lot of arithmetic and logical operations (add, subtract, multiply, divide, and, or, xor, etc.) are binary operations.
  • Immediate-type instructions: These types of instructions have one of the operands encoded directly in the instruction itself (which is known as ‘immediate value’, since it is available “immediately” without having to read the register file).
    A lot of instructions in real-world applications tend to have at least one constant operand. Take the for-loop in C for example:
    for(int i = 0; i < 10; i++) {sum += i;}
    In this case, we increment i by 1 in each iteration. Assuming i is held in register x10, the best way to do it in RISC-V assembly would be by using addi x10, x10, 1.
    Load instructions also fall under immediate-type. A typical load instruction would look like lw x11, 0(x2). This instruction says that the CPU should load the data from the memory address formed by adding the offset 0 to the value in x2, and write it to register x11. Since the instruction is a lw (load word), it loads 32 bits (4 bytes / 1 word) of data from memory. Since the start address is x2 + 0 and memory is byte-addressable, the last byte read is at address x2 + 3.
  • Store instructions: Like load instructions, store instructions also deal with data. While load instructions read from memory, store instructions write to memory. A typical store instruction would look like sw x11, 0(x2). This instruction says that the CPU should read the data from register x11 and write it to memory address x2 + 0. Since the instruction is a sw (store word), it writes 32 bits (4 bytes / 1 word) of data into memory. Since the start address is x2 + 0, the last byte written is at address x2 + 3.
    Even though load and store instructions both access memory, they are considered different types of instructions. This is because when processing a load instruction, the CPU reads only one register (in our example x2, to obtain the base address) and writes one register (x11, the destination of the loaded data). However, when processing a store instruction, the CPU needs to read two registers (in our example, it reads x11 to obtain the data it needs to write to memory, and it reads x2 to figure out which memory address to write to) and writes none.
  • Branch instructions: Branch instructions allow the processor to change the current execution flow, based on a condition. Branch instructions are found when the user writes conditional code in high-level languages, such as if statements, while statements, for statements, and even ternary conditional expressions.
    For example, the instruction at PC 0x100c4 (blt a1, a2, 100bc) checks whether the value in a1 is less than the value in a2, and if so, transfers the instruction flow to 0x100bc. Essentially, it changes the PC from 0x100c4 to 0x100bc instead of letting it advance to the next instruction.
    Branches usually jump a few instructions forward or backward from the current PC. As a result, it is more convenient to encode the relative difference between the current PC and the target PC into the instruction, rather than encoding the entire target address. For a branch from 0x100c4 to 0x100bc, that difference is -8 bytes.
  • Upper immediate type instructions: Upper immediate type instructions are a special class of instructions which deal with comparatively larger constants. They carry a 20-bit immediate that forms the upper 20 bits (bits 31 to 12) of a 32-bit value, with the lower 12 bits set to zero.
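    An example of this type of instruction is the lui instruction, which places its 20-bit immediate in the upper bits of the destination register and clears the lower 12 bits. It is used to load a larger value into a register without the use of shift operations. Another example is the auipc instruction, which shifts its 20-bit immediate left by 12 bits, adds it to the current PC and writes the sum to the destination register. If the instruction at PC 0x80001fc0 is auipc x2, imm, the value 0x80001fc0 + (imm << 12) is written to x2; with imm = 0x1, that is 0x80002fc0. This is widely used to form PC-relative addresses, for example when high-level code reads from or writes to variables.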
  • Jump instructions: Like branch instructions, jump instructions also change the current execution flow. However, they don’t depend on any conditions, and unconditionally jump to the target address. Jump instructions are used in function calls, where a jump to the function address space does not depend on any condition.
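To tie the register/register, immediate and branch types together, here is a minimal Python sketch that executes one possible translation of the C for-loop from the immediate-type bullet. The register assignment (x10 for i, x11 for sum, x12 for the loop bound), the tuple format and the run function are all assumptions made for illustration; a real compiler may generate different code. Values are plain Python integers, so 32-bit wrap-around is not modelled.

```python
# Hypothetical translation of: for (int i = 0; i < 10; i++) { sum += i; }
# Each entry is (opcode, operand, operand, operand); instructions are 4 bytes.
PROGRAM = [
    ("addi", 10, 0, 0),    # addr 0:  addi x10, x0, 0     i = 0
    ("addi", 11, 0, 0),    # addr 4:  addi x11, x0, 0     sum = 0
    ("addi", 12, 0, 10),   # addr 8:  addi x12, x0, 10    bound = 10
    ("add", 11, 11, 10),   # addr 12: add  x11, x11, x10  sum += i
    ("addi", 10, 10, 1),   # addr 16: addi x10, x10, 1    i++
    ("blt", 10, 12, -8),   # addr 20: blt  x10, x12, -8   if i < 10, go to addr 12
]


def run(program, max_steps=1000):
    """Execute the toy program and return the final register values."""
    regs = [0] * 32
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < 4 * len(program):
            return regs                  # fell off the end of the program
        op, a, b, c = program[pc // 4]
        next_pc = pc + 4                 # default: next sequential instruction
        if op == "addi":
            regs[a] = regs[b] + c        # rd = rs1 + immediate
        elif op == "add":
            regs[a] = regs[b] + regs[c]  # rd = rs1 + rs2
        elif op == "blt":
            if regs[a] < regs[b]:        # taken: PC-relative jump
                next_pc = pc + c
        else:
            raise ValueError(f"unsupported opcode: {op}")
        regs[0] = 0                      # x0 stays hardwired to zero
        pc = next_pc
    raise RuntimeError("step limit reached")


final = run(PROGRAM)
print("i =", final[10], "sum =", final[11])  # i = 10 sum = 45
```

Note that the branch stores the offset -8 rather than the absolute target, which mirrors the PC-relative encoding described in the branch bullet.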
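The address arithmetic behind the load/store, branch and upper-immediate examples can also be sketched in a few lines of Python. This assumes a 32-bit (RV32) machine, and the helper names are hypothetical. The base address 0x1000 used for x2 is made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers and addresses (RV32)


def word_byte_range(base, offset=0):
    """First and last byte address touched by lw/sw at offset(base)."""
    start = (base + offset) & MASK32
    return start, start + 3


def branch_offset(pc, target):
    """Relative distance, in bytes, that a branch encodes."""
    return target - pc


def lui(imm20):
    """Value written by lui: immediate in bits 31:12, low 12 bits zero."""
    return (imm20 << 12) & MASK32


def auipc(pc, imm20):
    """Value written by auipc: PC plus the immediate shifted left by 12."""
    return (pc + (imm20 << 12)) & MASK32


print([hex(a) for a in word_byte_range(0x1000)])  # ['0x1000', '0x1003']
print(branch_offset(0x100C4, 0x100BC))            # -8
print(hex(lui(0x12345)))                          # 0x12345000
print(hex(auipc(0x80001FC0, 0x1)))                # 0x80002fc0
```

With an immediate of 0, auipc simply copies the current PC into the destination register, which is a common starting point for building a PC-relative address.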

Now that you have a fair idea about assembly and RISC-V, you’re ready to dive into the fundamentals of microarchitecture!
