In this blog post, I’ll be talking about the steps I took to extend the MIPS single-cycle processor into a 5-stage pipeline.
Part 1: Building a MIPS single-cycle processor in Verilog
Part 2: Building a MIPS 5-stage Pipeline processor in Verilog
Part 3: Running the MIPS 5-stage Pipeline processor on a DE10-Nano FPGA
Table of contents
- Adding the pipeline registers
- Adding the forwarding functionality
- Adding the Load Word data hazard handler
- Adding the Branch data dependency handler
Adding the pipeline registers
In my previous blog post, I went through the steps I took to build a MIPS single-cycle processor in Verilog, test it on ModelSim, and implement a BNE instruction.
A MIPS pipeline consists of 5 stages: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory (MEM), and Writeback (WB). Four pipeline registers (IF/ID, ID/EX, EX/MEM, and MEM/WB) separate the stages, as shown below:
The complete MIPS 5-stage pipeline processor design, including the control signals, looks like the following:
To extend the MIPS single-cycle processor into a pipeline, I first added the pipeline registers (and removed the PC MUX for now). Note that at this point the pipeline cannot execute branch instructions and does not resolve hazards.
// IF/ID: fetched instruction and PC+4
IF_ID IF_ID(clk, instr, pcplus4, instrD, pcplus4D);
// ID/EX: control signals, register operands, rt/rd fields, sign-extended immediate
ID_EX ID_EX(clk, instrD, regwrite, memtoreg, memwrite, branch, alucontrol, alusrc,
            regdst, srca, writedata, instrD[20:16], instrD[15:11], signimmD, pcplus4D,
            regwriteE, memtoregE, memwriteE, branchE, alucontrolE, alusrcE,
            regdstE, srcaE, writedataE, rtE, rdE, signimmE, pcplus4E, instrE);
// EX/MEM: remaining control signals, ALU result, store data, destination register, branch target
EX_MEM EX_MEM(clk, instrE, regwriteE, memtoregE, memwriteE, branchE,
              zero, aluout, writedataE, writeregE, pcbranch,
              regwriteM, memtoregM, memwriteM, branchM,
              zeroM, aluoutM, writedataM, writeregM, pcbranchM, instrM);
// MEM/WB: write-back control, ALU result, loaded data, destination register
MEM_WB MEM_WB(clk, instrM, regwriteM, memtoregM, aluoutM, readdata, writeregM,
              regwriteW, memtoregW, aluoutW, readdataW, writeregW, instrW);
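The post does not show the inside of these modules. As a minimal sketch, assuming each register simply copies its inputs to its outputs on the rising clock edge (no reset, stall, or flush yet), IF_ID could look like this, with the port order taken from the instantiation above. ID_EX, EX_MEM, and MEM_WB would follow the same pattern with more ports.

```systemverilog
// Sketch (assumed implementation) of the IF/ID pipeline register:
// on every rising clock edge it captures the fetched instruction and
// PC+4 so that the ID stage sees them one cycle later.
module IF_ID(input  logic        clk,
             input  logic [31:0] instr, pcplus4,     // from the IF stage
             output logic [31:0] instrD, pcplus4D);  // to the ID stage
  always_ff @(posedge clk) begin
    instrD   <= instr;
    pcplus4D <= pcplus4;
  end
endmodule
```

Between clock edges the outputs stay constant even if the IF-stage inputs change, which is what lets each stage work on a different instruction.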
To test the implementation, I used the following instructions, which I chose so that no forwarding is required:
addi $s1 $zero 1
addi $s2 $zero 2
addi $s3 $zero 3
In hex,
20110001
20120002
20130003
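Each word follows the MIPS I-type layout: a 6-bit opcode (0x08 for addi), the 5-bit rs and rt register numbers, and a 16-bit immediate, where $zero is register 0 and $s1 to $s3 are registers 17 to 19. The small encoder below (itype_enc is a hypothetical name, not part of the post's design) just makes that packing explicit:

```systemverilog
// Illustrative (hypothetical) I-type encoder: op | rs | rt | imm.
module itype_enc(input  logic [5:0]  op,
                 input  logic [4:0]  rs, rt,
                 input  logic [15:0] imm,
                 output logic [31:0] instr);
  assign instr = {op, rs, rt, imm};  // 6 + 5 + 5 + 16 = 32 bits
endmodule
```

For example, op = 6'h08, rs = 0, rt = 17, and imm = 1 give 32'h20110001, the first word above.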
I used the following test bench,
module testbench2();
  logic        clk;
  logic        reset;
  logic [31:0] writedata, dataadr;
  logic        memwrite;
  // instantiate device to be tested
  top dut(clk, reset, writedata, dataadr, memwrite);
  // initialize test
  initial
    begin
      reset <= 1; # 1; reset <= 0;
    end
  // generate clock to sequence tests
  always
    begin
      clk <= 1; # 5; clk <= 0; # 5;
    end
endmodule
I used the run command to advance the simulation 10 ps at a time (the clock period is 10 ps), and I used $display() to print the program counter and the pipeline stage each instruction is in to the console.
The SystemVerilog code for the 5-stage pipeline MIPS processor with only the pipeline registers can be found here:
Adding the forwarding functionality
The next step was to resolve data hazards, starting with forwarding. I also added a multiplexor controlled by the jump signal (the red multiplexor in the diagram above):
mux_dontcare pcmux(pcnextbr, {pcplus4[31:28], instrD[25:0], 2'b00}, jump, pcnext);
There are also two new 3-input multiplexors, controlled by the hazard unit's forwarding signals forwardAE and forwardBE:
mux_dontcare3 muxsrca(srcaMUX, result, aluoutM, forwardAE, srcaE);
mux_dontcare3 muxwritedata(writedataMUX, result, aluoutM, forwardBE, writedataE);
The hazard unit looks like the following. I instantiated it in the datapath and added $display() statements that report whenever a value is forwarded:
hazardunit hazardunit(regwriteM, regwriteW, rsE, rtE, writeregM, writeregW,
                      forwardAE, forwardBE);
// simulation-only trace of forwarding events
always @(forwardAE)
  begin
    case(forwardAE)
      2'b01: $display("Forwarded %h to srcaE from MEM/WB stage", result);
      2'b10: $display("Forwarded %h to srcaE from EX/MEM stage", aluoutM);
    endcase
  end
always @(forwardBE)
  begin
    case(forwardBE)
      2'b01: $display("Forwarded %h to writedataE from MEM/WB stage", result);
      2'b10: $display("Forwarded %h to writedataE from EX/MEM stage", aluoutM);
    endcase
  end
endmodule
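The internals of hazardunit are not shown here. A minimal sketch of its forwarding logic, assuming the usual textbook rules (never forward register 0, and prefer the newer EX/MEM value over MEM/WB when both match) and the same select encoding as the muxes above (2'b10 selects aluoutM, 2'b01 selects result), could look like this, with the port order of the instantiation above:

```systemverilog
// Sketch (assumed, not the author's exact code) of the forwarding part
// of the hazard unit.
// 2'b10: forward aluoutM (EX/MEM), 2'b01: forward result (MEM/WB),
// 2'b00: use the value read from the register file.
module hazardunit(input  logic       regwriteM, regwriteW,
                  input  logic [4:0] rsE, rtE,
                  input  logic [4:0] writeregM, writeregW,
                  output logic [1:0] forwardAE, forwardBE);
  always_comb begin
    // Source A (rs): check the newer EX/MEM result first
    if (rsE != 5'd0 && rsE == writeregM && regwriteM)      forwardAE = 2'b10;
    else if (rsE != 5'd0 && rsE == writeregW && regwriteW) forwardAE = 2'b01;
    else                                                    forwardAE = 2'b00;
    // Source B (rt): same rules
    if (rtE != 5'd0 && rtE == writeregM && regwriteM)      forwardBE = 2'b10;
    else if (rtE != 5'd0 && rtE == writeregW && regwriteW) forwardBE = 2'b01;
    else                                                    forwardBE = 2'b00;
  end
endmodule
```

Checking EX/MEM first matters when two instructions in a row write the same register: the instruction in EX must receive the most recent value.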
To test forwarding, I prepared the following instructions:
- The result of addi $s1 $zero 1 will be forwarded from the EX/MEM register to the EX stage of addi $s2 $s1 2, as it uses $s1.
- The result of addi $s1 $zero 1 (now in the MEM/WB register) and the result of addi $s2 $s1 2 (in the EX/MEM register) will both be forwarded to the EX stage of add $s3 $s1 $s2, as it uses both $s1 and $s2.
ADDI $s1 $zero 0x1
ADDI $s2 $s1 0x2
ADD $s3 $s1 $s2 //$s3 should have 4 in the end
In hex,
20110001
22320002
02329820
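ADD is an R-type instruction, so its word has a different layout: opcode 0, then rs, rt, rd, a 5-bit shift amount, and a 6-bit funct field (0x20 for add). A matching encoder sketch (rtype_enc is a hypothetical name, not part of the post's design):

```systemverilog
// Illustrative (hypothetical) R-type encoder: op | rs | rt | rd | shamt | funct.
module rtype_enc(input  logic [5:0]  op,
                 input  logic [4:0]  rs, rt, rd, shamt,
                 input  logic [5:0]  funct,
                 output logic [31:0] instr);
  assign instr = {op, rs, rt, rd, shamt, funct};  // 6+5+5+5+5+6 = 32 bits
endmodule
```

With rs = 17 ($s1), rt = 18 ($s2), rd = 19 ($s3), shamt = 0, and funct = 6'h20, it produces 32'h02329820, the third word above.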
Each run advances the simulation by one clock cycle and prints any forwarding that occurred. At the end, $s3 is 4, which confirms that forwarding works as expected.
The SystemVerilog code for the 5-stage pipeline MIPS processor with the forwarding can be found here:
Adding the Load Word data hazard handler
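The data loaded by lw (load word) is only available after the MEM stage, so an instruction that uses it immediately cannot receive it through forwarding in time. Therefore, the hazard unit must also be able to stall the pipeline.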
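A minimal sketch of the detection logic, assuming the standard textbook formulation: stall when the instruction in EX is a load (memtoregE = 1) and its destination register rtE matches either source register of the instruction in ID. The names rsD, rtD, stallF, stallD, and flushE are assumptions; the post does not show them.

```systemverilog
// Sketch (assumed signal names) of load-use hazard detection.
// When lwstall = 1, the PC and the IF/ID register hold their values
// (stallF, stallD) and the ID/EX register is cleared (flushE),
// which inserts one bubble into the EX stage.
module lwstall_detect(input  logic [4:0] rsD, rtD, rtE,
                      input  logic       memtoregE,
                      output logic       lwstall,
                      output logic       stallF, stallD, flushE);
  assign lwstall = memtoregE && ((rsD == rtE) || (rtD == rtE));
  assign stallF  = lwstall;
  assign stallD  = lwstall;
  assign flushE  = lwstall;
endmodule
```

After the one-cycle stall, the lw has moved to WB and its data reaches the dependent instruction through the existing MEM/WB forwarding path.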
I prepared instructions that contain lw to test the load word stall as well as forwarding:
- The data that lw $s0, 0x4($zero) reads in its MEM stage will be forwarded to the EX stage of add $t0, $s0, $s1, as it uses $s0. Because this add comes right after the lw, a stall cycle is inserted between the ID and EX stages.
- A stall cycle is also inserted between the IF and ID stages of add $t1, $s2, $s0, which is held in IF while add $t0 waits in ID.
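Holding an instruction in IF/ID and injecting a bubble into EX both require pipeline registers with an enable and a synchronous clear, which the registers shown earlier do not have. A generic sketch (pipereg_en_clr and its WIDTH parameter are hypothetical names): IF/ID would be driven with en = ~stallD, and ID/EX with clr = flushE, so the stalled cycle sends all-zero control signals (no register or memory write) into EX.

```systemverilog
// Hypothetical pipeline register with enable (for stalls) and
// synchronous clear (for bubbles/flushes). WIDTH is the number of
// bits carried between two stages.
module pipereg_en_clr #(parameter WIDTH = 32)
                      (input  logic             clk, en, clr,
                       input  logic [WIDTH-1:0] d,
                       output logic [WIDTH-1:0] q);
  always_ff @(posedge clk)
    if (clr)     q <= '0;  // bubble: all control bits 0, acts as a nop
    else if (en) q <= d;   // normal operation: pass data to the next stage
    // en = 0 and clr = 0: q keeps its value (stall)
endmodule
```

In this sketch, clear takes priority over enable; that is a design choice, and other implementations only clear when the register is enabled.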
ADDI $s1 $zero 0x1
SW $s1 0x4($zero)
ADDI $s2 $s1 0x2
LW $s0 0x4($zero)
ADD $t0 $s0 $s1
ADD $t1 $s2 $s0
ADD $t2 $s0 $s2
In hex,
20110001
AC110004
22320002
8C100004
02114020
02504820
02125020
The ModelSim output shows that the load word stall occurred in cycle 6 (lwstall=1), and at the end, $t2 is 4, which confirms that the load word stall works as expected.
The SystemVerilog code for the 5-stage pipeline MIPS processor with the load word stall functionality can be found here:
Adding the Branch data dependency handler
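To evaluate branches in the ID stage, we need to add a branch data dependency handler. Resolving a branch in ID means only the one instruction already fetched has to be flushed when the branch is taken, but the branch's source registers may still be waiting on results from earlier instructions. An equality unit is added before the ID/EX register to compare the two operands in ID.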
The equality module in SystemVerilog looks like the following,
module equal(input  logic [31:0] srca, writedata,
             output logic        equalD);
  // equalD = 1 when the two 32-bit ID-stage operands are identical (beq condition)
  assign equalD = (srca == writedata);
endmodule
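With the comparison done in ID, the equality unit's inputs may need to be forwarded from the MEM stage, and the branch must stall if a source register is still being computed in EX or is being loaded by an lw in MEM. A minimal sketch of that logic under textbook assumptions (all port names here, such as forwardAD and branchstall, are assumed rather than taken from the post):

```systemverilog
// Sketch (assumed signal names) of the ID-stage branch hazard logic.
// forwardAD/forwardBD select aluoutM instead of the register file value
// for the equality unit; branchstall holds the branch in ID.
module branch_hazard(input  logic       branchD,
                     input  logic [4:0] rsD, rtD,
                     input  logic       regwriteE, memtoregM, regwriteM,
                     input  logic [4:0] writeregE, writeregM,
                     output logic       forwardAD, forwardBD,
                     output logic       branchstall);
  // Forward an ALU result that has already reached the MEM stage
  assign forwardAD = (rsD != 5'd0) && (rsD == writeregM) && regwriteM;
  assign forwardBD = (rtD != 5'd0) && (rtD == writeregM) && regwriteM;
  // Stall if the value is still being computed in EX, or is a load in MEM
  assign branchstall = branchD &&
      ((regwriteE && (writeregE == rsD || writeregE == rtD)) ||
       (memtoregM && (writeregM == rsD || writeregM == rtD)));
endmodule
```

In the test program below, the SUB result for $t1 is already in MEM when BEQ $t0 $t1 0x5 reaches ID, so it is forwarded without a stall.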
I prepared instructions that contain beq to test the branch dependency handler:
- beq $t0 $s2 0x4 is not taken ($t0 is 1 and $s2 is 2).
- beq $t0 $t1 0x5 is taken ($t1 = $s2 - $t0 = 1), so the instruction addi $s0 $zero 0x1 will be flushed, and execution jumps to addi $s1 $zero 0x5.
ADDI $t0 $zero 0x1
ADDI $s2 $zero 0x2
SUB $t1 $s2 $t0 //$t1 should have 1
BEQ $t0 $s2 0x4 //not taken
BEQ $t0 $t1 0x5 //taken
ADDI $s0 $zero 0x1
ADDI $s0 $s0 0x1
ADDI $s0 $s0 0x1
ADDI $s0 $s0 0x1
ADDI $s0 $s0 0x1
ADDI $s1 $zero 0x5 //BEQ will go to here
In hex,
20080001
20120002
02484822
11120004
11090005
20100001
22100001
22100001
22100001
22100001
20110005
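The beq offset counts words relative to PC+4 of the branch, so target = (PC + 4) + (sign-extended offset << 2). Assuming the program starts at address 0, BEQ $t0 $t1 0x5 is at 0x10, so its target is 0x14 + 0x14 = 0x28, which is the 11th word, ADDI $s1 $zero 0x5. A sketch of an ID-stage branch target adder (module and port names are illustrative):

```systemverilog
// Illustrative ID-stage branch target adder (hypothetical names).
module branch_target(input  logic [31:0] pcplus4D,  // PC + 4 of the branch
                     input  logic [15:0] imm,       // instrD[15:0]
                     output logic [31:0] pcbranchD);
  // Sign-extend the 16-bit word offset and multiply it by 4
  assign pcbranchD = pcplus4D + {{14{imm[15]}}, imm, 2'b00};
endmodule
```

Sign extension also makes backward branches work: an offset of 16'hFFFF (-1) from PC+4 = 0x14 gives 0x10, the branch itself.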
The ModelSim output shows "Branch not taken" in cycle 5 and "Branch is taken" in cycle 6. At the end, $s1 is 5 and $s0 has not been updated, which confirms that the branch handling works as expected.
The SystemVerilog code for the 5-stage pipeline MIPS processor with the branch dependency handler can be found here:
In the next part, I will talk about running the MIPS pipeline processor on a DE10-Nano FPGA.