Project 3 Report

Team members: 陈柏成 123090015; 陈玥彤 123090050

Part 1

A. RISC-V32I Simulator

We made the following changes:

- int64_t → int32_t and uint64_t → uint32_t. RV64 uses 64-bit registers, so its variables are typically int64_t or uint64_t. RV32 registers are only 32 bits wide, so we switched to the 32-bit types.
- Deleted lwu, ld, sd, the cases ADDIW, SUBW, SLLIW, and SRAW, and case OP_IMM32. These instructions are specific to RV64 and do not exist in RV32.
- %llx and %lld → %x and %d. %llx and %lld print 64-bit values. RV32 works with 32-bit values, so %x and %d are the matching format specifiers and avoid garbage or truncated output.
- Changed blockSize from 64 to 32 and the block number from 32 * 1024 / 64 to 32 * 1024 / 32, so the total cache capacity stays at 32 KB. We did this because RV32 systems typically have smaller cache lines.
- Memory loading: return b1 + (b2 << 8) + (b3 << 16) + (b4 << 24). RV32 supports memory accesses of at most 32 bits, so we only need to combine the first 4 bytes (b1 to b4). The higher bytes (b5 to b8) are used by 64-bit loads in RV64 and do not apply to RV32.
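The byte-combining step for a 32-bit load can be sketched as below. This is a minimal illustration, not the simulator's actual code: the helper name combineLE32 is hypothetical, and the explicit casts to uint32_t are an added precaution (without them, b4 is promoted to a signed int before the shift by 24).

```cpp
#include <cstdint>

// Hypothetical helper (not from the simulator source): combine four bytes,
// lowest address first, into one little-endian 32-bit word.
// Each byte is widened to uint32_t before shifting so that the shift by 24
// never reaches the sign bit of a promoted int.
constexpr uint32_t combineLE32(uint8_t b1, uint8_t b2, uint8_t b3, uint8_t b4) {
  return static_cast<uint32_t>(b1) |
         (static_cast<uint32_t>(b2) << 8) |
         (static_cast<uint32_t>(b3) << 16) |
         (static_cast<uint32_t>(b4) << 24);
}
```

For example, the bytes 0x78, 0x56, 0x34, 0x12 stored at increasing addresses combine into the word 0x12345678.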

B. Fused Instructions

We changed Simulator.h and Simulator.cpp to achieve this.

B1. R4 Instruction

- Added reg3 to RegId, because R4-type instructions have four operands: rd, rs1, rs2, and rs3. A third source register is required for instructions such as fmadd and fmsub. Adding reg3 lets the decoder and simulator store and access the rs3 register during execution.
- Added case R4 in decode( ), which implements fmadd.i, fmadd.u, fmsub.i, fmsub.u, fnmadd.i, and fnmsub.i. It extracts the operands from all three source registers and uses funct3 and funct2 to determine which specific instruction (fmadd.i, fmsub.u, etc.) is being decoded. It then constructs a readable instruction string, which is used in execute( ).
- Added these cases to switch( inst ) in execute( ). Each case first sets writeReg = true so that the result goes into rd. It then computes out from op1, op2, and op3. Finally, it increments cycleCount to simulate the latency of these operations.
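The field extraction for an R4-type instruction can be sketched as follows, using the standard RISC-V R4 bit layout (rs3 in bits 31:27, funct2 in bits 26:25). The names R4Fields, decodeR4, and fusedMulAdd are hypothetical and do not come from Simulator.cpp. The arithmetic rs1 * rs2 + rs3 is only an assumed semantics for the fmadd variants, since the report does not spell out the exact operation.

```cpp
#include <cstdint>

// Hypothetical container for the fields of one R4-type instruction word.
struct R4Fields {
  uint32_t opcode, rd, funct3, rs1, rs2, funct2, rs3;
};

// Extract the R4-type fields using the standard RISC-V bit positions.
constexpr R4Fields decodeR4(uint32_t inst) {
  return R4Fields{
      inst & 0x7F,          // opcode: bits 6:0
      (inst >> 7) & 0x1F,   // rd:     bits 11:7
      (inst >> 12) & 0x7,   // funct3: bits 14:12
      (inst >> 15) & 0x1F,  // rs1:    bits 19:15
      (inst >> 20) & 0x1F,  // rs2:    bits 24:20
      (inst >> 25) & 0x3,   // funct2: bits 26:25
      (inst >> 27) & 0x1F   // rs3:    bits 31:27
  };
}

// Assumed semantics (not confirmed by the report): out = op1 * op2 + op3.
// The arithmetic is done on unsigned values so that overflow wraps around
// to 32 bits instead of being signed-overflow undefined behavior.
constexpr int32_t fusedMulAdd(int32_t op1, int32_t op2, int32_t op3) {
  return static_cast<int32_t>(static_cast<uint32_t>(op1) *
                                  static_cast<uint32_t>(op2) +
                              static_cast<uint32_t>(op3));
}
```

In a decoder built this way, funct3 and funct2 would then select among the fused variants, while rs3 supplies the third operand (op3) read from the register file.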

B2. Disable Data Forwarding

In simulate( ), we track a stall number that counts down (e.g., 2 → 1 → 0). While the stall number is greater than 0, the pipeline is paused (the IF and ID stages are stalled).
- No data hazard → pass dRegNew to dReg and execute the next instruction.
- Data hazard → decrement the stall number by 1.

  If this is the last stall cycle (the stall number is 0 after the decrement) → read the register file again to refresh the operand values, and execute in the next cycle. This effectively decodes the previous instruction a second time before execution, so that it picks up the current register values.
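The countdown described above can be sketched as a single per-cycle step. This is a simplified model under stated assumptions, not the code in simulate( ): the names StallState and stepStall are hypothetical, and the real pipeline registers are reduced to two flags.

```cpp
#include <cstdint>

// Hypothetical summary of what simulate( ) decides in one cycle.
struct StallState {
  uint32_t stall;  // stall cycles still remaining after this cycle
  bool redecode;   // true on the last stall cycle: re-read the register file
  bool advance;    // true when dRegNew may be copied into dReg
};

// One cycle of the stall countdown.
constexpr StallState stepStall(uint32_t stall) {
  if (stall == 0) {
    // No pending hazard: the pipeline advances normally.
    return {0, false, true};
  }
  // Pending hazard: IF and ID stay frozen while the counter is decremented.
  uint32_t remaining = stall - 1;
  // When the counter reaches 0, the stalled instruction is decoded again so
  // that it sees the register values written in the meantime.
  return {remaining, remaining == 0, false};
}
```

Starting from a stall number of 2, the first step leaves 1 remaining without re-decoding, and the second step reaches 0 and triggers the register re-read.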
In execute( )

2.1 We check for data hazards only when:
- The stall number is zero. If the stall number is not zero, the decode stage decodes no instruction, so it cannot encounter a data hazard.
- The decode stage is not a bubble. Similarly, if the decode stage holds a bubble, it decodes no instruction, so it cannot encounter a data hazard.
- The current instruction is not of Branch or Jump type. Such instructions do not write back to a destination register, so they cannot cause a data hazard.
2.2 If a data hazard exists, we stall the IF and ID stages for two cycles and insert a bubble at the EX stage.

The following figure shows the case where a data hazard is encountered in the EX stage.

In memoryAccess( )

3.1 We check for data hazards only when:
- The stall number is zero
- The decode stage is not a bubble
- The current instruction is not of Branch or Jump type
The reasons are the same as for the data hazard check in the execute stage.

3.2 If a data hazard exists, we stall the IF and ID stages for one cycle and insert a bubble at the MEM stage.

The following figure shows the case where a data hazard is encountered in the MEM stage.
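The hazard checks in execute( ) and memoryAccess( ) follow the same pattern and can be sketched together. This is an illustrative model rather than the simulator's code: the types Consumer and Producer and the function hazardStall are hypothetical, and skipping writes to x0 is an added assumption based on x0 being hard-wired to zero in RISC-V.

```cpp
#include <cstdint>

// Stage that currently holds the instruction producing the value.
enum class Stage { EX, MEM };

// Hypothetical view of the instruction waiting in the decode stage.
struct Consumer {
  bool bubble;   // true if the decode stage holds a bubble
  uint32_t rs1;  // first source register index
  uint32_t rs2;  // second source register index
};

// Hypothetical view of the instruction in the EX or MEM stage.
struct Producer {
  bool branchOrJump;  // treated as not writing a destination register
  uint32_t rd;        // destination register index
};

// Returns how many cycles IF and ID must stall (0 means no hazard).
constexpr uint32_t hazardStall(uint32_t stall, Consumer id, Producer p,
                               Stage stage) {
  // The three preconditions from the report: no stall in progress, decode
  // stage is not a bubble, producer is not a branch or jump.
  if (stall != 0 || id.bubble || p.branchOrJump) return 0;
  // Assumption: x0 always reads as zero, so a write to it is never a hazard.
  if (p.rd == 0) return 0;
  // No overlap between the producer's rd and the consumer's sources.
  if (p.rd != id.rs1 && p.rd != id.rs2) return 0;
  // Without forwarding: two cycles for a producer in EX, one for MEM.
  return stage == Stage::EX ? 2 : 1;
}
```

With a producer writing x14 and a consumer reading x13 and x14, this returns 2 when the producer is in EX and 1 when it is in MEM, matching the stall counts in 2.2 and 3.2.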

Part 2: Rearrange

A. Instruction Reordering

To reduce pipeline stalls caused by RAW (Read-After-Write) hazards, the instruction sequence was rearranged:

A1. RAW between mul a4, a1, a2 and add a3, a3, a4

Issue: a4 is written by mul and immediately read by add, causing a hazard.