Assigned: 4/27/2007
Due: 6:00pm, 5/10/2007
In Lab 3 the pipelined processor was designed to work with independent instructions. This sorely limited the types of programs that could run successfully, or required solutions like inserting nop instructions to separate dependent instructions. Unfortunately such dependencies are common and often unavoidable. This lab will focus on hardware techniques to recognize and deal with dependencies. Not all dependencies can be handled easily, so this lab also introduces the idea of inserting bubbles to deal with branch and load hazards.
Background:
The pipeline from Lab 3 made no concessions for pipeline hazards. A hazard is a case where the next instruction cannot execute on the next clock cycle. There are three broad categories of hazards: structural hazards, control hazards, and data hazards. (See COD:3e 375-383 for more).
This lab will take place in the same workspace as the previous labs. The files for this lab are provided as an archived design. You will need to restore the design, add it to the workspace, and copy several design files from lab3 into the new design. Download the archived design lab3-debug_dist.zip and follow these steps:
At this point the files are all available. Now we need to tell Active-HDL about the new design.
The pipeline registers that we added in Lab 3 made it possible to decrease the clock period because the amount of work in a given cycle decreased. Concomitantly, instructions now take 5 cycles to complete. This means that the values available in the register file may be "stale". This happens because the result of an operation is not available to later instructions until the instruction reaches the Write-Back (WB) stage.
The solution to this problem is to pass the "fresh" values for registers back to earlier stages. These "fresh" values can then replace the "stale" ones from the register file. This practice is called Forwarding or Bypassing (we'll use Forwarding ). (See COD:3e 402-412 for more)
Consider the following code sequence:
add $5,$3,$4
add $6,$5,$6
add $7,$7,$5
Using the processor from Lab 3 this would run into two issues, because the second and third add instructions would read the old value for $r5. When the second instruction is in the Execute stage, the new value of $r5 is stored in the EX/MEM register. So we need a way to use that value for the computation of $r6. When the third add instruction is in the Execute stage, the updated value of $r5 is stored in the MEM/WB register. This means we'll need a multiplexer on the A input to the ALU to choose between the new value of $r5 and the one read from the register file.
The fundamental idea in forwarding is choosing the most recent value for a specific register. In hardware a choice is synonymous with a multiplexer. In this case the multiplexers used to compute the return address in the JAL and JALR instructions will be replaced by larger versions that allow more choices.
The multiplexers allow the most recent value to be used. Determining when this applies is slightly more complex. Use the provided skeleton file forwardingunit.v and add logic to compute the control signals Fwd_EX_RS, and Fwd_EX_RT. Don't forget that the JAL and JALR computation also uses the same multiplexers.
Each instruction that updates the register file specifies its destination register. If that destination register matches one of the source registers of the next two instructions, then there's a potential conflict. Other things to keep in mind:
To verify the correctness of your implementation, use the script phase1a_tf_runtest.do in the Tests directory. Before you do, make the following change:
The dependencies between computational instructions were handled in part A, but branch instructions and the JR and JALR also suffer from hazards. For the branches, it's essential that the correct values are compared to determine the outcome. For JR and JALR, the issue is jumping to the correct destination. In most cases, we can avoid branch hazards by forwarding (the other cases are addressed in Phase 2). The new data must be forwarded to the ID stage.
The logic is similar to execution forwarding logic, but less complex. The reason is that the register file is write-first, meaning that values in the WB stage are written to the register file before the outputs from the register file are read in the ID stage. This means that values in the WB stage are available in the ID on the same cycle, so there is no need to forward to the ID stage from the WB stage.
The detection of forwarding in the branch case is virtually identical to the process for the execution forwarding. Specific issues:
Finish implementing the forwardingunit to produce the Fwd_ID_RS and Fwd_ID_RT control signals.
To verify the correctness of your implementation, use the script phase1b_tf_runtest.do in the Tests directory.
This should be a formality if all the names have been kept correctly.
The forwarding logic addresses most of the problems caused by dependencies, but there are still two cases that are two expensive to handle with forwarding. Instead, the dependent instructions waits in the ID stage until the forwarding mechanism is able to take effect. To enforce this delay we introduce a bubble into the pipeline.
A bubble serves as a place-holder and does not update any visible state. The instruction nop has the same behavior, but there is an essential difference. The difference is that nop is specified by the programmer or compiler, while a bubble is dynamically inserted in the pipeline to preserve proper execution behavior. Specifically, the pipeline inserts a bubble when no forwarding path exists.
Inserting a bubble requires two things:
The only ESSENTIAL changes to insert a bubble are to disable RegWrite and Store. These signals control the persistent state in the processor. For clarity we will make a more dramatic move and zero the entire IDEXReg
The pipeline registers and PC all have a signal called LoadEnable. When this signal is set, the register updates on the rising edge of the clock. In Lab3 these signals were all wired to logical 1.
The logic we added in Phase 1 passes ALU results to later instructions to prevent errors caused by dependencies between instructions. This mechanism eliminates the majority of conflicts between neighboring instructions, but fails when later instructions depend on a load instructions. Consider the following code:
lb $r2, 0($r9)
andi $r3, $r2, 0xFA
...
This code loads a byte from memory, applies a mask (via the andi) and then branches based on the result. The problem is that the standard forwarding path from the MEM to EX stage passes the ALU result. When the load is in the MEM stage, the ALU result is the address of the load, not the resulting value.
The solution is to insert a bubble between the Load instruction and the dependent operation. Inserting a bubble means that the lb is in the WB stage ( with result in the MEM/WB register ) when the andi reaches the EX stage. At that point the standard forwarding mechanism takes over.
The hazardunit.v is a skeleton for a hazard detection unit. If a hazard is detected, the LoadEnable signal goes low, and the IDEXReg_Clear signal is asserted.
The branch decision takes place in the ID stage to remove the need for branch prediction. The problem with this decision is that it introduces a hazard. Consider the following code:
add $r3, $r2, $r3
add $r2, $r4, $r6
beq $r3, $r5, success
...
add $r2, $r2, $r3
beq $r3, $r2, success
The first branch will complete properly because the both $r3 can be forward from the MEM stage to the ID stage. The second branch will fail because there is no forwarding path from EX to ID. The solution is to detect the second case and insert a bubble. This example illustrates the most common branch hazard.
Unfortunately, branches are also affected by the load hazards covered in Part B above. Consider the following code:
lw $r4, 0($r6)
bne $r4, $0, loop
...
This sequence of code requires two bubbles to be inserted. The base branch hazard accounts for the first bubble. That leaves the branch in the ID stage, while the Load is in the MEM Stage. The new value of $r4 is still coming from memory, but the bne need $r4 to make the branch decision. The solution is to insert a second bubble and wait an additional cycle. At that point the Load is in the ID stage, and $r4 is available to the branch via through the register file.