CSE 378 Homework #6

CSE 378 Homework #6: Pipelined MIPS

Due: 5/24/2002

Assignment Goals & Administrivia

In this assignment, you'll develop a pipelined implementation of a subset of the MIPS ISA. Your implementation will do forwarding, as well as a primitive form of branch prediction.

Again, you may (and are encouraged to) work with a partner on this assignment. You'll make one turnin, and get the same grade. Remember that your relationship with your partner should be one of equals -- you should both share in the work and understanding required to get the job done.

Background

The MIPS subset is the same as for HW5. Remember, that MULT/DIV and SRAV/SLLV are slightly different from the MIPS spec. MULT/DIV behave like regular R-type instructions, and SRAV/SLLV have reversed their operands.

The datapath is just the one we developed in class (or used in the book), plus lots of details relating to forwarding and branches that you will have to figure out. Remember that we are implementing a richer MIPS subset than the one described in the book. For this reason, our machines will necessarily be more complicated.

You ought to be able to use the control developed in the previous assignment. Your machine should generate the control bits during the Decode stage, and store them into the ID/EX pipeline register.

Pipeline Registers

Figuring out what these should contain is a critical part of this assignment. For each stage, write down what information is calculated, or may be required for downstream stages of the pipeline. It may be easiest to design a single pipeline register that contains the union of all the information you might need. Why? Because it will make debugging easier and the SMOK ControlRegister component (which you'll use to implement a pipeline register) encourages this style of design. Obviously, a real machine wouldn't want to waste bits on unnecessary information.

Data Dependencies: Forwarding

To run efficiently (and correctly) your machine will have to implement forwarding. In particular, you'll want to forward results from the Memory stage and the WriteBack stage to the ALU. You'll also want to supercharge your register file to do forwarding for you.

Data Dependencies: Loads

Recall that loads (followed by dependent instructions) cannot be helped by forwarding. Why? Because by the time the load instruction has its result, the dependent instruction has already completed the Execute stage -- too late to make use of the value retrieved by the load instruction. In the case of loads, we'll take a conservative approach, and simply insert a bubble (nop) behind every load instruction.

Control Dependencies: Branches & Jumps

Recall that branches and jumps need to be treated with care in the pipeline. If a branch is taken, we need a way to "squash" the instructions following it (because they've already incorrectly entered the pipeline). In our implementation, we'll decide whether to take branches (and change the PC) in the Execute phase. (The book delays the setting of the PC until the Memory phase.) This means that if a branch is taken, the two instructions following the branch will need to be squashed. To squash an instruction, you'll need to zero out the instructions in the IF and ID stages.

Can we do better? Yes. We can reduce the branch penalty to one instruction (rather than two) if we move the branch logic to the decode stage. You can try this for extra credit's sake.

Register File Details

You'll have to enforce the rule that you can't write register zero. You'll also want to support fowarding around the register file. Currently, the value returned by a read on a register cycle is the value that was written in a previous cycle. We can do better, however. By adding some logic around the register file, we can detect if one or both read register numbers are the same as the write register number. If so, we'll serve up the value being written rather than the value from the register file itself.

Managing Complexity

This machine is almost an order of magnitude more complicated than the single cycle machine. Keeping the complexity under control is critical to finishing the assignment. For this reason, we suggest (require) that you agressively use containers in this assignment. Here are some suggestions for effective containerization:

Forwarding unit: This unit takes value inputs from the EX/MEM and MEM/WB registers, register number inputs from the ID/EX register, and outputs signals which are routed to the MUXes guarding the ALU input. Read the book (section 6.4) to get intimate with the forwarding problem and solution.
Branch/Jump logic: This unit takes inputs like nPC, zero, a register value (for JR), as well as a the relevant control signals. Its job is to output a control line that says whether or not to change the PC, as well as a new PC (this will depend on whether its a branch, jump, or JR). This unit has a lot of inputs, but its worth building.
Register file: As discussed above, this unit should guard against writes to the zero register, and do forwarding of same-cycle reads/writes of the same register. By doing this simple forwarding "in" the register file, your other forwarding unit is simpler.
Memory unit: This unit should encapsulate the MemoryInterface and the CharacterDevice container. Its inputs are just standard memory inputs: address, value, and control lines. Its output is just a value.

Solution Path

Below are some hints to get you towards a solution. Your main strategy should be to understand the issues on paper (and in your brain) before you start SMOKing.

(On paper) figure out the contents of your pipeline registers.
(On paper) figure out the logic of the forwarding unit.
(On paper) figure out how to handle branches/jumps (flushing the pipeline) and loads (stalling the pipeline). (Loads are easier than branches/jumps.)
Make sure you download the most recent version of SMOK!
Containerize components from your past solution that you can reuse in this project. This might feel like a waste of time, but you'll be happy you did it when you've got hundreds of components in your model. It will be difficult for us to help you if you haven't done this! This is a good place to divide work between you and your partner.
Lay out the pipeline without forwarding or stalling. This version of the machine should have just the basic components: register file, pipeline registers, PC, and ALU. (Don't worry about load/store or branch/jumps yet.) This pipeline should be able to run programs that are (heavily) padded with nops:
```
addi  $t0, $0, 10
nop 
nop
nop
add $t1, $t0, $t0
```
Get forwarding around the register file working (you may have already done this). If this is working, you can check it with a program like the above (with one less nop). Make sure this works before going on.
Now worry about the forwarding unit. Remember, it has to forward between the MEM or WB stage to the EX stage. Once this works, you ought to be able to run straight-line arithmetic code (that doesn't do any loads).
Get load-stalling working. Add the memory container, and a tiny bit of logic that checks if there is a MemRead (this checking is done during the EX stage). If so, it should disable writes to the PC (causing the same instruction to be fetched again on the next cycle) AND "freeze" the current instruction in the IF/ID pipeline register. Finally, a NOP should be injected into the ID/EX pipeline register, which can be accomplished by zeroing out the contents of that pipeline register.
Get delayed branches working. Put the logic in the EX stage, and assume a branch delay of 2 cycles. You should be able to test programs like this:
```
...
beq  $t0, $0, somewhere
nop                         # branch delay slot 1
nop                         # delay slot 2
addi $t1, $t0, $t0
...
```
After you've got delayed branches/jumps working, think about doing them correctly -- that is, the two subsequent instructions in the pipeline should be squashed if the branch is taken. To do this, you'll need to zero out the contents of the IF/ID and ID/EX pipeline registers. This has the effect of turning those instructions into NOPs.

Testing and Running

You ought to be able to use the same test strategies you used in the last homework. That is, start small and gradually build up to bigger (or even compiler-generated) programs as your confidence increases.

The Files

You're on your own for building the entire SMOK machine. Check online for any other support files. We'll probably distribute up-to-date versions of the prologue, the IO library, and other files.

Extra Credit

Tons of options. For instance, the current implementation doesn't do well on branches or jumps. In particular, their penalty is as high as 2 cycles. There are lots of ways to do better, none of them are easy, though. Jumps can be treated differently (at least J and JAL) -- they are simple enought to handle easily in the Decode stage. This means we can reduce their penalty to just one cycle. The very keen can do simple branch prediction.

A simple option is to be less pessimistic in our load strategy. Currently, we always insert a bubble after the load. We can do better by only inserting a bubble when the following instruction depends on the result of the load.

A final option concerns the following sequence of instructions

lw    $t0, 0($t1)
sw    $t0, 0($t2)

which occurs frequently in memory-memory copy operations (think stringcopy). In these situations we can perform forwarding between the WriteBack and the Memory stage to outperform our simple machine.

dugan@cs.washington.edu