CSE 378 Homework #6: Pipelined MIPS
Due: 5/24/2002
Assignment Goals & Administrivia
In this assignment, you'll develop a pipelined implementation of a
subset of the MIPS ISA. Your implementation will do forwarding, as
well as a primitive form of branch prediction.
Again, you may (and are encouraged to) work with a partner on this
assignment. You'll make one turnin, and get the same grade. Remember
that your relationship with your partner should be one of equals --
you should both share in the work and understanding required to get
the job done.
Background
The MIPS subset is the same as for HW5. Remember, that MULT/DIV and
SRAV/SLLV are slightly different from the MIPS spec. MULT/DIV behave
like regular R-type instructions, and SRAV/SLLV have reversed their
operands.
The datapath is just the one we developed in class (or used in the
book), plus lots of details relating to forwarding and branches that
you will have to figure out. Remember that we are implementing a
richer MIPS subset than the one described in the book. For this
reason, our machines will necessarily be more complicated.
You ought to be able to use the control developed in the previous
assignment. Your machine should generate the control bits during the
Decode stage, and store them into the ID/EX pipeline register.
Pipeline Registers
Figuring out what these should contain is a critical part of this
assignment. For each stage, write down what information is
calculated, or may be required for downstream stages of the pipeline.
It may be easiest to design a single pipeline register that contains the
union of all the information you might need. Why? Because it will
make debugging easier and the SMOK ControlRegister component (which
you'll use to implement a pipeline register) encourages this style of
design. Obviously, a real machine wouldn't want to waste bits on
unnecessary information.
Data Dependencies: Forwarding
To run efficiently (and correctly) your machine will have to implement
forwarding. In particular, you'll want to forward results from the
Memory stage and the WriteBack stage to the ALU. You'll also want to
supercharge your register file to do forwarding for you.
Data Dependencies: Loads
Recall that loads (followed by dependent instructions) cannot be
helped by forwarding. Why? Because by the time the load instruction
has its result, the dependent instruction has already
completed the Execute stage -- too late to make use of the value
retrieved by the load instruction. In the case of loads, we'll take a
conservative approach, and simply insert a bubble (nop) behind every
load instruction.
Control Dependencies: Branches & Jumps
Recall that branches and jumps need to be treated with care in the
pipeline. If a branch is taken, we need a way to "squash" the
instructions following it (because they've already incorrectly entered
the pipeline). In our implementation, we'll decide whether to take
branches (and change the PC) in the Execute phase. (The book delays
the setting of the PC until the Memory phase.) This means that if a
branch is taken, the two instructions following the branch will need
to be squashed. To squash an instruction, you'll need to zero out the
instructions in the IF and ID stages.
Can we do better? Yes. We can reduce the branch penalty to one
instruction (rather than two) if we move the branch logic to the
decode stage. You can try this for extra credit's sake.
Register File Details
You'll have to enforce the rule that you can't write register zero.
You'll also want to support fowarding around the register
file. Currently, the value returned by a read on a register cycle is
the value that was written in a previous cycle. We can do better,
however. By adding some logic around the register file, we can detect
if one or both read register numbers are the same as the write
register number. If so, we'll serve up the value being written rather than
the value from the register file itself.
Managing Complexity
This machine is almost an order of magnitude more complicated than the
single cycle machine. Keeping the complexity under control is
critical to finishing the assignment. For this reason, we suggest
(require) that you agressively use containers in this assignment.
Here are some suggestions for effective containerization:
- Forwarding unit
- This unit takes value inputs from the EX/MEM
and MEM/WB registers, register number inputs from the ID/EX register,
and outputs signals which are routed to the MUXes guarding the ALU
input. Read the book (section 6.4) to get intimate with the
forwarding problem and solution.
- Branch/Jump logic
- This unit takes inputs like nPC, zero, a
register value (for JR), as well as a the relevant control signals.
Its job is to output a control line that says whether or not to change
the PC, as well as a new PC (this will depend on whether its a branch,
jump, or JR). This unit has a lot of inputs, but its worth building.
- Register file
- As discussed above, this unit should guard
against writes to the zero register, and do forwarding of same-cycle
reads/writes of the same register. By doing this simple forwarding
"in" the register file, your other forwarding unit is simpler.
- Memory unit
- This unit should encapsulate the MemoryInterface and
the CharacterDevice container. Its inputs are just standard
memory inputs: address, value, and control lines. Its output is
just a value.
Solution Path
Below are some hints to get you towards a solution. Your main
strategy should be to understand the issues on paper (and in your
brain) before you start SMOKing.
- (On paper) figure out the contents of your pipeline registers.
- (On paper) figure out the logic of the forwarding unit.
- (On paper) figure out how to handle branches/jumps (flushing the
pipeline) and loads (stalling the pipeline). (Loads are easier than
branches/jumps.)
- Make sure you download the most recent version of SMOK!
- Containerize components from your past solution that you can reuse
in this project. This might feel like a waste of time, but you'll be
happy you did it when you've got hundreds of components in your model.
It will be difficult for us to help you if you haven't done
this! This is a good place to divide work between you and your
partner.
- Lay out the pipeline without forwarding or stalling. This version
of the machine should have just the basic components: register file,
pipeline registers, PC, and ALU. (Don't worry about load/store or
branch/jumps yet.) This pipeline should be able to run programs that
are (heavily) padded with nops:
addi $t0, $0, 10
nop
nop
nop
add $t1, $t0, $t0
- Get forwarding around the register file working (you may have
already done this). If this is working, you can check it with a
program like the above (with one less nop). Make sure this works before
going on.
- Now worry about the forwarding unit. Remember, it has to forward
between the MEM or WB stage to the EX stage. Once this works, you
ought to be able to run straight-line arithmetic code (that doesn't do
any loads).
- Get load-stalling working. Add the memory container, and a tiny
bit of logic that checks if there is a MemRead (this checking is done
during the EX stage). If so, it should disable writes to the PC
(causing the same instruction to be fetched again on the next cycle)
AND "freeze" the current instruction in the IF/ID pipeline register.
Finally, a NOP should be injected into the ID/EX pipeline register,
which can be accomplished by zeroing out the contents of that pipeline
register.
- Get delayed branches working. Put the logic in the EX stage, and
assume a branch delay of 2 cycles. You should be able to test
programs like this:
...
beq $t0, $0, somewhere
nop # branch delay slot 1
nop # delay slot 2
addi $t1, $t0, $t0
...
- After you've got delayed branches/jumps working, think about doing
them correctly -- that is, the two subsequent instructions in the
pipeline should be squashed if the branch is taken. To do this,
you'll need to zero out the contents of the IF/ID and ID/EX pipeline
registers. This has the effect of turning those instructions into
NOPs.
Testing and Running
You ought to be able to use the same test strategies you used in
the last homework. That is, start small and gradually build up to
bigger (or even compiler-generated) programs as your confidence increases.
The Files
You're on your own for building the entire SMOK machine. Check online
for any other support files. We'll probably distribute up-to-date
versions of the prologue, the IO library, and other files.
Extra Credit
Tons of options. For instance, the current implementation doesn't do
well on branches or jumps. In particular, their penalty is as high as
2 cycles. There are lots of ways to do better, none of them are easy,
though. Jumps can be treated differently (at least J and JAL) -- they
are simple enought to handle easily in the Decode stage. This means
we can reduce their penalty to just one cycle. The very keen can do
simple branch prediction.
A simple option is to be less pessimistic in our load strategy.
Currently, we always insert a bubble after the load. We can
do better by only inserting a bubble when the following instruction
depends on the result of the load.
A final option concerns the following sequence of instructions
lw $t0, 0($t1)
sw $t0, 0($t2)
which occurs frequently in memory-memory copy operations (think
stringcopy). In these situations we can perform forwarding between
the WriteBack and the Memory stage to outperform our simple machine.