Lab 3: Pipelining Ignoring Hazards
Assigned: 2/15/2007
Due: 6:00PM, 2/21/2007
Description
The processor constructed in the previous labs implements a
significant part of the MIPS instruction set, but does so very
slowly. The problem is that the clock speed is constrained by the
longest path a signal can take in a single clock cycle. For the
first two labs this path started at the register file, went through
the ALU, the memory, and finally back into the register file. In
this lab, we'll be shrinking the longest path significantly by
breaking the datapath into 5 separate pieces or stages. Adding
registers between stages will mean that the longest path a signal must
traverse in a single cycle will decrease dramatically.
Additionally, we'll run independent instructions in each stage,
for a maximum of 5 instructions "in-flight",
leading to a maximum throughput of 1 instruction per cycle.
In short, you'll be building the core elements of a pipelined processor,
but leaving for the next homework everything that has to do with hazards.
This first-step pipelined processor will therefore correctly execute only
instruction sequences that do not present any hazards.
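As a rough illustration of the throughput claim above, the following Python sketch counts the cycles needed by a hazard-free instruction stream on an ideal 5-stage pipeline. The function names and the idealized timing model are assumptions made for illustration; they are not part of the lab files.

```python
STAGES = 5  # IF, ID, EX, MEM, WB


def pipelined_cycles(n_instructions: int, stages: int = STAGES) -> int:
    """Cycles to finish n hazard-free instructions on an ideal pipeline.

    The first instruction needs `stages` cycles to travel through the pipe;
    every later instruction completes one cycle after its predecessor.
    """
    if n_instructions <= 0:
        return 0
    return stages + (n_instructions - 1)


def throughput(n_instructions: int, stages: int = STAGES) -> float:
    """Average number of instructions completed per cycle."""
    cycles = pipelined_cycles(n_instructions, stages)
    return n_instructions / cycles if cycles else 0.0


if __name__ == "__main__":
    for n in (1, 5, 100, 10000):
        print(n, pipelined_cycles(n), round(throughput(n), 4))
```

For long instruction streams the fill time of the pipeline becomes negligible, so the average approaches (but never exceeds) 1 instruction per cycle.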
Background: (COD: 3e pp 370-374, 384-402)
Phase 0: Administration
This lab will take place in the same workspace as the previous labs. The files for this lab are provided as an archived design, lab3.zip. You will
need to restore the design, add it to the workspace, and copy several design files
from lab2 into the new design. Follow
these steps:
- Download lab3.zip.
- Start Active-HDL and Open the cse378 workspace used in previous labs.
- Select Design > Restore Design from the menu
- Browse to the downloaded lab3.zip file
- Set the Restore To: directory to the cse378 directory that contains the
folders lab1, lab2 and lib378 and click Finish.
At this point the files are all available. Now we need to tell Active-HDL
about the new design.
- Right-Click the yellow workspace icon named cse378 in the design browser
and choose "Add Existing Design to Workspace".
- Find and open the file "lab3.adf" in the newly restored lab3
directory.
- Set lab3 as the active design
- Click and Drag pcaddresscomputer.v, controller.v and cpu.bde from
lab2 to lab3.
Phase 1: Partitioning and Pipeline Registers
The fundamental idea behind pipelining is to separate the datapath into
individual stages. Each stage will take one clock cycle to
complete, and can contain one instruction. This phase will describe the elements
of the stages and the pipeline registers that function as the
barrier between stages. A pipeline register saves the values from the previous
stage so that each stage can perform an independent instruction. In addition, the
pipeline registers must have a reset signal so that their initial state is
known, and a load enable signal. The load enable will be used later to
ensure that certain registers do not update their values in special cases.
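The behavior described above can be sketched as a small Python model. This is only a behavioral illustration, not the Verilog you will write in piperegister.v: the class name, the field names, the synchronous reset, and the choice to give reset priority over load enable are all assumptions.

```python
class PipelineRegister:
    """Behavioral model of a pipeline register with reset and load enable.

    Assumptions (not taken from the lab files): reset is synchronous,
    clears every field to 0, and takes priority over load enable.
    """

    def __init__(self, fields):
        self.fields = tuple(fields)
        self.q = {name: 0 for name in self.fields}  # current outputs

    def clock(self, d, reset=False, load_enable=True):
        """Model one rising clock edge and return the new outputs."""
        if reset:
            self.q = {name: 0 for name in self.fields}
        elif load_enable:
            self.q = {name: d[name] for name in self.fields}
        # With load_enable low and no reset, the old values are held.
        return dict(self.q)


if __name__ == "__main__":
    # Hypothetical field names for an IF/ID-style register.
    ifid = PipelineRegister(["NextPC", "Inst"])
    print(ifid.clock({"NextPC": 4, "Inst": 0x8C010000}))
    print(ifid.clock({"NextPC": 8, "Inst": 0}, load_enable=False))  # holds
    print(ifid.clock({"NextPC": 8, "Inst": 0}, reset=True))         # clears
```

In this lab LoadEnable is tied to VCC, so every register loads on every cycle; the hold case only matters once hazards are handled.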
WARNING: The test fixtures are very name-sensitive.
Double-check all wire and component names.
Part A. Organizing Stage Logic
The following stage descriptions explain the components that should be in
each stage. Please scan the descriptions before making any changes, and be sure
to refer back as things progress.
Instruction Fetch (IF) Stage
The first stage is responsible for getting the current
instruction from the instruction memory at the address specified by the program
counter (PC). The memory access is a slow process so the resulting
instruction goes straight into the pipeline register. In parallel with the
fetch, this stage must also compute the next PC address
(PC+4). This stage stores the instruction
into the IF/ID pipeline register, along with the PC+4 value for use in JAL and
JALR instructions.
- Create a wire from the Inst input and name it IF_Inst. Use this when you need to reference the instruction in the IF stage.
- Create a VCC component and name it LoadEnable
- Hit ESC to set cursor to the arrow
- Hit (P) to change cursor to VCC
- Change PC to use LoadEnable
- Right-Click the Program Counter and select "Replace Symbol"
- If lib378 appears then skip the next step
- Otherwise, Right-click the middle window and choose "Select Libraries", expand the "User Libraries" and choose lib378
- Choose register_re and hit OK
- Add a wire LoadEnable to the LD pin of the Program Counter
- Create an IFIDReg pipeline register and name it IFIDReg
- Use the Symbols Toolbox to add an IFIDReg to cpu.bde
- Select the new IFIDReg component, and hit ALT+Enter to open the properties dialog
- Set the name to IFIDReg (ESSENTIAL!!)
- Connect PC+4 to the input IF_NextPC of IFIDReg
- Connect Inst to IF_Inst of IFIDReg
- Use the "Add Stubs" menu option to complete signal generation for IFIDReg
Instruction Decode (ID) Stage
This stage is responsible for the majority of the work. It reads an
instruction from the IF/ID pipeline register, decodes the
instruction, generates control signals, reads values from the register file and
performs comparisons for branches (more on that later). This stage stores control
signals for all later stages,
data values for A and B inputs to the ALU, the result from the extender unit, the PC+4 value,
the destination register index (used in lab 4), and control
information needed by the ALU into the ID/EX pipeline register.
Execution (EX) Stage
The ALU is the heart of the processor so it gets a stage all of its
own. This stage contains the ALU and multiplexers that determine which values are
used as the input to the ALU. It stores the output from the ALU, the data value
from the RT field of the instruction, and the destination register to the EX/MEM pipeline register along with necessary control
signals.
Memory Access (MEM) Stage
This stage provides access to memory for a load or store instruction. The
EX/MEM register provides the address and data for the memory as well as control
signals. The output from the memory is stored in the MEM/WB register along with
the ALU output from the EX/MEM register.
Write-back (WB) Stage
This stage is included to separate memory accesses from register file updates. In
this stage, the value to be written to the register file is determined, and the register file is potentially updated based on control
signals.
Part B: Instruction Decode (ID) Stage
This part provides a strategy for organizing the ID stage and completing the
pipeline register that ends the stage. The file piperegister.v contains
an incomplete IDEXReg module. However, before completing the
register there are other issues to address.
- Replace the register file with a registerfile2 component. (See above for more)
- Make space between the Regfile and any multiplexers or wires (Hint: select wires, hit Shift and drag to separate)
- Update buses that include the name Inst to use the name ID_Inst
- Disconnect the destination register logic from the register file and name the output bus ID_WriteReg(4:0)
- Compose a list of control signals used in the EX, MEM, and WB stages
At this point it's time to address the IDEXReg component defined
in piperegister.v. Your task is to define the input and output ports for
the additional data needed in the EX stage and the control signals needed in the
EX, MEM, and WB stages. As a starting point, ID_RegWrite is an input control
signal and EX_RegWrite is the output of the same signal.
- Complete the definition of IDEXReg and compile piperegister.v.
- Return to cpu.bde and use the Symbols Toolbox to add an instance of IDEXReg
- Set the name to IDEXReg (ESSENTIAL!!)
- Add buses for data and control inputs to IDEXReg
- Use "Add Stubs" to generate buses for the rest of the ports.
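To see why a control signal needs one port pair per pipeline register, here is a small Python sketch that follows a single signal from stage to stage. Only ID_RegWrite and EX_RegWrite are named in this handout; MEM_RegWrite and WB_RegWrite simply extend the same prefix convention and are assumptions, as are the function names.

```python
def advance(regs, id_regwrite):
    """Model one clock edge: each register captures the previous stage's value."""
    return {
        "EX_RegWrite": id_regwrite,            # captured by IDEXReg
        "MEM_RegWrite": regs["EX_RegWrite"],   # captured by EXMEMReg (assumed name)
        "WB_RegWrite": regs["MEM_RegWrite"],   # captured by MEMWBReg (assumed name)
    }


def trace(id_values):
    """Return the register contents after each clock edge, starting from reset."""
    regs = {"EX_RegWrite": 0, "MEM_RegWrite": 0, "WB_RegWrite": 0}
    history = []
    for value in id_values:
        regs = advance(regs, value)
        history.append(regs)
    return history


if __name__ == "__main__":
    # One register-writing instruction followed by three that do not write.
    for cycle, regs in enumerate(trace([1, 0, 0, 0]), start=1):
        print(cycle, regs)
```

The signal decoded in ID reaches the WB stage three clock edges later, which is exactly when the instruction that produced it arrives there. Using the ID-stage value directly at the register file would apply it to the wrong instruction.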
Part C: Execute (EX) Stage
This part provides a strategy for organizing the EX stage and completing the
pipeline register that ends the stage. The file piperegister.v contains
an incomplete EXMEMReg module. However, before completing the
register there are other issues to address.
- Change the ALUControl inputs to use EX_Inst
- Rename control signals to use the versions from the IDEXReg
- Compose a list of control signals used in the MEM and WB stages
Your task is to define the input and output ports for the control signals that get passed through
the EX stage for use in the MEM and WB stages.
- Complete the definition of EXMEMReg and compile piperegister.v.
- Return to cpu.bde and use the Symbols Toolbox to add an instance of EXMEMReg
- Set the name to EXMEMReg (ESSENTIAL!!)
- Add buses for data and control inputs to EXMEMReg
- Use "Add Stubs" to generate buses for the rest of the ports.
Part D: Memory (MEM) Stage
The memory stage provides the address and data for writing to memory or
reading from memory. It sends the value read from memory and the address (ALUOut)
on to the WB stage. The file piperegister.v contains
an incomplete MEMWBReg module. First there are some bookkeeping
tasks.
- Update the output ports that interact with memory to use the values from
the EXMEMReg.
- Add buses to ports that take inputs from Memory to reflect that they occur in
this stage.
Once these things are complete it's time to finalize the MEMWBReg from piperegister.v.
You need to add ports and logic for the control signals that get passed through
the MEM stage for use in the WB stage.
- Complete the definition of MEMWBReg and compile piperegister.v.
- Return to cpu.bde and use the Symbols Toolbox to add an instance of MEMWBReg
- Set the name to MEMWBReg (ESSENTIAL!!)
- Add buses for data and control inputs to MEMWBReg
- Use "Add Stubs" to generate buses for the rest of the ports.
Part E: Write-Back (WB) Stage
This stage basically just separates the memory access time from the update to
the register file. At this point, all the pipeline registers are in place, and
the last task is to rename some signals to ensure that the proper data is being
used.
- Change the Regfile control signals to reflect their source in the WB
stage.
- Correct the sources of the mux that determines the WriteData input to the Regfile
Part F: Testing
The cpu.bde has a fairly limited number of input and output ports. To
make the test fixtures more effective there is a file cpu_wrapper.v that "peeks" inside the CPU and exposes a wide range of signals for the test fixtures.
Test the updated cpu.bde with test fixture phase1_tf.v.
This test fixture runs through all of the non-control instructions,
and verifies that everything is connected properly. Problems with branch or jump
instructions will be addressed in the test fixture for Phase 2.
Phase 2: Branching and Delay Slots
The processor from Lab 2 decided the next PC value in every cycle based on
the current instruction. Each instruction took a single cycle, so branch
instructions would know the resulting PC before the next clock cycle. After pipelining, the control signals are not available
until the cycle after an instruction is fetched. This causes a control hazard for
branch instructions because we do not know whether the branch is taken
until the following instruction has been fetched.
There are different ways to deal with this problem, and the MIPS designers
decided to turn the hazard into a feature by defining a delay slot.
The delay slot is the instruction
directly after a branch or jump, and is always executed, regardless of branch
outcome. This means that no instructions are squashed on
branches, because the branch result is known before the instruction after the delay
slot is fetched. (See
COD: 3e pp 423-424 for more on delay slots.)
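The effect of a delay slot on execution order can be sketched with a tiny Python model. The program encoding below (tuples with a hypothetical "branch" or "op" kind) is invented purely for illustration and ignores everything except control flow.

```python
def execution_order(program, max_steps=20):
    """Return the names of the instructions in the order they execute.

    program: list of tuples, either ("op", name) or
             ("branch", name, target_index, taken).
    A taken branch redirects the PC only after the following instruction
    (the delay slot) has executed.
    """
    order = []
    pc = 0
    pending_target = None  # branch target waiting for the delay slot to finish
    while pc < len(program) and len(order) < max_steps:
        instr = program[pc]
        order.append(instr[1])
        next_pc = pc + 1
        if pending_target is not None:
            # This instruction was the delay slot; now the branch takes effect.
            next_pc = pending_target
            pending_target = None
        if instr[0] == "branch" and instr[3]:
            pending_target = instr[2]
        pc = next_pc
    return order


if __name__ == "__main__":
    taken = [("branch", "beq", 3, True), ("op", "delay_slot"),
             ("op", "skipped"), ("op", "target")]
    print(execution_order(taken))
```

In the taken case the instruction right after the branch still runs before control reaches the target, and the instruction after the delay slot is never fetched, so nothing has to be squashed.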
Part A. Branch Comparisons
In the preceding labs, branch comparisons were performed by forcing a
subtract in the ALU. For single-cycle machines this was an efficient reuse
of logic. In a pipelined machine we want to make the branch decision in the ID
stage, so we need the result of the comparison in the ID stage.
- Add a comparator component to the ID stage of your pipeline
- Compare the ID_RS_OUT and ID_RT_OUT buses
- Name the output signal ID_EQ
- Replace uses of Zilch (from ALU) with ID_EQ
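The comparison made by the new component can be modeled in a few lines of Python. This is a behavioral sketch only: the function names and the is_beq / is_bne decode flags are hypothetical and do not correspond to signals in the lab files.

```python
MASK32 = 0xFFFFFFFF


def comparator(id_rs_out: int, id_rt_out: int) -> int:
    """Return ID_EQ: 1 when the two 32-bit register values are equal, else 0."""
    return int((id_rs_out & MASK32) == (id_rt_out & MASK32))


def branch_taken(is_beq: bool, is_bne: bool, id_eq: int) -> bool:
    """Decide a BEQ/BNE branch in the ID stage from the comparator output."""
    return bool((is_beq and id_eq) or (is_bne and not id_eq))


if __name__ == "__main__":
    eq = comparator(0x1234, 0x1234)
    print(eq, branch_taken(True, False, eq), branch_taken(False, True, eq))
```

Because the comparator reads the register file outputs directly, the branch decision no longer has to wait for the ALU in the EX stage.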
Part B. PC Address Computation
In labs 1 and 2 the branch address was computed by adding the offset to the
address of the next instruction. The simplest way to do this computation
was to add the branch offset to PC+4. That would ensure that the proper
instruction was targeted. Now, the branch decision is delayed one cycle,
meaning that PC+4 is actually two instructions after the branch. There
are a number of ways to compute the correct branch target address. The
deciding factor is the complexity of the logic required.
The minimal solution is to pass both PC and PC+4 (IF_NextPC) to the
pcaddresscomputer. Branch and jump address computations use PC, while PC+4
is used as the default, or for branches that aren't taken.
- Open pcaddresscomputer.v and add a new input port NextPC
- Change the logic to use NextPC in non-branch / non-jump cases
- Save and compile pcaddresscomputer.v
- Open cpu.bde, right-click pcaddresscomputer and select "Compare symbol with contents"
- In the new window check the button "Update Symbol inside Block Diagram" and hit OK.
- Right-click the pcaddresscomputer and use "Edit" or "Edit Symbol in Other Window" to fix the pins
- Make the proper wiring changes
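The selection logic described above can be sketched in Python as follows. This is a behavioral model under stated assumptions, not the contents of pcaddresscomputer.v: the function and parameter names are hypothetical, register jumps (JR, JALR) are left out, and pc is taken to be the value in the PC register while the branch sits in the ID stage, that is, the address of the delay slot.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def compute_next_pc(pc: int, next_pc: int, inst: int,
                    take_branch: bool = False, jump: bool = False) -> int:
    """Select the address to load into the PC on the next clock edge.

    pc      : current PC (address of the delay slot when a branch is in ID)
    next_pc : pc + 4, used whenever no branch or jump redirects the fetch
    inst    : the 32-bit instruction currently in the ID stage
    """
    if take_branch:
        # Word offset in inst[15:0], converted to a byte offset.
        return (pc + (sign_extend16(inst) << 2)) & MASK32
    if jump:
        # 26-bit word index in inst[25:0], combined with the upper PC bits.
        return (pc & 0xF0000000) | ((inst & 0x03FFFFFF) << 2)
    return next_pc & MASK32


if __name__ == "__main__":
    # A branch at 0x100 with offset +3: the PC holds 0x104 while it is in ID.
    print(hex(compute_next_pc(0x104, 0x108, 0x10000003, take_branch=True)))
```

With these assumptions a branch at address 0x100 with an offset of 3 words targets 0x104 + 12 = 0x110, which matches the usual MIPS definition of a branch target relative to the delay slot.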
Part C. Delay Slots and Return Addresses
The JAL and JALR instructions store the address of the "next" instruction to the register file. In Lab 2 the "next" instruction was the immediately
following instruction, meaning that PC+4 was the stored return address.
The addition of the delay slot changes this
behavior, because the instruction directly after the JAL or JALR
is in the delay
slot. Logically, it takes place before the jump into the subroutine. So,
the "next" instruction is the instruction after the delay
slot. This means that these instructions should store PC+8. Rather than
add a separate +8 adder, take advantage of the following trivial equivalence:
PC + 8 = (PC + 4) + 4 = NextPC + 4
- Compute NextPC+4 in EX stage. The computation should be
performed in the ALU; do not add a separate adder.
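A quick numeric check of the equivalence, written as a Python sketch. The function name and the ex_next_pc parameter are hypothetical; in the actual design the addition is done by the ALU, and this model only mirrors the arithmetic.

```python
MASK32 = 0xFFFFFFFF


def return_address(ex_next_pc: int) -> int:
    """Link value for JAL/JALR: NextPC + 4, which equals PC + 8."""
    return (ex_next_pc + 4) & MASK32


if __name__ == "__main__":
    jal_address = 0x400            # address of the JAL itself (PC)
    next_pc = jal_address + 4      # NextPC carried down the pipeline
    print(hex(return_address(next_pc)))
```

For a JAL at 0x400 the delay slot sits at 0x404, so the subroutine should return to 0x408, the first instruction after the delay slot.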
Part D. Testing
Test the updated cpu.bde with test fixture phase2_tf.v.
This test fixture is designed to exercise the branch and jump instructions to
ensure that the correct comparisons were made and that the return addresses are
being computed properly.
Turnin
To turn in your lab, complete all phases and use the Design > Archive Design command in Active-HDL on the final product to produce a .zip file containing all the essential files. Put this somewhere accessible via attu and use
"turnin -c cse378 'your file'" to submit it for grading.