University of Washington Computer Science & Engineering
CSE 378 Fall 2006

Lab 3: Pipelining (Wikified Version)

Assigned: 10/30/2006
Due: TBD

Description

The processor constructed in the previous labs implements a significant part of the MIPS instruction set, but does so very slowly. The problem is that the clock speed is constrained by the longest path a signal can take in a single clock cycle. For the first two labs this path started at the register file, went through the ALU, the memory, and finally back into the register file.  In this lab, we'll be shrinking the longest path significantly by breaking the datapath into 5 separate pieces or stages. Adding registers between stages will mean that the longest path a signal must traverse in a single cycle will decrease dramatically. This speed-up will come at a cost, because it will now take 5 cycles for an instruction to complete. To offset this slowdown we'll be able to run independent instructions in each stage, for a maximum of 5 instructions "in-flight", leading to a maximum throughput of 1 instruction per cycle. 
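The latency/throughput trade-off above can be checked with a little arithmetic. The following is a minimal sketch, assuming an ideal 5-stage pipeline with no stalls or hazards; the function names are made up for illustration and are not part of the lab files.

```python
def pipelined_cycles(num_instructions, num_stages=5):
    """Cycles needed to finish a hazard-free instruction stream.

    The first instruction needs num_stages cycles to drain through the
    pipeline; every later instruction finishes one cycle after the one
    in front of it.
    """
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def throughput(num_instructions, num_stages=5):
    """Average instructions completed per cycle for the whole stream."""
    if num_instructions == 0:
        return 0.0
    return num_instructions / pipelined_cycles(num_instructions, num_stages)


if __name__ == "__main__":
    for n in (1, 5, 100, 10_000):
        print(n, pipelined_cycles(n), round(throughput(n), 4))
```

A single instruction still takes 5 cycles, but as the stream gets longer the average approaches 1 instruction per cycle, which is the maximum throughput mentioned above.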

Background: (COD: 3e pp 370-374, 384-402)

Phase 0: Administration

This lab will take place in the same workspace as the previous labs. The files for this lab are provided as an archived design. You will need to restore the design, add it to the workspace, and copy several design files from lab2 into the new design. Follow these steps:

  1. Download lab3.zip.
  2. Start Active-HDL and open the cse378 workspace used in previous labs.
  3. Select Design > Restore Design from the menu.
  4. Browse to the downloaded lab3.zip file.
  5. Set the Restore To: directory to the cse378 directory that contains the folders lab1, lab2 and lib378 and click Finish.

At this point the files are all available. Now we need to tell Active-HDL about the new design.

  1. Right-click the yellow workspace icon named cse378 in the design browser and choose "Add Existing Design to Workspace".
  2. Find and open the file "lab3.adf" in the newly restored lab3 directory.
  3. Set lab3 as the active design.
  4. Click and drag pcaddresscomputer.v, controller.v and cpu.bde from lab2 to lab3.

Phase 1: Partitioning and Pipeline Registers

The fundamental idea behind pipelining is to separate the datapath into individual stages.  Each stage will take one clock cycle to complete, and can contain one instruction. This phase will describe the elements of the stages and the pipeline registers that function as the barrier between stages.   A pipeline register saves the values from the previous stage so that each stage can perform an independent instruction. In addition, the pipeline registers must have a reset signal so that their initial state is known, and a load enable signal. The load enable will be used later to ensure that certain registers do not update their values in special cases. 
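The behaviour of a pipeline register with reset and load enable can be sketched in a few lines. This is a behavioural model only, not the Verilog in piperegisters.v; the class name, the method name, and the choice that reset takes priority over load enable are assumptions made for illustration.

```python
class PipelineRegister:
    """Behavioural model of one pipeline register (hypothetical, not lab code)."""

    def __init__(self, reset_value=0):
        self.reset_value = reset_value
        self.q = reset_value  # value visible to the next stage

    def clock(self, d, reset=False, load_enable=True):
        """Model one rising clock edge and return the stored value."""
        if reset:
            self.q = self.reset_value   # force a known initial state
        elif load_enable:
            self.q = d                  # capture the previous stage's output
        # With load_enable low and no reset, the old value is held.
        return self.q


if __name__ == "__main__":
    reg = PipelineRegister()
    print(reg.clock(0x1234))                     # loads the new value
    print(reg.clock(0x5678, load_enable=False))  # holds the old value
    print(reg.clock(0x9ABC, reset=True))         # returns to the reset value
```

Holding the old value when the load enable is low is what later lets a stage keep its instruction in place for an extra cycle.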

WARNING: The test fixtures are very name-sensitive. Double-check all wire and component names.

Part A. Organizing Stage Logic

The following stage descriptions explain the components that should be in each stage. Please scan the descriptions before making any changes, and be sure to refer back as things progress.

Instruction Fetch (IF) Stage

The first stage is responsible for getting the current instruction from the instruction memory at the address specified by the program counter (PC). The memory access is slow, so the resulting instruction goes straight into the pipeline register. In parallel with the fetch, this stage must also compute the next PC address (PC+4). This stage stores the instruction into the IF/ID pipeline register, along with the PC+4 value for use in JAL and JALR instructions.

  1. Create a wire from the Inst input and name it IF_Inst. Use this when you need to reference the instruction in the IF stage.
  2. Create a VCC component and name it LoadEnable
    1. Hit ESC to set cursor to the arrow
    2. Hit (P) to change cursor to VCC
  3. Change PC to use LoadEnable
    1. Right-Click the Program Counter and select "Replace Symbol"
    2. If lib378 appears then skip the next step
    3. Otherwise, Right-click the middle window and choose "Select Libraries", Expand the "User Libraries" and choose lib378
    4. Choose register_re and hit OK
  4. Add a wire LoadEnable to the LD pin of the Program Counter
  5. Create an IFIDReg pipeline register and name it IFIDReg
    1. Use the Symbols Toolbox to add an IFIDReg to cpu.bde
    2. Select the new IFIDReg component, and hit ALT+Enter to open the properties dialog
    3. Set the name to IFIDReg (ESSENTIAL!!)
  6. Connect PC+4 to the input IF_NextPC of IFIDReg
  7. Connect Inst to IF_Inst of IFIDReg
  8. Use the "Add Stubs" menu option to complete signal generation for IFIDReg

Instruction Decode (ID) Stage 

This stage is responsible for the majority of the work. It reads an instruction from the IF/ID pipeline register, decodes the instruction, generates control signals, reads values from the register file and performs comparisons for branches (more on that later). This stage stores control signals for all later stages, data values for A and B inputs to the ALU, the result from the extender unit, the PC+4 value, the destination register index (used in lab 4), and control information needed by the ALU into the ID/EX pipeline register. 

Execution (EX) Stage

The ALU is the heart of the processor so it gets a stage all of its own. This stage contains the ALU and  multiplexers that determine which values are used as the input to the ALU. It stores the output from the ALU, the data value from the RT field of the instruction, and the destination register to the EX/MEM pipeline register along with necessary control signals.

Memory Access (MEM) Stage

This provides an access to memory for a load or store instruction. The EX/MEM register provides the address and data for the memory as well as control signals. The output from the memory is stored in the MEM/WB register along with the ALU output from the EX/MEM register.

Write-back (WB) Stage

This stage is included to separate memory accesses from the register file. In this stage, the write value for the register value is determined, and the register file is potentially updated based on control signals. 
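To see how the five stages overlap, it can help to tabulate which instruction occupies which stage in each cycle. The sketch below assumes an ideal pipeline with no stalls; the instruction labels are placeholders and the function name is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def occupancy(instructions):
    """Return one dict per cycle mapping stage name -> instruction (or None)."""
    n = len(instructions)
    if n == 0:
        return []
    table = []
    for cycle in range(n + len(STAGES) - 1):
        row = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth  # the instruction that entered IF `depth` cycles ago
            row[stage] = instructions[index] if 0 <= index < n else None
        table.append(row)
    return table


if __name__ == "__main__":
    for cycle, row in enumerate(occupancy(["i0", "i1", "i2"])):
        print(cycle, [row[stage] or "-" for stage in STAGES])
```

Each row is one clock cycle. Reading down a column shows the instructions that pass through one stage; reading along a diagonal follows a single instruction from IF to WB.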

Part B: Instruction Decode (ID) Stage

This part provides a strategy for organizing the ID stage and completing the pipeline register that ends the stage. The file piperegisters.v contains an incomplete IDEXReg module. However, before completing the register there are other issues to address.

  1. Replace the register file with a registerfile2 component. (See the "Replace Symbol" steps in the IF stage above for more)
  2. Make space between the Regfile and any multiplexers or wires ( Hint: Select wires, hit Shift and drag to separate )
  3. Update buses that include the name Inst to use the name ID_Inst
  4. Disconnect the destination register logic from the register file and name the output bus ID_WriteReg(4:0)
  5. Compose a list of control signals used in the EX, MEM, and WB stages

At this point it's time to address the IDEXReg component defined in piperegisters.v. Your task is to define the input and output ports for the additional data needed in the EX stage and the control signals needed in the EX, MEM, and WB stages. As a starting point, ID_RegWrite is an input control signal and EX_RegWrite is the output of the same signal.

  1. Complete the definition of IDEXReg and compile piperegisters.v.
  2. Return to cpu.bde and use the Symbols Toolbox to add an instance of IDEXReg
  3. Set the name to IDEXReg (ESSENTIAL!!)
  4. Add buses for data and control inputs to IDEXReg
  5. Use "Add Stubs" to generate buses for the rest of the ports.
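When composing the list of control signals, it can help to work out which pipeline registers each signal must pass through. The sketch below extends the ID_RegWrite / EX_RegWrite naming pattern to the later registers; that extension (MEM_ and WB_ prefixes) is an assumption, as is the helper itself, so check the port names against piperegisters.v and the test fixtures before relying on them.

```python
# Pipeline registers in order: (register name, input stage prefix, output stage prefix).
REGISTERS = [
    ("IDEXReg", "ID", "EX"),
    ("EXMEMReg", "EX", "MEM"),
    ("MEMWBReg", "MEM", "WB"),
]
STAGE_ORDER = ["ID", "EX", "MEM", "WB"]


def ports_for(signal, last_used_in):
    """List (register, input_port, output_port) for a signal generated in ID.

    The signal has to be carried by every pipeline register between ID and
    the last stage that consumes it.
    """
    last = STAGE_ORDER.index(last_used_in)
    ports = []
    for register, src, dst in REGISTERS:
        if STAGE_ORDER.index(dst) <= last:
            ports.append((register, f"{src}_{signal}", f"{dst}_{signal}"))
    return ports


if __name__ == "__main__":
    # RegWrite is consumed in WB, so it rides through all three registers.
    for entry in ports_for("RegWrite", "WB"):
        print(entry)
```

A signal that is only needed in EX appears in IDEXReg alone, which keeps the later registers from carrying ports they never use.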

Part C: Execute (EX) Stage

This part provides a strategy for organizing the EX stage and completing the pipeline register that ends the stage. The file piperegisters.v contains an incomplete EXMEMReg module. However, before completing the register there are other issues to address.

  1. Change the ALUControl inputs to use EX_Inst
  2. Rename Control Signals to use the versions from the IDEXReg
  3. Compose a list of control signals used in the MEM, and WB stages

Your task is to define the input and output ports for the control signals that get passed through the EX stage for use in the MEM and WB stages.

  1. Complete the definition of EXMEMReg and compile piperegisters.v.
  2. Return to cpu.bde and use the Symbols Toolbox to add an instance of EXMEMReg
  3. Set the name to EXMEMReg (ESSENTIAL!!)
  4. Add buses for data and control inputs to EXMEMReg
  5. Use "Add Stubs" to generate buses for the rest of the ports.

Part D: Memory (MEM) Stage

The memory stage provides the address and data for writing to memory or reading from memory. It sends the value read from memory and the address (ALUOut) on to the WB stage.  The file piperegisters.v contains an incomplete MEMWBReg module.  First there are some bookkeeping tasks.

  1. Update the output ports that interact with memory to use the values from the EXMEMReg.
  2. Rename the ports that take inputs from Memory to reflect that they occur in this stage.

Once these things are complete it's time to finalize the MEMWBReg from piperegisters.v. You need to add ports and logic for the control signals that get passed through the MEM stage for use in the WB stage.

  1. Complete the definition of MEMWBReg and compile piperegisters.v.
  2. Return to cpu.bde and use the Symbols Toolbox to add an instance of MEMWBReg
  3. Set the name to MEMWBReg (ESSENTIAL!!)
  4. Add buses for data and control inputs to MEMWBReg
  5. Use "Add Stubs" to generate buses for the rest of the ports.

Part E: Write-Back (WB) Stage

This stage basically just separates the memory access time from the update to the register file. At this point, all the pipeline registers are in place, and the last task is to rename some signals to ensure that the proper data is being used.

  1. Change the Regfile control signals to reflect their source in the WB stage.
  2. Correct the sources of the mux that determines the WriteData input to the Regfile

Part F: Testing

The cpu.bde has a fairly limited number of input and output ports. To make the test fixtures more effective there is a file cpu_wrapper.v that "peeks" inside the CPU and exposes a wide range of signals for the test fixtures.

Test the updated cpu.bde with test fixture phase1_tf.v. This test fixture runs through all of the non-control instructions, and verifies that everything is connected properly. Problems with branch or jump instructions will be addressed in the test fixture for Phase 2.

Phase 2: Branching and Delay Slots

The processor from Lab 2 decided the next PC value in every cycle based on the current instruction. Each instruction took a single cycle, so branch instructions would know the resulting PC before the next clock cycle. After pipelining, the control signals are not available until the cycle after an instruction is fetched. This causes a control hazard for branch instructions because we do not know if the branch occurs until the following instruction has been fetched. 

There are different ways to deal with this problem, and the MIPS designers decided to turn the hazard into a feature by defining a delay slot. The delay slot is the instruction directly after a branch or jump, and it is always executed, regardless of the branch outcome. This means that no instructions are squashed on branches, because the branch result is known by the time the instruction after the delay slot is fetched. (See COD:3e pp 423-424 for more on delay slots)
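The effect of a delay slot on program order can be illustrated with a toy interpreter. This is a sketch of the architectural behaviour only, not of the pipeline hardware; the program encoding (label, kind, argument) and the function name are invented for this example, and a branch placed in a delay slot is ignored.

```python
def run(program, max_steps=100):
    """Execute a toy program with branch delay slots.

    Each entry is (label, kind, arg). kind is "op" or "branch"; for a
    branch, arg is (taken, target_index). Returns the labels of the
    executed instructions in order.
    """
    executed = []
    pc = 0
    pending_target = None  # target of a taken branch whose delay slot runs next
    for _ in range(max_steps):
        if pc >= len(program):
            break
        label, kind, arg = program[pc]
        executed.append(label)
        next_pc = pc + 1
        if pending_target is not None:
            # This instruction was the delay slot; now the branch takes effect.
            next_pc = pending_target
            pending_target = None
        elif kind == "branch" and arg[0]:
            pending_target = arg[1]
        pc = next_pc
    return executed


def example_program(taken):
    return [
        ("branch", "branch", (taken, 3)),
        ("delay_slot", "op", None),
        ("skipped_if_taken", "op", None),
        ("target", "op", None),
    ]


if __name__ == "__main__":
    print(run(example_program(taken=True)))
    print(run(example_program(taken=False)))
```

In both runs the delay-slot instruction executes; only the instruction after it depends on the branch outcome.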

Part A. Branch Comparisons

In the preceding labs, branch comparisons were performed by forcing a subtract in the ALU.  For single-cycle machines this was an efficient reuse of logic. In a pipelined machine we want to make the branch decision in the ID stage, so we need the result of the comparison in the ID stage.

  • Add a comparator component to the ID stage of your pipeline
  • Compare the ID_RS_OUT and ID_RT_OUT buses
  • Name the output signal ID_EQ
  • Replace uses of Zilch ( from ALU ) with ID_EQ
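The comparator's job can be sketched as follows. The XOR-then-test-for-zero formulation is one common way to build an equality comparator and is not necessarily how the provided component works; the is_beq / is_bne flags are hypothetical stand-ins for whatever branch control signals your controller produces.

```python
MASK32 = 0xFFFFFFFF


def id_eq(rs_out, rt_out):
    """Return 1 when the two 32-bit register values match, else 0."""
    difference = (rs_out ^ rt_out) & MASK32  # a bit is set wherever the inputs differ
    return 1 if difference == 0 else 0


def branch_taken(is_beq, is_bne, rs_out, rt_out):
    """Decide a branch in ID from the comparator output (assumed control flags)."""
    eq = id_eq(rs_out, rt_out)
    return bool((is_beq and eq) or (is_bne and not eq))


if __name__ == "__main__":
    print(id_eq(5, 5), id_eq(5, 6))
    print(branch_taken(True, False, 7, 7), branch_taken(False, True, 7, 7))
```

Because no subtraction is involved, the comparison does not need the ALU, which is what allows the decision to move into the ID stage.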

Part B. PC Address Computation

In labs 1 and 2 the branch address was computed by adding the offset to the address of the next instruction. The simplest way to do this computation was to add the branch offset to PC+4. That would ensure that the proper instruction was targeted.  Now, the branch decision is delayed one cycle, meaning that PC+4 is actually two instructions after the branch. There are  a number of ways to compute the correct branch target address. The deciding factor is the complexity of the logic required. 

The minimal solution is to pass both PC and PC+4 (IF_NextPC) to the pcaddresscomputer. Branch and jump address computations use PC, while PC+4 is used as the default, or for branches that aren't taken.

  1. Open pcaddresscomputer.v and add new input port NextPC
  2. Change the logic to use NextPC in non-branch / non-jump cases
  3. Save and compile pcaddresscomputer.v
  4. Open cpu.bde, right-click pcaddresscomputer and select "Compare symbol with contents"
  5. In the new window check the button "Update Symbol inside Block Diagram" and hit OK.
  6. Right-click the pcaddresscomputer and use "Edit" or "Edit Symbol in Other Window" to fix the pins
  7. Make the proper wiring changes
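The selection logic described above can be modelled as below. This is a behavioural sketch, not the contents of pcaddresscomputer.v: the function signature, the kind strings, and the handling of register jumps are assumptions, while the branch and jump target formulas follow the standard MIPS encodings.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def compute_next_pc(pc, next_pc, kind, taken=False, imm16=0, target26=0, rs_value=0):
    """Pick the address to load into the PC.

    pc      : current PC (the delay slot's address when a branch is in ID)
    next_pc : PC + 4, used for everything that is not a taken branch or a jump
    kind    : "branch", "jump", "jump_register", or anything else for no redirect
    """
    if kind == "branch" and taken:
        return (pc + (sign_extend16(imm16) << 2)) & MASK32
    if kind == "jump":
        return (pc & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)
    if kind == "jump_register":
        return rs_value & MASK32
    return next_pc & MASK32


if __name__ == "__main__":
    # A branch at 0x100 is in ID, so the PC already holds 0x104.
    print(hex(compute_next_pc(0x104, 0x108, "branch", taken=True, imm16=3)))
    print(hex(compute_next_pc(0x104, 0x108, "branch", taken=False, imm16=3)))
```

With the branch at 0x100 and an offset of 3 words, the taken target is 0x104 + 12, the same address the single-cycle design computed as (branch address + 4) + 12.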

Part C.  Delay Slots and Return Addresses

    The JAL and JALR instructions store the address of the "next" instruction to the register file.  In Lab 2 the "next" instruction was the immediately following instruction, meaning that PC+4 was the stored return address. 

    The addition of the delay slot changes this behavior, because the instruction directly after the JAL or JALR is in the delay slot. Logically, it takes place before the jump into the subroutine. So, the "next" instruction is the instruction after the delay slot. This means that these instructions should store PC+8. Rather than add a separate +8 adder, take advantage of the following trivial equivalence:

        PC + 8 = (PC + 4) + 4 = NextPC + 4    

  • Compute NextPC+4 in EX stage. The computation should be performed in the ALU; do not add a separate adder.
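A quick numeric check of the equivalence, as a sketch: the helper names and the example address are made up, and alu_add stands in for whatever ALU add operation your design selects in the EX stage.

```python
MASK32 = 0xFFFFFFFF


def if_stage_next_pc(pc):
    """PC + 4 as computed by the existing adder in the IF stage."""
    return (pc + 4) & MASK32


def alu_add(a, b):
    """32-bit wrap-around add, standing in for the ALU's add operation."""
    return (a + b) & MASK32


def return_address(jal_pc):
    """Address written to the register file by a JAL/JALR located at jal_pc."""
    next_pc = if_stage_next_pc(jal_pc)  # NextPC, carried down the pipeline
    return alu_add(next_pc, 4)          # NextPC + 4 == PC + 8, computed in EX


if __name__ == "__main__":
    jal_pc = 0x00400010  # hypothetical address of a JAL
    print(hex(return_address(jal_pc)))
```

For a JAL at 0x00400010 the delay slot sits at 0x00400014, so the subroutine returns to 0x00400018.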

Part D. Testing

Test the updated cpu.bde with test fixture phase2_tf.v.  This test fixture is designed to exercise the branch and jump instructions to ensure that the correct comparisons were made and that the return addresses are being computed properly.

Turnin

To turn in your lab, complete all phases and use the Design > Archive Design command in Active-HDL on the final product to produce a .zip file containing all the essential files. Put this somewhere accessible via attu and use "turnin -c cse378 'your file'" to submit it for grading.
 

