CSE 378 - Winter 2002

Machine Organization and Assembly Language Programming

Useful Links
SMOK home Component Reference Turnin Form

Homework 6: Pipelined Processor Implementation
Due Monday, March 4th

Purpose:

The purpose of this homework is to learn how pipelines are implemented. To implement your pipeline, you will be creating stages and saving stage state, and implementing pipeline forwarding and interlocks.

This assignment is to be done in pairs. You must work with a different person than you did on Homework #5.

Tasks:

Task 1: Your task is to take the single-cycle implementation from Homework #5 and pipeline it. Your pipeline will be a conventional 5-stage MIPS pipeline. If you are confident of your solution to the last homework, you may use that as a starting point for this assignment, since you're already familiar with it. However, if you are uncertain of your implementation, we don't want you to be at a disadvantage, so you may request a sample solution to start from. If you want this option (or you just want to see how Harlan did it), email harlan with the request.
As was the case in the last assignment, the ultimate resource for your questions is the R2000 processor (except for changes listed in this handout and HW5).

Task 2: Write a (main-program) function that counts the number of 1's in the binary representation of a word, and run this program on your machine.

The Details

Pipeline:
The pipeline is:
Fetch -> Register Read -> Execute -> Memory Access -> Register Writeback
See Figure 6.40 in your book for a detailed explanation of the pipeline stages.

Data hazards:
You should implement data hazard detection and forwarding logic to resolve data hazards. For load delays, if a consumer instruction of a loaded value follows the load instruction, your pipeline will have to stall until the load has returned from memory (see figure 6.46). For arithmetic operations, forwarding logic should be sufficient to resolve hazards.
NOTE: We have been talking about hardware that writes to the register file in the first half of the clock cycle, and reads from it in the second half. Unfortunately, SMOK does not work this way. For this reason, you will need to have additional forwarding that goes from the WB stage to the ID stage. Or you may find some other solution to this problem.

Branch hazards:
You will need to handle branch hazards. The basic problem is that your pipeline doesn't know it has fetched a branch instruction until the decode stage, doesn't know the result of the branch condition until the end of the EX stage, and doesn't change the PC value until the MEM stage. By then it will already have fetched the following sequential instructions. Design hardware to handle branches and control hazards. This will involve deciding in which stage your branch resolution hardware should be and squashing any wrong-path instructions that are fetched. Depending on where you place your branch resolution hardware, you may or may not have to extend the forwarding logic you built to eliminate data hazards.

Memory layout:
Use the same memory layout as Homework #5.

Instructions:
Support the following instructions:
Arithmetic:
ADDU, ADDIU, SUBU
AND, SRL <--------- these are new for HW6
Load/Store:
LW, SW
Branch:
BEQ, BNE

Startup code:
You can begin with Homework #5's startup code and modify as appropriate, but you shouldn't need much. You can just plunk your code at address 0, and put some kind of termination at the end of it.

The executing program:
Write a new procedure in MIPS assembly and compile it down to the instructions you implemented. This procedure is called countbits(int x). What this procedure does is it counts the number of 1 bits in a word. In C this procedure may look like:
int countbits(int x) {
  int v;
  v =0;
  while( x != 0 ) {
    if ( (x & 1) != 0 )
      v = v +1;
    x = x >> 1;
  }
  return ( v );
}
Please follow register conventions--put arguments and return values in the appropriate places so your code is easy to test.
Performance
Although your primary goal is to design and implement a correctly functioning pipeline and to write a correct test program, try to implement efficient versions of them both. How many cycles does it take to execute your code on your pipeline? How many cycles does your pipeline stall when executing this code? (for both of these questions, you can use the input 0x6139, but feel free to test cases of your own design as well.)

Written Report
Write a short report (one to two pages is sufficient) describing:
Turnin
Turn in your code online, using this form, and turn in your written report in class. Please submit only one set of files per group -- there is no need for you and your partner to both submit code (or written reports) separately. Just make sure to indicate both partners' information on the turnin, so you will both get a grade. Both written and code parts are due by class on 3/4. I suggest you finish the code part early, and leave time to describe it in the written report.

EXTRA CREDIT:

YOU CAN ONLY GET CREDIT FOR THESE IF YOUR BASIC PIPELINE ABOVE WORKS.

Grading:

You will be graded on:
  1. functionality of countbits on your machine (does countbits perform the correct function?)
  2. correctness of your machine (do all the instructions on your machine work correctly?)
  3. design of your machine (is it understandable, are the design decisions you made justifiable?)
  4. your written report

Hints: