Lab 3 is simple to describe but difficult to do: pipeline your Lab 2 design. The pipeline must have at least five stages:
Fetch: read the instruction from code memory
Decode/Register access: decode the instruction and read the register file
Execute: execute the instruction
Memory access: read from or write to data memory
Write back: write the results back to the register file
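To make the stage list concrete, here is a minimal Python sketch of how instructions flow through the five stages when there are no hazards: each instruction advances one stage per cycle, so N instructions take N + 4 cycles. The function name `pipeline_diagram` and the string instruction labels are hypothetical and used only for illustration; this is not part of any provided lab code.

```python
# Stage names follow the list above (shortened for readability).
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]

def pipeline_diagram(instrs):
    """Return one dict per cycle mapping each stage to the instruction in it
    (or None for an empty stage). Assumes no hazards: nothing ever stalls."""
    n_cycles = len(instrs) + len(STAGES) - 1
    rows = []
    for cycle in range(n_cycles):
        row = {}
        for s, stage in enumerate(STAGES):
            i = cycle - s  # instruction i entered Fetch in cycle i
            row[stage] = instrs[i] if 0 <= i < len(instrs) else None
        rows.append(row)
    return rows

if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_diagram(["i0", "i1", "i2"])):
        print(cycle, row)
```

With three instructions the loop prints seven cycles; in cycle 4, `i2` is in Execute, `i1` in Memory, and `i0` in Writeback.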
Those taking the hardware option may not start from the Lab 2 solution. You may (and are encouraged to) study the Lab 2 solution and draw on portions of it for your Lab 3 work, but your Lab 3 solution must be substantially your own and based on your own Lab 2 work. What does "substantially your own" mean? The term is ambiguous, but to put numbers on it: it is OK to use snippets of code from the Lab 2 solution, where a snippet is 10-20 lines, and you may use a half dozen to a dozen snippets. Take this definition loosely, however. The main requirement is that your Lab 3 solution be primarily your own work.
When is the due date? March 4th. This is the last lab.
THIS IS A VERY DIFFICULT LAB. Please start early; do not wait until the last week. Personally, I think this lab is 50% more difficult than Lab 2. It looks easy, but there are a ton of moving parts, and getting it correct takes time.
Do I need to use my Lab 2 solution, or can I start from the provided Lab 2 solution? Those choosing the hardware option may NOT start from the Lab 2 solution. Those choosing the software/exam option are free to do so.
Do I need to handle data hazards via forwarding? Absolutely. That's what this lab is all about.
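As an illustration of the forwarding decision, here is a minimal Python model of the operand-select mux feeding one ALU input in Execute. It assumes a classic two-source design: a value can be forwarded from the EX/MEM or MEM/WB pipeline register, and the younger instruction (EX/MEM) wins if both write the same register. The `StageWrite` class and `forward_operand` function are hypothetical names, not part of any provided code. It also assumes that a load whose data is not yet available has already been handled by a stall, and that the register file returns the value being written back in the same cycle; if yours does not, you need a third forwarding path. The condition flags used by ARM conditional execution need the same kind of treatment.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StageWrite:
    """Hypothetical summary of a pipeline register: whether the instruction
    in that stage writes a register, which register, and the value."""
    writes_reg: bool
    rd: int
    value: int

def forward_operand(rn: int, regfile_value: int,
                    ex_mem: Optional[StageWrite],
                    mem_wb: Optional[StageWrite]) -> int:
    """Select the freshest value of source register rn for Execute.

    Priority: EX/MEM (younger producer) > MEM/WB (older producer) >
    the value read from the register file back in Decode.
    Assumes load-use hazards were already resolved by stalling, so a
    load in EX/MEM never reaches this mux without valid data."""
    if ex_mem is not None and ex_mem.writes_reg and ex_mem.rd == rn:
        return ex_mem.value
    if mem_wb is not None and mem_wb.writes_reg and mem_wb.rd == rn:
        return mem_wb.value
    return regfile_value
```

In hardware the two `if` checks become the select logic of a 3-input mux per ALU operand; the priority order matters when two in-flight instructions write the same register back to back.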
What do I do with the instructions that are fetched after a taken branch? You need to implement the ARM32 ISA correctly. In this ISA, that means you need to squash those instructions (make them have no effect on architecturally visible state).
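Here is a minimal Python sketch of squashing, assuming (hypothetically) that the branch outcome is known when the branch is in Execute. Every instruction in an earlier stage is younger than the branch and was fetched down the wrong path, so it is turned into a bubble (`None`). In hardware this usually means clearing a valid bit in the Fetch/Decode and Decode/Execute pipeline registers so those instructions never write registers, flags, or memory. The function name and the dict representation are illustrative only.

```python
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]

def squash_after_taken_branch(pipeline: dict, branch_stage: str = "Execute") -> dict:
    """Return a new pipeline in which every instruction younger than the
    taken branch (i.e., in a stage before branch_stage) is replaced by a
    bubble (None). The branch itself and older instructions are kept.
    Assumes the branch resolves in branch_stage."""
    idx = STAGES.index(branch_stage)
    return {stage: (None if STAGES.index(stage) < idx else instr)
            for stage, instr in pipeline.items()}

if __name__ == "__main__":
    before = {"Fetch": "i4", "Decode": "i3", "Execute": "b",
              "Memory": "i1", "Writeback": "i0"}
    print(squash_after_taken_branch(before))
```

If your design resolves branches in a different stage, pass that stage name instead; resolving later squashes more instructions per taken branch.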
Do my memories need to be clocked for read? Yes.
Will I need to stall the pipeline now and then? Yes.
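One common reason to stall follows from clocked memory reads: a load's data is not available in time to be forwarded to the instruction immediately behind it, so that instruction must wait one cycle (a load-use hazard). Below is a minimal Python sketch of the detection check, assuming the load is in Execute and its consumer is in Decode. The `Instr` class and `needs_load_use_stall` function are hypothetical. When the check fires, a typical response is to hold the PC and the Fetch/Decode pipeline register and insert a bubble into Execute.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Instr:
    """Hypothetical decoded-instruction summary used only for hazard checks."""
    is_load: bool = False
    rd: Optional[int] = None         # destination register, if any
    sources: Tuple[int, ...] = ()    # source registers this instruction reads

def needs_load_use_stall(in_execute: Optional[Instr],
                         in_decode: Optional[Instr]) -> bool:
    """True if the instruction in Decode reads the destination of a load
    currently in Execute. Assumes the loaded value can only be forwarded
    once the load has finished its clocked memory read, one cycle too late
    for the instruction directly behind it."""
    if in_execute is None or in_decode is None:
        return False  # a bubble in either stage cannot cause this hazard
    return (in_execute.is_load
            and in_execute.rd is not None
            and in_execute.rd in in_decode.sources)
```

An ALU result in Execute does not trigger this check, because ordinary forwarding covers it; only loads need the extra cycle.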
How big will the final design be? The solution is about 3800 LUTs including the USB interface. This is about 1000 LUTs more than the non-pipelined version.