





















## Performance benefits

- Each instruction can execute only the stages that are necessary.
  - Arithmetic
  - Load
  - Store
  - Branches
- This would mean that instructions complete as soon as possible, instead of being limited by the slowest instruction.

Proposed execution stages:

1. Instruction fetch and PC increment
2. Reading sources from the register file
3. Performing an ALU computation
4. Reading or writing (data) memory
5. Storing data back to the register file
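The saving can be illustrated with a small cycle count. This is only a sketch: the mapping from instruction class to stages is an assumption (a conventional MIPS-style assignment that the text above does not spell out), the names `STAGES_USED`, `multicycle_cycles` and `slowest_instruction_cycles` are hypothetical, and each stage is assumed to take one equal-length clock cycle.

```python
# Stage numbers refer to the five proposed execution stages above.
# ASSUMPTION: which stages each instruction class needs. This is a
# conventional MIPS-style mapping, not something stated in the text.
STAGES_USED = {
    "arithmetic": (1, 2, 3, 5),     # no data memory access
    "load":       (1, 2, 3, 4, 5),  # uses every stage
    "store":      (1, 2, 3, 4),     # no register write-back
    "branch":     (1, 2, 3),        # done after the ALU stage
}


def multicycle_cycles(program):
    """Total cycles when each instruction runs only the stages it needs."""
    return sum(len(STAGES_USED[kind]) for kind in program)


def slowest_instruction_cycles(program):
    """Total stage-times when every instruction is charged for the slowest one."""
    slowest = max(len(stages) for stages in STAGES_USED.values())
    return slowest * len(program)


program = ["load", "arithmetic", "store", "branch"]
print(multicycle_cycles(program))           # 16
print(slowest_instruction_cycles(program))  # 20
```

Under these assumptions, only the load pays for all five stages; the other instructions finish earlier.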



## Cost benefits

- As an added bonus, we can eliminate some of the extra hardware from the single-cycle datapath.
  - We will restrict ourselves to using each functional unit once per cycle, just like before.
  - But since instructions require multiple cycles, we could reuse some units in a *different* cycle during the execution of a single instruction.
- For example, we could use the same ALU:
  - to increment the PC (first clock cycle), and
  - for arithmetic operations (third clock cycle).





## Our new adder setup

- We can eliminate *both* extra adders in a multicycle datapath, and instead use just one ALU, with multiplexers to select the proper inputs.
- A 2-to-1 mux ALUSrcA sets the first ALU input to be the PC or a register.
- A 4-to-1 mux ALUSrcB selects the second ALU input from among:
  - the register file (for arithmetic operations),
  - a constant 4 (to increment the PC),
  - a sign-extended constant (for effective addresses), and
  - a sign-extended and shifted constant (for branch targets).
- This permits a single ALU to perform all of the necessary functions.
  - Arithmetic operations on two register operands.
  - Incrementing the PC.
  - Computing effective addresses for lw and sw.
  - Adding a sign-extended, shifted offset to (PC + 4) for branches.
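The input selection described above can be sketched in a few lines of Python. Several details are assumptions made for illustration: the select encodings (0 = PC and 1 = register for ALUSrcA; 0 to 3 in the order listed for ALUSrcB), the 16-bit immediate width, the shift amount of 2, and the 32-bit result width. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # ASSUMPTION: 32-bit datapath


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def alu_operands(alu_src_a, alu_src_b, pc, reg_a, reg_b, imm16):
    """Model the two input multiplexers in front of the single ALU."""
    # ALUSrcA (2-to-1): 0 selects the PC, 1 selects a register (assumed encoding).
    first = pc if alu_src_a == 0 else reg_a
    # ALUSrcB (4-to-1): register, constant 4, sign-extended constant,
    # sign-extended and shifted constant (assumed encoding 0..3).
    offset = sign_extend16(imm16)
    second = (reg_b, 4, offset, offset << 2)[alu_src_b]
    return first, second


def alu_add(first, second):
    """Addition is the only ALU operation needed for this sketch."""
    return (first + second) & MASK32


# Incrementing the PC.
print(hex(alu_add(*alu_operands(0, 1, pc=0x1000, reg_a=0, reg_b=0, imm16=0))))            # 0x1004
# Effective address for lw/sw: register plus sign-extended offset (-4).
print(hex(alu_add(*alu_operands(1, 2, pc=0x1004, reg_a=0x2000, reg_b=0, imm16=0xFFFC))))  # 0x1ffc
# Branch target: the PC already holds PC + 4; add the shifted offset.
print(hex(alu_add(*alu_operands(0, 3, pc=0x1004, reg_a=0, reg_b=0, imm16=3))))            # 0x1010
# Arithmetic on two register operands.
print(alu_add(*alu_operands(1, 0, pc=0x1004, reg_a=7, reg_b=5, imm16=0)))                 # 12
```

Each call uses the one ALU with a different mux setting, which is why the two dedicated adders are no longer needed.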













## Register write control signals

- We have to add a few more control signals to the datapath.
- Since instructions now take a variable number of cycles to execute, we cannot update the PC on each cycle.
  - Instead, a PCWrite signal controls the loading of the PC.
  - The instruction register also has a write signal, IRWrite. We need to keep the instruction word for the duration of its execution, and must explicitly re-load the instruction register when needed.
- The other intermediate registers, MDR, A, B and ALUOut, hold data for at most one clock cycle, so they do not need write control signals.
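The difference between write-controlled and free-running registers can be sketched as a tiny clocked model. The class and parameter names are hypothetical, the instruction word is an arbitrary placeholder value, and only ALUOut is modeled among the registers without write control.

```python
class DatapathRegisters:
    """Minimal model of PC, IR and ALUOut at a clock edge (illustrative only)."""

    def __init__(self):
        self.pc = 0
        self.ir = 0
        self.alu_out = 0

    def clock_edge(self, next_pc, mem_out, alu_result, pc_write, ir_write):
        # PC and IR load only when their write signals are asserted.
        if pc_write:
            self.pc = next_pc
        if ir_write:
            self.ir = mem_out
        # ALUOut has no write control: it is overwritten on every cycle.
        self.alu_out = alu_result


regs = DatapathRegisters()

# First cycle of an instruction: fetch it and increment the PC.
regs.clock_edge(next_pc=4, mem_out=0x12345678, alu_result=4,
                pc_write=True, ir_write=True)

# A later cycle of the same instruction: memory and ALU outputs change,
# but PCWrite and IRWrite are deasserted, so PC and IR keep their values.
regs.clock_edge(next_pc=8, mem_out=0xDEADBEEF, alu_result=0x2004,
                pc_write=False, ir_write=False)

print(regs.pc, hex(regs.ir), hex(regs.alu_out))  # 4 0x12345678 0x2004
```

The instruction word survives the second cycle because IRWrite is low, while ALUOut simply takes whatever the ALU produced most recently.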


