#### Static vs. dynamic scheduling

- Assumptions (for now):
  - 1 instruction issue / cycle
  - Several pipelines with a common IF and ID
    - Ideal CPI is still 1; the real CPI won't be 1 but will be closer to 1 than before
  - Same techniques will be used when we look at multiple issue
- Static scheduling (optimized by compiler)
  - When there is a stall (hazard), no further instructions are issued
  - Of course, the stall has to be enforced by the hardware
- Dynamic scheduling (enforced by hardware)
  - Instructions following the one that stalls can issue if they do not produce structural hazards or dependencies

Dyn. Sched. CSE 471 Autumn 02


### Dynamic scheduling

- Implies possibility of:
  - Out of order issue (we say that an instruction is issued once it has passed the ID stage) and hence out of order execution
  - Out of order completion (also possible in static scheduling but less frequent)
  - Imprecise exceptions (will take care of it later)
- Example (different pipes for add/sub and divide)
  - R1 = R2 / R3 (long latency)
  - R2 = R1 + R5 (stall, no issue, because of RAW on R1)
  - R6 = R7 - R8 (can be issued, executed and completed before the other 2)
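The dependence reasoning in this example can be checked mechanically. The following is a minimal sketch, not part of the original slides: the `Instr` type and the `hazards` helper are hypothetical names introduced for illustration, and each instruction is reduced to one destination register and a tuple of source registers.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read


def hazards(earlier: Instr, later: Instr) -> List[str]:
    """Data hazards that `later` has with respect to `earlier` (program order)."""
    found = []
    if earlier.dest in later.srcs:
        found.append("RAW")   # later reads what earlier writes
    if later.dest in earlier.srcs:
        found.append("WAR")   # later overwrites a source of earlier
    if later.dest == earlier.dest:
        found.append("WAW")   # both write the same register
    return found


div = Instr("R1", ("R2", "R3"))   # R1 = R2 / R3
add = Instr("R2", ("R1", "R5"))   # R2 = R1 + R5
sub = Instr("R6", ("R7", "R8"))   # R6 = R7 - R8

print(hazards(div, add))   # RAW on R1 (and a WAR on R2)
print(hazards(div, sub))   # no hazard: the subtract may go ahead
print(hazards(add, sub))   # no hazard
```

The subtract has no dependence on either earlier instruction, which is why dynamic scheduling lets it run ahead of the stalled add.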


#### **Issue and Dispatch**

- Split the ID stage into:
  - *Issue*: decode instructions; check for structural hazards and, depending on the implementation, possibly other hazards such as WAW. Stall if there are any. Instructions pass through this stage in order
  - *Dispatch*: wait until there are no data hazards, then read operands. At the next cycle a functional unit, i.e., EX of a pipe, can start executing
- Example revisited.
  - R1 = R2 / R3 (long latency; in execution)
  - R2 = R1 + R5 (issue but no dispatch because of RAW on R1)
  - R6 = R7 - R8 (can be issued, dispatched, executed and completed before the other 2)


## Implementations of dynamic scheduling

- In order to compute correct results, need to keep track of :
  - execution unit (free or busy)
  - register usage for read and write
  - completion etc.
- Two major techniques
  - Scoreboard (invented by Seymour Cray for the CDC 6600 in 1964)
  - Tomasulo's algorithm (used in the IBM 360/91 in 1967)


# Scoreboarding -- The example machine

(cf. Figure A-70 in your book)



#### Scoreboard basic idea

- The scoreboard keeps a record of all data dependencies
  - Keeps track of which registers are used as sources and destinations and which functional units use them
- The scoreboard keeps a record of all pipe occupancies
  - The original CDC 6600 was not pipelined but conceptually the scoreboard does not depend on pipelining
- The scoreboard decides if an instruction can be issued
  - Either the first time it sees it (no hazard) or, if not, at every cycle thereafter
- The scoreboard decides if an instruction can store its result
  - This is to prevent WAR hazards


#### An instruction goes through 5 steps

- We assume that the instruction has been successfully fetched (no I-cache miss)
- 1. Issue
  - The execution unit for that instruction type must be *free* (no structural hazard)
  - There should be **no** WAW hazard
  - If either of these conditions is false the instruction stalls. No further issue is allowed
    - There can be more fetches if there is an instruction fetch buffer (like there was in the CDC 6600)
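The two issue conditions can be written as a single predicate. This is a sketch under stated assumptions: the function `can_issue`, the two dictionaries, and the unit names are hypothetical and only illustrate the test, they are not taken from the CDC 6600 design.

```python
from typing import Dict, Optional


def can_issue(unit_busy: Dict[str, bool],
              result_unit: Dict[str, Optional[str]],
              unit: str, dest: str) -> bool:
    """Issue test: the unit must be free and no busy unit may be about to write `dest`."""
    structural_hazard = unit_busy[unit]
    waw_hazard = result_unit.get(dest) is not None
    return not (structural_hazard or waw_hazard)


# Hypothetical snapshot: unit names and contents are illustrative only.
unit_busy = {"Int": False, "Mul1": True, "Mul2": False, "Add": True, "Div": False}
result_unit = {"F0": "Mul1", "F8": "Add"}   # register -> unit that will write it

print(can_issue(unit_busy, result_unit, "Add", "F6"))    # False: the Add unit is busy
print(can_issue(unit_busy, result_unit, "Mul2", "F0"))   # False: WAW on F0
print(can_issue(unit_busy, result_unit, "Mul2", "F12"))  # True: unit free, no WAW
```

Because issue is in order, a `False` here blocks every later instruction as well, not just the one being tested.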


# Execution steps under scoreboard control (continued)

- 4. Write result
  - Before writing, check for WAR hazards. If one exists, the unit is stalled until all WAR hazards are cleared (note that an instruction in progress, i.e., whose operands have been read, won't cause a WAR)
- 5. Delay (you can forget about this one)
  - Because forwarding is not implemented, there should be one unit of delay between writing and reading the same register (this restriction seems artificial to me and is "historical").
  - Similarly, it takes one unit of time between the release of a unit and its possible next occupancy
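The WAR check of step 4 can be sketched as follows. The `operands_read` flag is an assumption added for this illustration (the slides encode the same information through the Rj/Rk flags), and `can_write` is a hypothetical helper. A source that is itself waiting for the writer (Qj or Qk names the writer) is a RAW dependent, not a WAR, so it must not block the write.

```python
from typing import Dict


def can_write(units: Dict[str, dict], writer: str) -> bool:
    """True if no other busy unit still has to read the old value of the writer's Fi."""
    dest = units[writer]["Fi"]
    for name, u in units.items():
        if name == writer or not u["busy"] or u["operands_read"]:
            continue   # instructions that already read their operands cannot cause a WAR
        for src, producer in ((u["Fj"], u["Qj"]), (u["Fk"], u["Qk"])):
            if src == dest and producer != writer:
                return False   # an earlier instruction still needs the old value
    return True


# Add F6, F8, F2 has executed; Div F10, F0, F6 has not yet read F6.
units = {
    "Add": dict(busy=True, Fi="F6", Fj="F8", Fk="F2", Qj=None, Qk=None, operands_read=True),
    "Div": dict(busy=True, Fi="F10", Fj="F0", Fk="F6", Qj="Mul", Qk=None, operands_read=False),
}
print(can_write(units, "Add"))        # False: Div still has to read F6
units["Div"]["operands_read"] = True
print(can_write(units, "Add"))        # True: the WAR hazard is cleared
```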


# What is needed in the scoreboard (slightly redundant info)

- Status of each functional unit
  - Free or busy
  - Operation to be performed
  - The names of the result *Fi* and source *Fj*, *Fk* registers
  - Flags Rj, Rk indicating whether the source registers are ready
  - Names Qj,Qk of the units (if any) producing values for Fj, Fk
- Status of result registers
  - For each *Fi* the name of the unit (if any), say *Pi* that will produce its contents (redundant but easy to check)
- The instruction status
  - Been issued, dispatched, in execution, ready to write, finished?
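One possible encoding of this bookkeeping is sketched below. The `FunctionalUnit` class, the `result_status` helper and the unit names `Mul1`/`Mul2` are assumptions made for illustration. The point is that the register result status can be derived from the unit status, which is why the slides call it redundant.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FunctionalUnit:
    name: str
    busy: bool = False
    op: Optional[str] = None   # operation to be performed
    Fi: Optional[str] = None   # result register
    Fj: Optional[str] = None   # first source register
    Fk: Optional[str] = None   # second source register
    Qj: Optional[str] = None   # unit producing Fj, if any
    Qk: Optional[str] = None   # unit producing Fk, if any
    Rj: bool = False           # is Fj ready?
    Rk: bool = False           # is Fk ready?


def result_status(units: List[FunctionalUnit]) -> Dict[str, str]:
    """Register result status: for each pending Fi, the unit Pi that will produce it."""
    return {u.Fi: u.name for u in units if u.busy and u.Fi is not None}


# Snapshot in the spirit of the worked example (load address register omitted).
units = [
    FunctionalUnit("Int", True, "Load", Fi="F2"),
    FunctionalUnit("Mul1", True, "Mul", "F0", "F2", "F4", Qj="Int", Rj=False, Rk=True),
    FunctionalUnit("Mul2"),
    FunctionalUnit("Add", True, "Sub", "F8", "F6", "F2", Qk="Int", Rj=True, Rk=False),
    FunctionalUnit("Div", True, "Div", "F10", "F0", "F6", Qj="Mul1", Rj=False, Rk=True),
]
print(result_status(units))
```

Keeping the register result status as a separate table trades a little storage for a one-lookup WAW check at issue time.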


#### Execution steps under scoreboard control

- 2. Dispatch (Read operands)
  - When the instruction is issued, the execution unit is reserved (becomes *busy*)
  - Operands are read in the execution unit when they are both ready (i.e., are not results of still executing instructions). This prevents RAW hazards (this conservative approach was taken because the CDC 6600 was not pipelined)
- 3. Execution
  - One or more cycles depending on functional unit latency
  - When execution completes, the unit notifies the scoreboard it's ready to write the result
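Steps 2 and 3 reduce to a readiness test followed by a latency count. The sketch below assumes the latencies used in the worked example of these notes (load 1, multiply 10, add/sub 2, divide 40 cycles) and assumes that execution starts the cycle after dispatch. Both function names are hypothetical.

```python
LATENCY = {"Load": 1, "Mul": 10, "Add": 2, "Sub": 2, "Div": 40}  # cycles, from the example


def can_dispatch(rj: bool, rk: bool) -> bool:
    """Operands are read only when both sources are ready (prevents RAW hazards)."""
    return rj and rk


def write_request_cycle(op: str, dispatch_cycle: int) -> int:
    """Cycle at whose end the unit tells the scoreboard it is ready to write.

    Assumes execution occupies cycles dispatch_cycle + 1 .. dispatch_cycle + latency.
    """
    return dispatch_cycle + LATENCY[op]


print(can_dispatch(True, False))        # False: one source is still being produced
print(can_dispatch(True, True))         # True
print(write_request_cycle("Mul", 3))    # 13
print(write_request_cycle("Sub", 3))    # 5
```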


### **Optimizations and Simplifications**

- There are opportunities for optimization such as:
  - Forwarding
  - Buffering for one copy of source operands in execution units (this allows reading of operands one at a time and minimizing the WAR hazards)
- We have assumed that there could be concurrent updates to (different) registers.
  - Concurrent writes can be handled (dynamically) by grouping execution units together and preventing concurrent writes within the same group, or by having multiple write ports in the register file (expensive but common nowadays)


#### Condition checking and scoreboard setting

| Step | Condition checking | Scoreboard setting |
|------|--------------------|--------------------|
| Issue | Unit free, say Ua, and no WAW | Ua busy and record Fi, Fj, Fk; record Qj, Qk and Rj, Rk; record Pi = Ua |
| Dispatch (Read operand) | Rj and Rk must be yes (results ready) | |
| Execution | At end ask for writing permission (no WAR) | |
| Write result | Check if Pi is an Fj, Fk (Rj, Rk = no) in preceding instrs. If yes, stall | For subsequent instrs, if Qj (Qk) = Ua, set Rj (Rk) to yes; Ua free and Pi = 0 |
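The scoreboard setting done at the write-result step can be sketched as a small update function. The dictionary layout, the unit names, and the `fu` and `write_result` helpers are assumptions for illustration. Clearing Qj/Qk when the flag is set to yes follows the way the example tables are drawn, not an explicit statement in the slides.

```python
from typing import Dict, Optional


def fu(busy=False, Fi=None, Fj=None, Fk=None, Qj=None, Qk=None, Rj=False, Rk=False) -> dict:
    """Build one functional-unit entry (hypothetical layout)."""
    return dict(busy=busy, Fi=Fi, Fj=Fj, Fk=Fk, Qj=Qj, Qk=Qk, Rj=Rj, Rk=Rk)


def write_result(units: Dict[str, dict],
                 result_unit: Dict[str, Optional[str]], ua: str) -> None:
    """Unit `ua` writes Fi: wake up waiting sources, clear Pi, free the unit."""
    dest = units[ua]["Fi"]
    for u in units.values():
        if u["Qj"] == ua:
            u["Qj"], u["Rj"] = None, True   # Fj is now available
        if u["Qk"] == ua:
            u["Qk"], u["Rk"] = None, True   # Fk is now available
    if result_unit.get(dest) == ua:
        result_unit[dest] = None            # "Pi = 0": no unit will write dest
    units[ua] = fu()                        # Ua is free again


units = {
    "Int": fu(True, "F2"),
    "Mul": fu(True, "F0", "F2", "F4", Qj="Int", Rj=False, Rk=True),
    "Add": fu(True, "F8", "F6", "F2", Qk="Int", Rj=True, Rk=False),
    "Div": fu(True, "F10", "F0", "F6", Qj="Mul", Rj=False, Rk=True),
}
result_unit = {"F0": "Mul", "F2": "Int", "F8": "Add", "F10": "Div"}

write_result(units, result_unit, "Int")     # the second load writes F2
print(units["Mul"]["Rj"], units["Add"]["Rk"], units["Div"]["Rj"])
print(result_unit)
```

After the load writes F2, the multiply and the subtract have both operands ready, while the divide still waits for F0 from the multiplier.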


#### Example

| Instruction | Note |
|-------------|------|
| Load F6, 34(r2) | Load f-p register F6 |
| Load F2, 45(r3) | Load latency 1 cycle |
| MulF F0, F2, F4 | Mult latency 10 cycles |
| Sub F8, F6, F2 | Add/sub latency 2 cycles |
| DivF F10, F0, F6 | Divide latency 40 cycles |
| Add F6, F8, F2 | |

Arrows on the slide mark the RAW and WAR dependences in this sequence.

**Assume that the 2 Loads have been issued, the first one completed, the second ready to write. The next 3 instructions have been issued (but not dispatched).**

| Instruction | Issue | Dispatch | Executed | Result written |
|-------------|-------|----------|----------|----------------|
| Load F6, 34(r2) | yes | yes | yes | yes |
| Load F2, 45(r3) | yes | yes | yes | |
| Mul F0, F2, F4 | yes | | | |
| Sub F8, F6, F2 | yes | | | |
| Div F10, F0, F6 | yes | | | |
| Add F6, F8, F2 | | | | |

Functional unit status:

| No | Name | Busy | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|----|------|------|----|----|----|----|----|----|----|
| 1 | Int | yes | F2 | r3 | | | | | |
| 2 | Mul | yes | F0 | F2 | F4 | 1 | | No | Y |
| 3 | Mul | no | | | | | | | |
| 4 | Add | yes | F8 | F6 | F2 | | 1 | Y | No |
| 5 | Div | yes | F10 | F0 | F6 | 2 | | No | Y |

Register result status:

| F0 | F2 | F4 | F6 | F8 | F10 | F12 |
|----|----|----|----|----|-----|-----|
| 2 | 1 | | | 4 | 5 | |

**1 cycle after the 2nd load has written its result**

| Instruction | Issue | Dispatch | Executed | Result written |
|-------------|-------|----------|----------|----------------|
| Load F6, 34(r2) | yes | yes | yes | yes |
| Load F2, 45(r3) | yes | yes | yes | yes |
| Mul F0, F2, F4 | yes | yes | | |
| Sub F8, F6, F2 | yes | yes | | |
| Div F10, F0, F6 | yes | | | |
| Add F6, F8, F2 | | | | |

Functional unit status:

| No | Name | Busy | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|----|------|------|----|----|----|----|----|----|----|
| 1 | Int | no | | | | | | | |
| 2 | Mul | yes | F0 | F2 | F4 | | | Y | Y |
| 3 | Mul | no | | | | | | | |
| 4 | Add | yes | F8 | F6 | F2 | | | Y | Y |
| 5 | Div | yes | F10 | F0 | F6 | 2 | | No | Y |

Register result status:

| F0 | F2 | F4 | F6 | F8 | F10 | F12 |
|----|----|----|----|----|-----|-----|
| 2 | | | | 4 | 5 | |

**6 cycles later; Mul in execution; Sub has completed; Div issued; Add waits for writing**

| Instruction | Issue | Dispatch | Executed | Result written |
|-------------|-------|----------|----------|----------------|
| Load F6, 34(r2) | yes | yes | yes | yes |
| Load F2, 45(r3) | yes | yes | yes | yes |
| Mul F0, F2, F4 | yes | yes | in progress | |
| Sub F8, F6, F2 | yes | yes | yes | yes |
| Div F10, F0, F6 | yes | | | |
| Add F6, F8, F2 | yes | yes | yes | |

Functional unit status:

| No | Name | Busy | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|----|------|------|----|----|----|----|----|----|----|
| 1 | Int | no | | | | | | | |
| 2 | Mul | yes | F0 | F2 | F4 | | | Y | Y |
| 3 | Mul | no | | | | | | | |
| 4 | Add | yes | F6 | F8 | F2 | | | Y | Y |
| 5 | Div | yes | F10 | F0 | F6 | 2 | | No | Y |

Register result status:

| F0 | F2 | F4 | F6 | F8 | F10 | F12 |
|----|----|----|----|----|-----|-----|
| 2 | | | 4 | | 5 | |

**4 cycles later (I think!); Div dispatches; Add will write at next cycle**

| Instruction | Issue | Dispatch | Executed | Result written |
|-------------|-------|----------|----------|----------------|
| Load F6, 34(r2) | yes | yes | yes | yes |
| Load F2, 45(r3) | yes | yes | yes | yes |
| Mul F0, F2, F4 | yes | yes | yes | yes |
| Sub F8, F6, F2 | yes | yes | yes | yes |
| Div F10, F0, F6 | yes | yes | | |
| Add F6, F8, F2 | yes | yes | yes | |

Functional unit status:

| No | Name | Busy | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|----|------|------|----|----|----|----|----|----|----|
| 1 | Int | no | | | | | | | |
| 2 | Mul | no | | | | | | | |
| 3 | Mul | no | | | | | | | |
| 4 | Add | yes | F6 | F8 | F2 | | | Y | Y |
| 5 | Div | yes | F10 | F0 | F6 | | | Y | Y |

Register result status:

| F0 | F2 | F4 | F6 | F8 | F10 | F12 |
|----|----|----|----|----|-----|-----|
| | | | 4 | | 5 | |

**1 cycle later. Only Div is not finished**

| Instruction | Issue | Dispatch | Executed | Result written |
|-------------|-------|----------|----------|----------------|
| Load F6, 34(r2) | yes | yes | yes | yes |
| Load F2, 45(r3) | yes | yes | yes | yes |
| Mul F0, F2, F4 | yes | yes | yes | yes |
| Sub F8, F6, F2 | yes | yes | yes | yes |
| Div F10, F0, F6 | yes | yes | | |
| Add F6, F8, F2 | yes | yes | yes | yes |

Functional unit status:

| No | Name | Busy | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|----|------|------|----|----|----|----|----|----|----|
| 1 | Int | no | | | | | | | |
| 2 | Mul | no | | | | | | | |
| 3 | Mul | no | | | | | | | |
| 4 | Add | no | | | | | | | |
| 5 | Div | yes | F10 | F0 | F6 | | | Y | Y |

Register result status:

| F0 | F2 | F4 | F6 | F8 | F10 | F12 |
|----|----|----|----|----|-----|-----|
| | | | | | 5 | |