#### CDC 6600









#### Why bother with executing out of order?

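- OOO: executing instructions in a different order than the compiler (or programmer) specified
- If a lot of work is contending for a limited resource, it would be nice to make progress on other work that does not use that resource
  - e.g. when bottlenecked on memory
- Instruction-level parallelism (ILP) makes you go faster!
- Memory today has variable latency (cache hit vs. miss), so the compiler cannot schedule around it statically
- Code performance migratability (performance portability): the hardware reschedules the code, so a binary scheduled for one machine still performs well on another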
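
The memory-bottleneck point can be made concrete with a toy timing model. The sketch below is only an illustration: the four-instruction program, the register names, and the latencies are assumptions, and the model ignores functional-unit conflicts and register reuse. It compares "start in program order" against "start as soon as the inputs are ready".

```python
from typing import NamedTuple

class Instr(NamedTuple):
    name: str
    dest: str      # register written
    srcs: tuple    # registers read
    latency: int   # cycles until the result is available (made-up values)

# Hypothetical program: a slow load, one ADD that needs the load,
# and two instructions that do not depend on the load at all.
PROGRAM = [
    Instr("LOAD", "r1", (), 10),
    Instr("ADD", "r2", ("r1", "r1"), 1),
    Instr("MUL", "r3", ("r4", "r5"), 3),
    Instr("ADD", "r6", ("r3", "r4"), 1),
]

def finish_time(program, in_order):
    """Cycle at which the last result becomes available."""
    ready = {}        # register -> cycle its value becomes available
    prev_start = -1   # start cycle of the previous instruction
    last_finish = 0
    for ins in program:
        # An instruction cannot start before all of its inputs are ready.
        start = max((ready.get(r, 0) for r in ins.srcs), default=0)
        if in_order:
            # In-order: also wait until the previous instruction has started.
            start = max(start, prev_start + 1)
        prev_start = start
        ready[ins.dest] = start + ins.latency
        last_finish = max(last_finish, ready[ins.dest])
    return last_finish

print(finish_time(PROGRAM, in_order=True))   # 15
print(finish_time(PROGRAM, in_order=False))  # 11
```

With these assumed latencies, the in-order model finishes at cycle 15 and the out-of-order model at cycle 11, because the MUL and the second ADD no longer wait behind the load.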

#### Mem->Issue

- WaitFor:
  - Instruction bits
  - Space to issue the instruction to
- Do:
  - Issue it!

#### Issue->Dispatch

- WaitFor:
  - The functional unit must be free
  - The output register must be free
- Do:
  - Move the instruction to the functional unit

#### Dispatch->Execute

- WaitFor:
  - the instruction has to be at the functional unit
  - input registers have to be valid
- Do:
  - read the registers and execute!

#### Execute->Complete

- WaitFor:
  - execution to complete
- Do:
  - post the write to the register file

#### Complete

- WaitFor:
- Do:
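
The WaitFor/Do conditions of the stages can be sketched as a small cycle-stepped simulator. This is a minimal sketch, not the CDC 6600 scoreboard itself: the `Instr` fields, the unit names, the latencies, and the third (MUL) instruction are assumptions made for illustration. The queue stands in for instructions that have already passed Mem->Issue, and only the conditions listed in these notes are modeled.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)
class Instr:
    text: str
    unit: str       # functional unit this instruction needs
    dest: str       # output register
    srcs: tuple     # input registers
    latency: int    # execution cycles (made-up numbers)
    producers: list = field(default_factory=list)  # earlier writers of our inputs
    remaining: Optional[int] = None                # None until execution starts
    done: bool = False

def make_program():
    return [
        Instr("DIV r1, r2 -> r3", "div", "r3", ("r1", "r2"), 4),
        Instr("ADD r4, r5 -> r1", "add", "r1", ("r4", "r5"), 1),
        Instr("MUL r3, r1 -> r6", "mul", "r6", ("r3", "r1"), 2),
    ]

def simulate(program, units):
    """Return a list of (cycle, event, instruction text) tuples."""
    queue = list(program)               # issued, waiting to dispatch in order
    at_unit = {u: None for u in units}  # functional unit -> instruction or None
    writer = {}                         # register -> instruction with a pending write
    log, cycle = [], 0
    while queue or any(i is not None for i in at_unit.values()):
        # Advance everything that is already executing.
        for ins in at_unit.values():
            if ins is not None and ins.remaining:
                ins.remaining -= 1
        # Execute->Complete: wait for execution to finish, then post the write.
        for unit, ins in at_unit.items():
            if ins is not None and ins.remaining == 0:
                ins.done = True
                del writer[ins.dest]
                at_unit[unit] = None
                log.append((cycle, "complete", ins.text))
        # Dispatch->Execute: wait until the input registers are valid.
        for ins in at_unit.values():
            if (ins is not None and ins.remaining is None
                    and all(p.done for p in ins.producers)):
                ins.remaining = ins.latency
                log.append((cycle, "execute", ins.text))
        # Issue->Dispatch: wait for a free unit and a free output register.
        if queue:
            head = queue[0]
            if at_unit[head.unit] is None and head.dest not in writer:
                queue.pop(0)
                head.producers = [writer[r] for r in head.srcs if r in writer]
                writer[head.dest] = head
                at_unit[head.unit] = head
                log.append((cycle, "dispatch", head.text))
        cycle += 1
    return log

for event in simulate(make_program(), ["div", "add", "mul"]):
    print(event)
```

With these assumed latencies the ADD completes at cycle 3, the DIV at cycle 5, and the MUL (which waits for both input registers) at cycle 7, so instructions complete in a different order than they were dispatched.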

#### Troubles

- Consider:
  - `DIV r1, r2 -> r3`
  - `ADD r4, r5 -> r1`
  - `STORE r1, @(r6)`
- The short ADD (and the STORE that depends on it) can finish and update architectural state before the long DIV ahead of them has completed.
- Four solutions:
  - Don't execute in parallel.
  - Add a completion unit.
  - Dispatch very carefully: a later instruction is dispatched only if its latency is long enough that it cannot modify architectural state before the earlier instructions do.
  - Accept it.
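
One way to picture the completion-unit option is as a stage that holds each result until every earlier instruction has updated architectural state. The sketch below assumes that reading of the problem; the finish cycles are made-up numbers, and `update_cycles` and `updated_early` are hypothetical helper names, not part of any real design.

```python
# Finish cycles are invented for illustration, not CDC 6600 timings.
PROGRAM = [
    ("DIV r1, r2 -> r3", 5),
    ("ADD r4, r5 -> r1", 2),
    ("STORE r1, @(r6)", 3),
]

def update_cycles(program, completion_unit):
    """Cycle at which each instruction changes architectural state."""
    cycles, last = [], 0
    for _text, finish in program:
        if completion_unit:
            # Hold the result until all earlier instructions have updated state.
            last = max(last, finish)
            cycles.append(last)
        else:
            # No completion unit: state changes as soon as execution finishes.
            cycles.append(finish)
    return cycles

def updated_early(program, cycles):
    """Instructions whose update lands before that of an earlier instruction."""
    early = []
    for i, (text, _finish) in enumerate(program):
        if any(cycles[i] < cycles[j] for j in range(i)):
            early.append(text)
    return early

without = update_cycles(PROGRAM, completion_unit=False)
with_unit = update_cycles(PROGRAM, completion_unit=True)
print(without, updated_early(PROGRAM, without))
print(with_unit, updated_early(PROGRAM, with_unit))
```

Without the completion unit, the ADD and the STORE both change state before the DIV does; with it, all three updates land at cycle 5 in this toy model, at the cost of holding finished results for longer.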

#### What are the "I/O processors"?

- Accelerators for I/O
  - Provided flexibility to I/O
    - DMA read/write
  - Asynchronous to the main CPU
  - Shared main memory with the CPU
- A little microcontroller
  - FGMT: fine-grained multithreading



#### Why FGMT?

- "Application level" threading
- The logic is precious; the register state, not so much. Replicating only the register state per thread keeps the overhead small
- A way to deal with long latency operations
- Circuit world variation: C-slow retiming
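
The latency-hiding argument for FGMT can be sketched with a round-robin model: one shared issue slot, and per-thread state that only records when the thread may issue again. This is a toy model under stated assumptions. Every instruction has the same latency, a thread cannot issue again until its previous instruction has finished, and `run_fgmt` is a hypothetical name.

```python
def run_fgmt(num_threads, instrs_per_thread, latency):
    """Return (cycle the last instruction finishes, instructions issued)."""
    remaining = [instrs_per_thread] * num_threads  # per-thread work left
    ready_at = [0] * num_threads                   # per-thread "can issue again" cycle
    cycle = issued = 0
    while any(remaining):
        t = cycle % num_threads                    # strict round-robin slot owner
        if remaining[t] and ready_at[t] <= cycle:
            # The shared logic does one instruction for thread t this cycle.
            remaining[t] -= 1
            ready_at[t] = cycle + latency
            issued += 1
        # Otherwise the slot is wasted: thread t is still waiting or has finished.
        cycle += 1
    return max(ready_at), issued

print(run_fgmt(1, 4, 4))  # (16, 4)
print(run_fgmt(4, 4, 4))  # (19, 16)
```

In this model a single thread with latency-4 operations issues 4 instructions in 16 cycles, while four interleaved threads issue 16 instructions in 19 cycles on the same logic. By the time a thread's turn comes around again, its previous operation has finished.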