























WINTER, 2001



## Delayed Branches

- Change the semantics (meaning) of your branch instruction so that it won't take effect until N cycles later (where N is the branch delay).
  - This means that the N instructions after the branch will be executed *regardless* of the branch outcome. These are called delay slots.
  - This forces the programmer/compiler/assembler to deal with the problem, by requiring them to fill the N delay slots.
  - This costs the hardware nothing, since it is the compiler's job to ensure that correct instructions (or nops) are scheduled in the delay slots.
- MIPS branches are delayed (1 slot) and compilers can fill around 70% of the slots.
- Good compilers can usually fill 1 or 2 slots.
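
The delayed-branch semantics described above can be sketched with a toy interpreter. This is only an illustration: the tuple instruction format and the `run` helper are hypothetical, every branch is assumed to be taken, and branches are assumed never to sit in a delay slot.

```python
def run(program, delay_slots=1):
    """Execute a toy program with delayed branches; return what was executed.

    Each instruction is a tuple in a made-up format (an assumption):
      ("op", label)       an ordinary instruction
      ("branch", target)  an always-taken branch to index `target`
    """
    executed = []
    pc = 0
    pending = None  # (remaining delay slots, target) for a branch in flight
    while pc < len(program):
        kind, arg = program[pc]
        next_pc = pc + 1
        if pending is not None:
            # This instruction sits in a delay slot, so it always executes.
            remaining, target = pending
            remaining -= 1
            if remaining == 0:
                next_pc = target  # the branch finally takes effect
                pending = None
            else:
                pending = (remaining, target)
        if kind == "branch":
            executed.append("branch")
            if delay_slots == 0:
                next_pc = arg  # ordinary, non-delayed branch
            else:
                pending = (delay_slots, arg)
        else:
            executed.append(arg)
        pc = next_pc
    return executed


program = [
    ("op", "a"),
    ("branch", 4),
    ("op", "slot"),     # delay slot: runs even though the branch is taken
    ("op", "skipped"),
    ("op", "target"),
]
```

With one delay slot, `run(program)` returns `["a", "branch", "slot", "target"]`; with `delay_slots=0` the `"slot"` instruction is skipped as well. If the compiler cannot find a useful instruction for the slot, it would place a nop there instead.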




## How to Handle Exceptions

- We must save the program counter of the offending instruction in the EPC (Exception PC), and then transfer control to the operating system.
- The OS can then take appropriate action (provide an I/O service for the program, kill the program, etc.). If it chooses to restart the program, it can jump back to the EPC.
- How does the OS know what kind of exception? MIPS includes a *cause* register.
- In hardware, the cause is saved into the cause register, the PC is saved in EPC, and control transfers to a predefined address in the kernel (0x4000 0040).
- Exceptions are hard to deal with because we have several instructions in the pipeline.
- Suppose we get an arithmetic overflow (in the EX stage). We need to be sure to let the downstream instructions (those already in MEM and WB) finish, while flushing the upstream instructions (those in IF and ID).
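
The hardware steps listed above can be sketched as a small model. The dictionary layouts, the `take_exception` name, and the `"overflow"` cause value are illustrative assumptions; the handler address is the one quoted in these notes. The sketch also squashes the offending instruction itself so that it cannot write a result, which goes slightly beyond what the notes state.

```python
HANDLER_ADDRESS = 0x40000040  # kernel entry point quoted in the notes

STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # youngest to oldest instruction


def take_exception(pipeline, regs, stage, cause):
    """Model an exception raised by the instruction currently in `stage`.

    `pipeline` maps a stage name to the PC of the instruction in that stage
    (None means a bubble). `regs` holds "pc", "epc", and "cause".
    Both layouts are assumptions made for this sketch.
    """
    faulting = STAGES.index(stage)
    regs["cause"] = cause          # tell the OS what happened
    regs["epc"] = pipeline[stage]  # PC of the offending instruction
    # Flush the offending instruction and everything upstream (younger).
    for name in STAGES[: faulting + 1]:
        pipeline[name] = None
    # Downstream (older) instructions in later stages are left to finish.
    regs["pc"] = HANDLER_ADDRESS   # next fetch comes from the kernel
    return pipeline, regs


pipeline = {"IF": 0x110, "ID": 0x10C, "EX": 0x108, "MEM": 0x104, "WB": 0x100}
regs = {"pc": 0x114, "epc": 0, "cause": None}
take_exception(pipeline, regs, "EX", "overflow")
```

After the call, `regs["epc"]` holds `0x108`, the instructions in MEM and WB are untouched, and IF, ID, and EX contain bubbles.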

CSE378


## The Truth

- The MIPS R2000/3000 pipelined implementation is pretty close to the one we've discussed in class, but modern machines use much more complex implementations:
- Multiple pipelines: superscalar.
  - Trend: exploit instruction level parallelism (ILP) by working on multiple instructions simultaneously. This reduces CPI.
  - Many modern machines issue up to 4 instructions at once.
  - Challenge: statically or dynamically scheduling instructions to extract maximal ILP while keeping cycle time low.
- Deep pipelines: superpipelined.
  - Trend: reduce cycle time.
  - Modern pipelines often have 8 or more stages.
  - Challenge: longer branch and load delays (often leading to higher CPI), more forwarding required, scheduling is also important.
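
The two trends attack different terms of the usual performance equation, execution time = instruction count × CPI × cycle time. The sketch below plugs in made-up numbers purely for illustration; none of the figures describe a real machine.

```python
def execution_time_ps(instructions, cpi, cycle_time_ps):
    """Execution time in picoseconds: instruction count x CPI x cycle time."""
    return instructions * cpi * cycle_time_ps


N = 1_000_000  # hypothetical dynamic instruction count

# Hypothetical baseline: 5-stage pipeline, CPI of 1, 1000 ps cycle.
baseline = execution_time_ps(N, cpi=1.0, cycle_time_ps=1000)

# Hypothetical ideal 4-issue superscalar: same cycle time, CPI drops to 0.25.
superscalar = execution_time_ps(N, cpi=0.25, cycle_time_ps=1000)

# Hypothetical superpipelined design: cycle time halves, but longer branch
# and load delays push CPI up to 1.5.
superpipelined = execution_time_ps(N, cpi=1.5, cycle_time_ps=500)

speedup_superscalar = baseline / superscalar
speedup_superpipelined = baseline / superpipelined
```

Under these assumed numbers the superscalar design is 4 times faster than the baseline, while the superpipelined design gains only about 1.33 times because the higher CPI eats into the shorter cycle.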

## Summary

- Pipelining improves performance by increasing throughput (instructions/time), not by reducing latency (time/instruction).
- We examined the classic five-stage pipeline (IF, ID, EX, MEM, WB).
- Data and control hazards place limits on the speedups we can achieve through pipelining.
  - Data hazards can be handled by stalling or forwarding. Forwarding is more efficient, but a load followed immediately by a use of its result still needs a stall. Stalling can be done in software or hardware.
  - Branch hazards can only be avoided by hardware stalling or "defining away the problem" via delayed branches.
  - The performance of branches can be improved through delayed branches or branch prediction.
- Compilers must understand the pipeline to extract maximum performance through scheduling. In MIPS, the ISA is no longer a perfect abstraction.
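
As a rough sketch of why scheduling matters, the functions below count cycles for a short instruction sequence on the five-stage pipeline, assuming full forwarding so that the only remaining data-hazard stall is a load followed immediately by a use. The `(op, dest, sources)` encoding and both function names are hypothetical, and branches are ignored.

```python
def load_use_stalls(program):
    """Count one-cycle stalls that remain even with full forwarding.

    Each instruction is (op, dest, sources), a made-up encoding.
    """
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in curr[2]:
            stalls += 1  # the loaded value is not ready in time for EX
    return stalls


def pipelined_cycles(program, stages=5):
    """Total cycles to run `program` on an otherwise ideal pipeline."""
    if not program:
        return 0
    # The first instruction needs `stages` cycles; each later instruction
    # finishes one cycle after its predecessor, plus any stall bubbles.
    return stages + (len(program) - 1) + load_use_stalls(program)


unscheduled = [
    ("lw", "$t0", ("$sp",)),
    ("add", "$t1", ("$t0", "$t2")),  # uses $t0 right after the load
    ("sub", "$t3", ("$t5", "$t4")),
]

# Same instructions, with the independent sub moved between the load and its use.
scheduled = [unscheduled[0], unscheduled[2], unscheduled[1]]
```

Here `pipelined_cycles(unscheduled)` is 8 and `pipelined_cycles(scheduled)` is 7: reordering removes the stall without changing the result. Each instruction still takes five cycles from fetch to write-back, so the gain is in throughput, not latency.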
