Compiler optimization: branch delay slots
Similar (in concept) to pipeline scheduling of loads
Put a useful instruction in the slot right after the branch; it is
always executed, whether the branch is taken or not (compilers can
fill 1 delay slot about 80% of the time).
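A minimal sketch of how a compiler might fill one delay slot from before the branch, assuming a toy instruction model that tracks only register reads and writes (no memory dependences, exceptions, or cross-block moves). The `Instr` class, the helper names, and the MIPS-like assembly strings are hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    text: str                     # assembly-like text, for display only
    dest: Optional[str] = None    # register written, if any
    srcs: Tuple[str, ...] = ()    # registers read


NOP = Instr("nop")


def can_move_after(cand: Instr, later: List[Instr]) -> bool:
    """True if cand can be moved below every instruction in `later`
    without changing any register value (toy model: registers only)."""
    for ins in later:
        if cand.dest is not None and cand.dest in ins.srcs:
            return False          # later instruction reads cand's result
        if cand.dest is not None and cand.dest == ins.dest:
            return False          # both write the same register
        if ins.dest is not None and ins.dest in cand.srcs:
            return False          # later instruction overwrites cand's input
    return True


def fill_delay_slot(block: List[Instr]) -> List[Instr]:
    """`block` is a basic block whose last instruction is the branch.
    Returns the block with one delay slot appended after the branch."""
    *body, branch = block
    # Search backward for an instruction that is independent of everything
    # after it, including the branch condition.
    for i in range(len(body) - 1, -1, -1):
        if can_move_after(body[i], body[i + 1:] + [branch]):
            return body[:i] + body[i + 1:] + [branch, body[i]]
    return body + [branch, NOP]   # nothing safe to move: slot is wasted


block = [
    Instr("add r1, r2, r3", "r1", ("r2", "r3")),
    Instr("sub r4, r5, r6", "r4", ("r5", "r6")),
    Instr("beq r4, r0, L", None, ("r4", "r0")),
]
for ins in fill_delay_slot(block):
    print(ins.text)
```

Here the `sub` cannot move because the branch reads `r4`, so the independent `add` goes into the slot. When no instruction qualifies, the sketch falls back to a `nop`, which corresponds to the cases where the slot goes unfilled.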
Lots of variations on cancelling (squashing, nullifying) (see book)
Branch delay slots have become less important: with multiple-issue machines and/or deeper pipelines, the branch delay spans more than one instruction slot, and compilers cannot effectively fill more than 1 slot.