Penalties increase with deeper pipes and multiple-issue machines
The 1- or 2-cycle penalty is “optimistic” because
- Many modern microprocessors have deeper pipes (8 to 20 stages)
  - For example, separate decode and register read stages
  - Extra decoding stages to determine whether multiple instructions can be issued
- CISC machines have more complex branch instructions
These simple schemes could yield penalties from 2 up to 12 cycles, i.e., on a machine that can issue, say, 4 instructions per cycle, from 8 (2 * 4) to 48 (4 * 12) lost instruction issue slots
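The slot arithmetic above is simply penalty cycles times issue width: every stalled cycle wastes every slot the machine could have filled in that cycle. A minimal Python sketch, assuming a 4-wide issue machine (the width implied by the 2 * 4 and 4 * 12 figures); the function name `lost_issue_slots` is hypothetical and only for illustration:

```python
def lost_issue_slots(penalty_cycles: int, issue_width: int) -> int:
    """Issue slots wasted by a single branch penalty (hypothetical helper).

    Each stall cycle wastes all issue_width slots the machine could have
    issued in that cycle, so the loss is the product of the two.
    """
    if penalty_cycles < 0 or issue_width < 1:
        raise ValueError("penalty_cycles must be >= 0 and issue_width >= 1")
    return penalty_cycles * issue_width


ISSUE_WIDTH = 4  # assumed 4-wide issue, matching the figures in the text

for penalty in (2, 12):
    slots = lost_issue_slots(penalty, ISSUE_WIDTH)
    print(f"{penalty:2d}-cycle penalty -> {slots} issue slots lost")
```

With a 4-wide machine this gives 8 slots for a 2-cycle penalty and 48 for a 12-cycle penalty; doubling the issue width doubles the loss for the same penalty.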