# Problem Set #4

##### Due Wednesday, April 28, 1999

• Do the following problems, and hand in the answers by the start of class on Wednesday, April 28.
1. Problem 2.2 (note that some printings of the book have the typo "106" instead of "10^6")
2. Problem 2.3.
3. Problem 2.4.
4. Problem 2.10.
5. Problem 2.12.
6. After you graduate, you are asked to become head computer designer at Fast Computers. Since you know that procedure calls are expensive operations, you have invented a design that reduces the number of loads and stores required by procedure calls and returns. You run a series of experiments to compare the performance of your machine, MPlus, with that of the original machine, MBase, using the same compiler. You find that:
• The cycle time of MPlus is 8% higher than that of MBase (because your enhancement slows the critical timing path).
• On MBase, 29% of instructions executed are loads and stores.
• MPlus executes 1/3 fewer loads and stores than MBase.
• All instructions (including loads and stores) take a single cycle.

Is the technical department (which looks at execution time) happy - is MPlus faster or slower than MBase?  Justify quantitatively.

Is the marketing department (which advertises MIPS) happy?  Justify quantitatively.
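As a reminder of the two quantities being compared, here is a minimal sketch of the execution-time and MIPS formulas. The machines A and B and all of their numbers are hypothetical and are not the MBase/MPlus figures; the function names are illustrative only.

```python
def execution_time_ns(instruction_count, cpi, cycle_time_ns):
    """CPU time in ns: instructions x cycles per instruction x ns per cycle."""
    return instruction_count * cpi * cycle_time_ns


def mips_rating(cpi, cycle_time_ns):
    """Native MIPS: clock rate in MHz divided by CPI."""
    clock_rate_mhz = 1000.0 / cycle_time_ns
    return clock_rate_mhz / cpi


# Hypothetical machine A: 1,000,000 instructions, CPI 1, 10 ns cycle.
time_a = execution_time_ns(1_000_000, 1.0, 10.0)
mips_a = mips_rating(1.0, 10.0)

# Hypothetical machine B: 800,000 instructions, CPI 1, 11 ns cycle.
time_b = execution_time_ns(800_000, 1.0, 11.0)
mips_b = mips_rating(1.0, 11.0)

print(f"A: {time_a:.0f} ns, {mips_a:.1f} MIPS")
print(f"B: {time_b:.0f} ns, {mips_b:.1f} MIPS")
```

Note that execution time depends on the instruction count, while the MIPS rating does not.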

7. After your initial attempts at a new computer design, you decide to try something different, utilizing compiler optimizations.  Fast Computers is now manufacturing a more contemporary RISC machine, MFast.  MFast runs at a 500MHz clock rate (with a 2ns clock cycle time).  Measurements of various benchmarks on MFast are as follows:
| Instruction Type | Frequency | Clock Cycles |
|---|---|---|
| ALU Ops | 40% | 1 |
| Loads | 23% | 2 |
| Stores | 13% | 2 |
| Branches | 24% | 3 |

What is the CPI of MFast?  What is its MIPS rating?

Now your compiler optimizations allow you to eliminate half of the loads and half of the stores.  What is the new CPI?  How much better is the performance of MFast with optimized code?  Do the MIPS ratings of MFast before and after optimizations give an adequate measure of improved performance?
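The CPI of an instruction mix is the frequency-weighted average of the per-class cycle counts. The sketch below illustrates this with a hypothetical three-class mix (not the MFast measurements). It also shows one point that is easy to miss: after instructions are removed, the remaining frequencies no longer sum to 1 and have to be renormalized. The helper names are assumptions made for this example.

```python
def average_cpi(mix):
    """Weighted CPI for a list of (relative_count, cycles) pairs.

    The relative counts need not sum to 1; they are normalized here.
    """
    total = sum(count for count, _ in mix)
    return sum(count * cycles for count, cycles in mix) / total


def relative_time(mix):
    """Total cycles per original instruction (relative count x CPI)."""
    return sum(count * cycles for count, cycles in mix)


# Hypothetical original mix: 50% at 1 cycle, 30% at 2 cycles, 20% at 4 cycles.
before = [(0.50, 1), (0.30, 2), (0.20, 4)]
# Hypothetical optimization removes half of the 2-cycle instructions.
after = [(0.50, 1), (0.15, 2), (0.20, 4)]

cpi_before = average_cpi(before)
cpi_after = average_cpi(after)
speedup = relative_time(before) / relative_time(after)

print(f"CPI before: {cpi_before:.3f}, CPI after: {cpi_after:.3f}")
print(f"Speedup: {speedup:.4f}")
```

Since the clock cycle time is unchanged in this sketch, the speedup is simply the ratio of total cycles before and after.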

8. Read the explanation of Amdahl's Law on page 101 of the book.  Then answer the following question:

Implementations of floating point square root (FPSQR) vary widely.  A customer from Fast Computers has a very important benchmark which spends 15% of its execution time on FPSQR instructions.  Fast Computers' other design technician proposes a hardware extension to speed FPSQR up by a factor of 10.  You, with your experience in computer design, know that floating point operations take up 50% of the benchmark's execution time, so you propose extensions which will speed up all floating point operations so they run 1.8 times faster.

What is the speedup for the FPSQR enhancement?

What is the speedup for your enhancement?

Which is better?
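For reference, Amdahl's Law gives the overall speedup as 1 / ((1 - f) + f / s), where f is the fraction of the original execution time that benefits from the enhancement and s is the speedup of that fraction. A minimal sketch follows; the two sets of inputs are hypothetical and are not the numbers from this problem.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical case 1: 40% of the time is made 2 times faster.
broad = amdahl_speedup(0.40, 2.0)

# Hypothetical case 2: 10% of the time is made 100 times faster.
narrow = amdahl_speedup(0.10, 100.0)

print(f"broad enhancement:  {broad:.4f}")
print(f"narrow enhancement: {narrow:.4f}")
```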

9. Problem 5.6.
10. Problem 5.12.