CSEP 548 - Autumn 2012 - Homework 1

Due: Oct 9, 11:59pm.

You should be able to complete this homework after doing the readings assigned for the first lecture. This assignment is borrowed from Milo Martin's class at UPenn (source).

Instructions:

This is an individual work assignment. Sharing of answers or code is strictly prohibited. For the quantitative questions, show your work to document how you came up with your answers.

Submission:

Turn in an electronic copy in PDF format to the Catalyst dropbox, or a neatly written paper copy to Brandon's CSE mailbox by the end of the due date.

Problems:

  1. Performance and CPI. Assume a typical program has the following instruction type breakdown:

    Assume the current-generation processor has the following instruction latencies:

    If for the next-generation design you could pick one type of instruction to make twice as fast (half the latency), which instruction type would you pick? Why?

  2. Averaging. Assume that a program executes one branch every 4 instructions for the first 1M instructions and one branch every 8 instructions for the next 2M instructions. What is the average number of instructions per branch for the entire program?

  3. Amdahl's Law. A program has 10% divide instructions. All non-divide instructions take one cycle. All divide instructions take 20 cycles.

    1. What is the CPI of this program on this processor?
    2. What percent of time is spent just doing divides?
    3. What would the speedup be if we sped up divide by 2x?
    4. What would the speedup be if we sped up divide by 5x?
    5. What would the speedup be if we sped up divide by 20x?
    6. What would the speedup be if divide instructions were infinitely fast (zero cycles)?
  4. Performance and ISA. Chip A executes the ARM ISA and has a 2.5 GHz clock frequency. Chip B executes the x86 ISA and has a 3 GHz clock frequency. Assume that, on average, programs execute 1.5 times as many ARM instructions as x86 instructions.

    1. For Program P1, Chip A has a CPI of 2 and Chip B has a CPI of 3. Which chip is faster for P1? What is the speedup for Program P1?
    2. For Program P2, Chip A has a CPI of 1 and Chip B has a CPI of 2. Which chip is faster for P2? What is the speedup for Program P2?
    3. Assuming that Programs P1 and P2 are equally important workloads for the target market of this chip, which chip is "faster"? Calculate the average speedup.
  5. ISA Modification and Performance Metrics. You are the architect of a new line of processors, and you have the choice to add new instructions to the ISA if you wish. You consider adding a fused multiply/add instruction to the ISA (which is helpful in many signal processing and image processing applications). The advantage of this new instruction is that it can reduce the overall number of instructions in the program by converting some pairs of instructions into a single instruction. The disadvantage is that its implementation requires extending the clock period by 5% and increases the CPI by 10%.

    1. Calculate under what conditions adding this instruction would lead to an overall performance increase.
    2. Discuss qualitatively the impact this proposed change could have on energy consumption, complexity of processor design, and code generation by compilers.
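
The quantities that recur in the problems above (average CPI, execution time, and Amdahl's Law speedup) can be checked with a few lines of Python. The following is only a minimal sketch: the function names are hypothetical, and the example numbers are made up for illustration and deliberately differ from those in the problems.

```python
def average_cpi(mix):
    """Average CPI from (instruction fraction, cycles) pairs.

    The fractions are assumed to sum to 1.
    """
    return sum(fraction * cycles for fraction, cycles in mix)


def execution_time(instruction_count, cpi, clock_hz):
    """Execution time in seconds: instructions * CPI / clock frequency."""
    return instruction_count * cpi / clock_hz


def amdahl_speedup(time_fraction, factor):
    """Overall speedup when `time_fraction` of the run time gets `factor` times faster."""
    return 1.0 / ((1.0 - time_fraction) + time_fraction / factor)


# Made-up workload: 75% of instructions take 1 cycle, 25% take 5 cycles.
mix = [(0.75, 1), (0.25, 5)]
cpi = average_cpi(mix)

# Fraction of run time (not of instructions) spent in the 5-cycle instructions.
slow_time_fraction = (0.25 * 5) / cpi

print("CPI:", cpi)
print("Time for 1e9 instructions at 2 GHz (s):", execution_time(1e9, cpi, 2e9))
print("Fraction of time in slow instructions:", slow_time_fraction)
print("Speedup if slow instructions become 5x faster:",
      amdahl_speedup(slow_time_fraction, 5))
```

Note that Amdahl's Law takes the fraction of execution time affected by the improvement, not the fraction of instructions, which is why the sketch converts the instruction mix into a time fraction before computing the speedup.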