CSE596 Final Exam The following conditions apply to this exam: (a) Answer all questions. (b) Prepare your answers for electronic submission. Submissions will be accepted in plain text, html, or postscript only. (c) You have 4 hours to take the exam. It is roughly a 2 hr. exam. (d) Maria and Larry will be answering email questions on the schedule: 11:00-12:00 Saturday-Monday 4:00-5:00 Friday-Sunday (e) Email questions should be sent to both of us, NOT THE CLASS. (f) At the end of four hours, turn in your exam using the turn-in (turnin -p exam) The questions below are generally not answered in lectures, but will require a bit of thinking based on what was said in lecture. The numbers in brackets indicate the relative weight of the questions. 1. [4] Among the programming approaches mentioned in this class (shared memory, message passing, ZPL), only ZPL used the CTA. In a paragraph explain the advantages and disadvantages to programmer's of ZPL's use of the CTA. 2. [9] The book repeatedly states or implies that message passing is more expensive than (coherent) shared memory because the latter is directly supported in hardware. But any mechanism can be supported in hardware. Challenge the book's assertion by comparing and contrasting the performance of the three mechanisms: distributed coherent shared memory, message passing, and "one sided" communication such as Cray's shmem. [Hint: Describe the interprocessor communication needed to implement a generic version of each mechanism, and then rank them least-to-most expensive.] 3. [6] Of the matrix multiplication algorithms presented, three were cited as efficient: Cannon's, SUMMA and the 3D (PSP) algorithm. Though they all perform the same floating point operations, they get better in the order given. Explain why by giving the (performance) advantages and disadvantages of the three. 4. [8] Short Answer [A sentence or two should suffice.]: (a) In network routing what is livelock? (b) Livelock is not a problem for an oblivious router. Why? (c) Livelock is not a problem for a minimal adaptive router. Why? (d) Livelock is a problem for non-minimal adaptive routers. What is the Chaos solution? 5. [6] Short Answer [A sentence or two should suffice.]: (a) What is the principle underlying latency tolerant architectures? (b) The two basic schemes considered, blocking and interleaved, each had an advantage and a disadvantage opposite the other scheme. What were they? (i) Advantage (ii) Disadvantage 6. [4] The GCC benchmark has 22% loads and 11% stores leading to a generic instruction profile of 3 instructions followed by a memory reference. (The memory reference is generated by the third instruction; it is not a fourth instruction itself.) What is the efficiency of a 3 thread GCC on a blocking multithreaded architecture with a 4 stage pipeline and a 10 cycle memory reference latency? 7. [4] Consider the following conditions proposed as sufficient conditions for sequential consistency: -Every process issues memory requests in the order specified by the program -After a read or write operation is issued, the issuing process waits for the operation to complete before issuing its next operation. -Before a processor Pj can return a value written by another processor Pi, all operations that were performed with respect to Pi before it issued the store must also be performed with respect to Pj. Are these conditions indeed sufficient to guarantee sequential consistency executions? If so, say why. If not, construct a counterexample, and say why the conditions discussed in lecture (lecture 5, slide 13) are sufficient in that case. 8. [2] As miss latencies increase, does an update protocol become more or less preferable as compared to an invalidate protocol? Explain.