CSE596 Final Exam

The following conditions apply to this exam:
  (a)  Answer all questions.
  (b)  Prepare your answers for electronic submission.  Submissions will
         be accepted in plain text, html, or postscript only.
  (c)  You have 4 hours to take the exam.  It is roughly a 2 hr. exam.
  (d)  Maria and Larry will be answering email questions on the schedule:
	11:00-12:00   Saturday-Monday
	4:00-5:00     Friday-Sunday
  (e)  Email questions should be sent to both of us, NOT THE CLASS.
  (f)  At the end of four hours, turn in your exam using the turn-in
  	(turnin -p exam)


The questions below are generally not answered in lectures, but will
require a bit of thinking based on what was said in lecture.  The numbers in
brackets indicate the relative weight of the questions.

1. [4] Among the programming approaches mentioned in this class (shared
memory, message passing, ZPL), only ZPL used the CTA.   In a paragraph
explain the advantages and disadvantages to programmer's of ZPL's use of
the CTA.

2. [9] The book repeatedly states or implies that message passing is more
expensive than (coherent) shared memory because the latter is directly
supported in hardware.  But any mechanism can be supported in hardware.
Challenge the book's assertion by comparing and contrasting the performance
of the three mechanisms:  distributed coherent shared memory, message
passing, and "one sided" communication such as Cray's shmem.  [Hint:
Describe the interprocessor communication needed to implement a generic
version of each mechanism, and then rank them least-to-most expensive.]

3. [6] Of the matrix multiplication algorithms presented, three were cited
as efficient: Cannon's, SUMMA and the 3D (PSP) algorithm.  Though they all
perform the same floating point operations, they get better in the order
given.  Explain why by giving the (performance) advantages and
disadvantages of the three.  

4. [8] Short Answer [A sentence or two should suffice.]: 
(a) In network routing what is livelock?  
(b) Livelock is not a problem for an oblivious router.  Why?  
(c) Livelock is not a problem for a minimal adaptive router.  Why?  
(d) Livelock is a problem for non-minimal adaptive routers.  What is the
Chaos solution?

5. [6] Short Answer [A sentence or two should suffice.]:
(a) What is the principle underlying latency tolerant architectures?
(b) The two basic schemes considered, blocking and interleaved, each had
an advantage and a disadvantage opposite the other scheme.  What were they?
	(i)  Advantage
	(ii) Disadvantage

6. [4] The GCC benchmark has 22% loads and 11% stores leading to a generic
instruction profile of 3 instructions followed by a memory reference.  (The
memory reference is generated by the third instruction; it is not a fourth
instruction itself.)  What is the efficiency of a 3 thread GCC on a blocking
multithreaded architecture with a 4 stage pipeline and a 10 cycle memory
reference latency?

7. [4] Consider the following conditions proposed as sufficient conditions
for sequential consistency:
-Every process issues memory requests in the order specified by the program
-After a read or write operation is issued, the issuing process waits for
the operation to complete before issuing its next operation.
-Before a processor Pj can return a value written by another processor Pi,
 all operations that were performed with respect to Pi before it issued
 the store must also be performed with respect to Pj.
Are these conditions indeed sufficient to guarantee sequential consistency
executions?  If so, say why.  If not, construct a counterexample, and say
why the conditions discussed in lecture (lecture 5, slide 13) are
sufficient in that case.

8. [2] As miss latencies increase, does an update protocol become more or
less preferable as compared to an invalidate protocol?  Explain.