acks for itself.

+'s and -'s of the protocol in FLASH over Origin:

+ An invalidate message only needs to include the address of the block, not the number of the processor making the request, since sharing processors already know the home processor.
+ The home does not need to send the initial message to the requestor with a count of sharing processors. It can simply send one message to the requestor once it has collected all the acks.
- The home has to send a message to the requestor after all the acks arrive, increasing the length of the critical path.
- The home gets tied up receiving and counting acks on behalf of another processor, possibly making two processors wait for all the acks, rather than just the requestor.
- If multiple processors are requesting blocks from the same home node, the home has to track acks for all of them, rather than each requestor only having to track acks for itself.

8.11)
With an update protocol, a message must be sent to all sharers with the new value, so no message can be sent until the new value is known. With an invalidate protocol, you can invalidate the other processors and know it is safe to perform the write once the invalidation is complete. With an update protocol, you perform the write and then send an update; some other processor may try to use the block after you have written it but before it has received the update. On a bus-based system this is not a problem, because everyone sees the update on the bus at the same time.

One way to solve this is to mark the data as 'volatile' following an update, while the home node sends updates to all sharing processors. Once the home node has an ack from everyone, it marks the data as no longer 'volatile' and alerts all sharers of this.
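To make that 'volatile' scheme concrete, here is a rough Python sketch. The class and method names are made up for illustration, and a real protocol would keep this state per block in directory hardware rather than in software objects: a sharer refuses to use a block while it is marked 'volatile', and the home clears the flag only after every sharer has acked the update.

class Sharer:
    def __init__(self, name):
        self.name = name
        self.value = None
        self.volatile = False          # True while an update is still propagating

    def read(self):
        if self.volatile:
            raise RuntimeError("block is volatile; stall until the update completes")
        return self.value

    def apply_update(self, value):
        self.value = value
        self.volatile = True           # new value present, but not yet globally visible
        return "ack"

    def end_update(self):
        self.volatile = False          # home has seen acks from every sharer


class Home:
    def __init__(self, sharers):
        self.sharers = sharers

    def update(self, value):
        # Send the new value to every sharer and collect the acks.
        acks = [s.apply_update(value) for s in self.sharers]
        if acks.count("ack") == len(self.sharers):
            # All acks received: clear 'volatile' everywhere so the value may be used.
            for s in self.sharers:
                s.end_update()


sharers = [Sharer("P1"), Sharer("P2")]
Home(sharers).update(42)
print(sharers[0].read())   # 42 -- readable only once 'volatile' has been cleared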
8.16)
Say a processor receives a response and a request at the same time. Here is how it would behave:

Handling the request first:
- Deal with the request.
- Send out a response. (This allows the requestor to get on with its work sooner.)
- Deal with the response.
- Get on with life.

Handling the response first:
- Deal with the response.
- Deal with the request.
- Send out a response.
- Get on with life.

In both cases it takes just as long for the local processor to get on with life, but by handling the request before the response, the requestor can get on with things faster. If responses are starved, however, the local processor may never be able to commit a write, for example, and processors requesting the written block would then not see the updated value. One might choose to invert the priority if a response has been waiting for a certain amount of time, if a certain number of requests that arrived after the response have already been handled, or if more than some threshold of responses are queued up. There are many other ways as well.

8.18)
- Assuming space in memory is not a constraining factor.
- Assuming communication at the additional cost of 4 is required on write misses to replicated pages, and that these writes use an update protocol.

A) M0 holds X and Z; M1 holds Y.
Cost:
Page X => 14 local misses + 11 remote misses
Page Y => 18 local misses
Page Z => 15 local misses + 9 remote misses
Total: 1*(14+18+15) + 4*(11+9) = 127

B) Replicate X. Migrate Y. M0 holds X, Y and Z; M1 holds X and Y.
Cost:
Page X => 25 local misses
Page Y => 18 local misses
Page Z => 15 local misses + 9 remote misses
Total: 1*(25+18+15) + 4*(9) = 94

C) Replicate X. Migrate Y. M0 holds X, Y and Z; M1 holds X and Y.
Cost:
Page X => 25 local misses
Page Y => 18 local misses
Page Z => 15 local misses + 9 remote misses
Total: 1*(25+18+15) + 4*(9) + 2*10 = 114

D) M0 holds X, Y and Z.
Cost:
Page X => 14 local misses + 11 remote misses
Page Y => 18 remote misses
Page Z => 15 local misses + 9 remote misses
Total: 1*(14+15) + 4*(11+18+9) = 181

E) Replicate X. Migrate Y. M0 holds X, Y and Z; M1 holds X and Y.
Cost:
Page X => 250 local misses
Page Y => 180 local misses
Page Z => 150 local misses + 90 remote misses
Total: 1*(250+180+150) + 4*(90) + 2*60 = 1060
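For reference, a small Python sketch that reproduces the totals above. The local-miss cost of 1 and remote-miss cost of 4 come from the assumptions; the per-write update cost of 2 and the write counts (10 in part C, 60 in part E) are simply read off the 2*10 and 2*60 terms in those totals.

# Assumed cost model, read off the arithmetic above: a local miss costs 1,
# a remote miss costs 4, and each write to a replicated page adds 2 for the
# update traffic (the 2*10 and 2*60 terms in parts C and E).
LOCAL, REMOTE, UPDATE = 1, 4, 2

def total(local_misses, remote_misses, replicated_writes=0):
    # Sum the misses per page, weight by cost, and add the update traffic.
    return (LOCAL * sum(local_misses)
            + REMOTE * sum(remote_misses)
            + UPDATE * replicated_writes)

print("A:", total([14, 18, 15], [11, 9]))        # 127
print("B:", total([25, 18, 15], [9]))            # 94
print("C:", total([25, 18, 15], [9], 10))        # 114
print("D:", total([14, 15], [11, 18, 9]))        # 181
print("E:", total([250, 180, 150], [90], 60))    # 1060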