acks for itself.

+'s and -'s of the protocol in FLASH over Origin:

+ An invalidate message only needs to include the address of the block, not the number of the processor making the request, since sharing processors already know the home processor.
+ The home does not need to send the initial message to the requestor with a count of sharing processors. It can simply send one message to the requestor once it has collected all the acks.
- The home has to send a message to the requestor after all the acks arrive, increasing the length of the critical path.
- The home gets tied up receiving and counting acks on behalf of another processor, possibly making two processors wait for all the acks, rather than just the requestor.
- If multiple processors are requesting blocks from the same home node, the home has to track acks for all of them, rather than each requestor only having to track acks for itself.

8.11)
With an update protocol, a message must be sent to all sharers with the new value, so no message can be sent until the new value is known. With an invalidate protocol, you can invalidate the other processors and know it is safe to perform the write once the invalidation is complete. With an update protocol, you perform the write and then send an update; some other processor may try to use the block after you have written it but before it has received the update. On a bus-based system this is not a problem, because everyone sees the update on the bus at the same time.

One way to solve this is to mark the data as 'volatile' following an update, while the home node sends updates to all sharing processors. Once the home node has an ack from everyone, it marks the data as no longer 'volatile' and alerts all sharers of this.
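To make that 'volatile' scheme concrete, here is a rough Python sketch. The class and method names are made up for illustration, and a real protocol would keep this state per block in directory hardware rather than in software objects: a sharer refuses to use a block while it is marked 'volatile', and the home clears the flag only after every sharer has acked the update.

class Sharer:
    def __init__(self, name):
        self.name = name
        self.value = None
        self.volatile = False          # True while an update is still propagating

    def read(self):
        if self.volatile:
            raise RuntimeError("block is volatile; stall until the update completes")
        return self.value

    def apply_update(self, value):
        self.value = value
        self.volatile = True           # new value present, but not yet globally visible
        return "ack"

    def end_update(self):
        self.volatile = False          # home has seen acks from every sharer


class Home:
    def __init__(self, sharers):
        self.sharers = sharers

    def update(self, value):
        # Send the new value to every sharer and collect the acks.
        acks = [s.apply_update(value) for s in self.sharers]
        if acks.count("ack") == len(self.sharers):
            # All acks received: clear 'volatile' everywhere so the value may be used.
            for s in self.sharers:
                s.end_update()


sharers = [Sharer("P1"), Sharer("P2")]
Home(sharers).update(42)
print(sharers[0].read())   # 42 -- readable only once 'volatile' has been cleared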
8.16)
Say a processor receives a response and a request at the same time. Here is how it would behave:

Handling the request first:
- Deal with the request.
- Send out a response. (This allows the requestor to get on with its work sooner.)
- Deal with the response.
- Get on with life.

Handling the response first:
- Deal with the response.
- Deal with the request.
- Send out a response.
- Get on with life.

In both cases it takes just as long for the local processor to get on with life, but by handling the request before the response, the requestor can get on with things faster. If responses are starved, however, the local processor may never be able to commit a write, for example, and processors requesting the written block would then not see the updated value. One might choose to invert the priority if a response has been waiting for a certain amount of time, if a certain number of requests that arrived after the response have already been handled, or if more than some threshold of responses are queued up. There are many other ways as well.

8.18)
- Assuming space in memory is not a constraining factor.
- Assuming communication at the additional cost of 4 is required on write misses to replicated pages, and that these writes use an update protocol.

A) M0 holds X and Z; M1 holds Y.
Cost:
Page X => 14 local misses + 11 remote misses
Page Y => 18 local misses
Page Z => 15 local misses + 9 remote misses
Total: 1*(14+18+15) + 4*(11+9) = 127

B) Replicate X. Migrate Y. M0 holds X, Y and Z; M1 holds X and Y.
Cost:
Page X => 25 local misses
Page Y => 18 local misses
Page Z => 15 local misses + 9 remote misses
Total: 1*(25+18+15) + 4*(9) = 94

C) Replicate X. Migrate Y. M0 holds X, Y and Z; M1 holds X and Y.
Cost:
Page X => 25 local misses
Page Y => 18 local misses
Page Z => 15 local misses + 9 remote misses
Total: 1*(25+18+15) + 4*(9) + 2*10 = 114

D) M0 holds X, Y and Z.
Cost:
Page X => 14 local misses + 11 remote misses
Page Y => 18 remote misses
Page Z => 15 local misses + 9 remote misses
Total: 1*(14+15) + 4*(11+18+9) = 181

E) Replicate X. Migrate Y. M0 holds X, Y and Z; M1 holds X and Y.
Cost:
Page X => 250 local misses
Page Y => 180 local misses
Page Z => 150 local misses + 90 remote misses
Total: 1*(250+180+150) + 4*(90) + 2*60 = 1060
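For reference, a small Python sketch that reproduces the totals above. The local-miss cost of 1 and remote-miss cost of 4 come from the assumptions; the per-write update cost of 2 and the write counts (10 in part C, 60 in part E) are simply read off the 2*10 and 2*60 terms in those totals.

# Assumed cost model, read off the arithmetic above: a local miss costs 1,
# a remote miss costs 4, and each write to a replicated page adds 2 for the
# update traffic (the 2*10 and 2*60 terms in parts C and E).
LOCAL, REMOTE, UPDATE = 1, 4, 2

def total(local_misses, remote_misses, replicated_writes=0):
    # Sum the misses per page, weight by cost, and add the update traffic.
    return (LOCAL * sum(local_misses)
            + REMOTE * sum(remote_misses)
            + UPDATE * replicated_writes)

print("A:", total([14, 18, 15], [11, 9]))        # 127
print("B:", total([25, 18, 15], [9]))            # 94
print("C:", total([25, 18, 15], [9], 10))        # 114
print("D:", total([14, 15], [11, 18, 9]))        # 181
print("E:", total([250, 180, 150], [90], 60))    # 1060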