From: Prasanna Kumar Jayapal (prasak_at_winse.microsoft.com)
Date: Wed Feb 18 2004 - 17:39:37 PST
This paper ("Memory Coherence in Shared Virtual Memory Systems") addresses
the well-known memory coherence problem on multiprocessors and
discusses different protocols to solve it.
The performance of a shared virtual memory system depends on two issues:
the granularity of the memory units (pages) and the coherence scheme. The
former depends on hardware constraints as well as network latency. A
coherence scheme has to deal with page synchronization and page ownership,
and one can handle these two issues with either a centralized scheme or a
distributed one.
The paper considers the following three main memory coherence strategies,
all based on an invalidation approach to page synchronization.
1. Centralized Manager algorithm: A single manager knows the owner of
every page and helps other processors locate where a page is.
Synchronization of requests may be performed by the manager or by the
individual owners.
2. Fixed Distributed Manager algorithm: Every processor is given a
predetermined subset of the pages to manage. When a page fault occurs, the
faulting processor asks the page's manager where the owner is and then
proceeds as in the first algorithm.
3. Dynamic Distributed Manager algorithm: The ownership of all pages is
tracked dynamically. A per-page field called "prob owner" indicates the
true owner or the probable owner of a page. A faulting processor sends its
request to the processor indicated by the "prob owner" field for that
page. If that processor is the owner, it proceeds as in the first
algorithm; if not, it forwards the request to the processor indicated by
its own "prob owner" field.
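The "prob owner" forwarding in the third algorithm can be sketched in a few
lines. This is a simplified simulation, not the paper's implementation: the
Processor class, the single-page setup, and the write-fault-only transfer
are assumptions made for illustration. Each hop along the hint chain
redirects its hint toward the requester, which compresses the forwarding
path for future faults.

```python
# Sketch of dynamic distributed manager "prob owner" chasing.
# Assumption: every fault is a write fault that transfers ownership.

class Processor:
    def __init__(self, pid, num_pages, initial_owner):
        self.pid = pid
        # probOwner hint for every page; initially all hints point
        # at the processor that owned the page at startup.
        self.prob_owner = {p: initial_owner for p in range(num_pages)}
        self.owned = set()  # pages this processor actually owns

def request_page(procs, requester, page):
    """Forward a fault along the probOwner chain; return hop count."""
    hops = 0
    current = procs[requester].prob_owner[page]
    while page not in procs[current].owned:
        nxt = procs[current].prob_owner[page]
        # Path compression: this hop's hint now points at the
        # requester, which becomes the owner once transfer completes.
        procs[current].prob_owner[page] = requester
        current = nxt
        hops += 1
    # Transfer ownership from the true owner to the requester.
    procs[current].owned.discard(page)
    procs[current].prob_owner[page] = requester
    procs[requester].owned.add(page)
    procs[requester].prob_owner[page] = requester
    return hops

# Usage: 4 processors, one page, initially owned by processor 0.
procs = [Processor(i, num_pages=1, initial_owner=0) for i in range(4)]
procs[0].owned.add(0)
h1 = request_page(procs, requester=3, page=0)  # direct hit on owner
h2 = request_page(procs, requester=1, page=0)  # one forwarding hop
```

In this toy run the second fault reaches the new owner in a single hop
because processor 0's hint was updated during the first transfer.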
The centralized scheme creates a bottleneck but allows a simple
implementation. A distributed scheme reduces contention but adds
complexity to the overall system, since pages must now be located. The
experiments show that the dynamic distributed manager algorithm and its
variants have the most desirable overall features.
The experiments were run on four parallel programs. The results show very
good speedup for some classes of programs, but not for others whose data
sets are read only once. The algorithms, however, do not include any
mechanisms to recover from processor failures or message losses.
In general, it was a nice paper: well structured, with a clear goal, and
it discussed the topics in depth.
This archive was generated by hypermail 2.1.6 : Wed Feb 18 2004 - 17:38:56 PST