From: Greg Green (ggreen_at_cs.washington.edu)
Date: Tue Feb 17 2004 - 22:16:48 PST
Two algorithms for sharing virtual memory on loosely couple
multiprocessors is described. One algorithm uses a central manager and
the other uses distributed management. The two major design choices
for memory coherence are the size of the memory units managed and the
strategy for maintaining coherence. Some discussion is made about
choosing a page size, settling on the page size of contemporary
computers, 256-2k bytes. The next section has a short discussion of
page synchronization, dismissing the writeback approach and settling
on invalidation. Finally the different types of ownership strategies
are discussed. One major strategy is static ownership, where a page is
always owned by the same processor. The other is dynamic ownership,
where the owner of the page is the last to write to it.
The algorithms are described as fault handlers: read fault, read
server, write fault, and write server. First the centralized manager
algorithm is described, where a manager resides on a processor with
locked data structures. This algorithm doesn't require broadcasting,
as the manager always knows where all copies of the page reside, and
can invalidate them at once. An improvement to this algorithm moves
the synchronization of page ownership to the individual owners. This
removes a confirmation message from each fault. This algorithm suffers
from a bottleneck at the manager, which must respond to every fault.
The second set of algorithms described have distributed the management
over the whole set of processors. The first variation has page
ownership distributed statically to the set of processors, with a
manager at each processor for it's pages. The second
variation uses a broadcast to find the owner of a page, instead of
having the pages assigned statically beforehand. This makes the
communication the bottleneck. The final set of algorithms described
has each processor keep track of who owns which page locally. There
are some variations of this algorithm so that broadcast is not needed,
or to improve performance by broadcasting at certain intervals. There
is some discussion on the big O performance and limits. Basically the
worst case number of messages for finding a page is O(n+mlogn) for m < n
and
O(Mlog(1+m/n)N) for m >= n where n is the number of processors.
This paper was written quite a while ago, so I assume that it has been
utilized on modern multiprocessors. The parallel computing problem is
quite difficult when the result set cannot reside on 1 processor, and
these algorithms seem to provide good performance for this case.
I liked how the design space was covered in the beginning and the
strengths and weaknesses of the different approaches were discussed. I
found it easy to understand. Also the algorithms were described in
great detail, enought so that I felt that I could implement them
without too much trouble. This was a good paper.
-- Greg Green
This archive was generated by hypermail 2.1.6 : Tue Feb 17 2004 - 22:16:53 PST