From: Cem Paya (cemp_at_microsoft.com)
Date: Wed Feb 18 2004 - 16:34:40 PST
Review: Memory coherence in shared virtual memory systems (Kai Li, Paul
Hudak 1986)
CSE551P, Cem Paya
Virtual memory is traditionally considered meaningful within a single
system, which may contain multiple processors but remain tightly
coupled. Li and Hudak extend VM to loosely-coupled distributed system to
create the abstraction of a single universally shared memory between all
processors. This creates interesting possibilities such as migrating
data between processors by the new analog for "paging" where data moves
to another processor instead of to secondary storage. Paper hints at a
few interesting applications such as mobility for objects, and
substitute for RPC.
Challenge for setting up shared VM is maintaining coherency between
multiple readers/writers. Coherence guarantees that every read access
receives the value supplied in the latest write access. This reduces to
a version of readers/writers problem, where the constraint is a
processor can write only when nobody else is trying to read or write
i.e. multiple reads are allowed but writing requires an exclusive lock.
VM has to keep track of which processors are reading a given page, and
which single processor (if any) has write access or "owns" the page-also
implying that the proceessor has authoritative copy. Within that
framework there are many possible design choices and authors rule out
synchronization by "write-back" where every write operation gets
mirrored to all the other copies. This generates too much traffic
between processors, even when the read copies are not being actively
used. Instead they focus on invalidation with dynamic page ownership.
Rest of the paper outlines a series of algorithms, each with refinements
and improvements. Each algorithm defines logic for 4 cases: read fault,
write fault, sharing a page for read and giving up ownership of page
because another processor is trying to write to it. Simplest approach is
a central monitor where a single processor is responsible for
maintaining who owns which page. Next one moves the responsibility for
keeping track of copies to the processor that owns the page, but still
suffers from the single processor bottleneck. Next few algorithms
distribute management load to the different procesors, with each one
keeping a forwarding address or "probable owner" as the page is handed
off to another processor for write. In this model when a write fault
occurs, all read copies are invalidate with a broadcast. Clever analysis
based on the disjoint set data structure shows that number of messages
grows as N + K log N for N processors and K owners on the same page.
(Updating the probable owner to the true owner corresponds to the
path-compression heuristic-note that the other heuristic, union-by-rank
does, which reduces amortized time to almost constant per operation for
disjoint sets, does not apply here.)
Perhaps most impressive thing about this paper is that it could be made
to work in practice. Authors implemented it on a network of workstations
and looked at the scaling behavior with different # of processors.
Experiment confirmed the expectations that distributed manager
outperformed centralize done, with the difference most notable for
forwarding requests when a processor is trying to locate the correct
owner for a page. (Dynamic management requires a forward only when the
probably owner field is wrong) Even more amazing: better than linear
scaling observed on a matrix reduction problem upto 8 processors. When
the problem is parallelized each piece fits into physical memory and
avoids paging, which more than makes up for the increased overhead from
sending memory pages between processors.
This archive was generated by hypermail 2.1.6 : Wed Feb 18 2004 - 16:35:13 PST