From: Gail Rahn (grahn_at_cs.washington.edu)
Date: Mon Feb 16 2004 - 16:56:06 PST
This 1986 paper assesses problems with memory coherence in shared virtual
memory systems for loosely-coupled multiprocessors, machines where the
virtual memory system virtualizes distributed physical memory. Here, a
single virtual address space is shared among the processors. Pages of data
migrated between processors. This model is extendable to perform efficient
process migration.
Coherence is the biggest problem in such a shared virtual memory system. It
must always be the case that a read operation returns the most
recently-written value of the page. Many processors can simultaneously read
the page but only one at a time can write it. The paper provides
sequentially-improving algorithms to efficiently manage reads and writes
while ensuring coherence.
The paper does not address cache-writeback style algorithms. The bandwidth
necessary to writeback updated values for each modified page is
overwhelming and necessarily invalidates the efficiency of the approach. The
paper only addresses cache-invalidation algorithms, and specifically, those
that provide dynamic allocation of "owned" pages across processors.
First, the paper looks at algorithms where many processors share one
centralized virtual memory monitor. The monitor knows which processors have
copies of a page and which single processor can write to the page. An
algorithm is presented for processors to handle read and write faults by
communicating with the central monitor for access to the memory location.
The algorithms use two messages sent from a processor for each access, a
request message and a confirmation.
The next algorithm improves on this idea by sending fewer messages -
attempting to remove the confirmation message. This approach moves
synchronization from the central monitor to the requesting processors. The
confirmation message is unnecessary because messages are forwarded by the
central authority to the memory-owning processor. Still, there is a
bottleneck for large numbers of memory accesses.
The authors improve this idea again by distributing the memory management to
each processor. A static allotment of pages to each processor is passingly
considered, but this idea is immediately recognized as not optimized for
application purposes and rejected. A broadcast-based distributed management
algorithm is also considered, but rejected for bandwidth reasons.
An algorithm is presented for a distributed virtual memory manager for
processors with dynamically-allocated pages. This algorithm relies on each
processor keeping a list of all pages of interest to the processor and the
"probable owner" of the pages. By default, the probable owner is a default
processor. Each processor then knows one other processor to query for the
page, the probable owner. There is a directed graph with no cycles of
processors to ask, in turn, for the page in memory. This forwarding
eventually ends in the page owner. In the worst case, this algorithm
generates N-1 messages to locate a page, where N is the number of
processors in the system. This algorithm can be further optimized to reduce
the number of broadcast messages necessary to discover the owner.
I liked reading this paper because it presented an initial approach to
solving the problem, then iteratively improved the algorithms until a
satisfactory solution was obtained. My learning came in the iteration,
matching my own optimization guesses with the solutions followed by the
authors or reviewing where they diverged. This paper is also good at
discussing solutions at a high-level and then presenting the algorithm in
pseudocode. That helps me to map concept to implementation.
-------------
Gail Rahn
grahn_at_cs.washington.edu
This archive was generated by hypermail 2.1.6 : Mon Feb 16 2004 - 16:56:16 PST