From: Reid Wilkes (reidw_at_microsoft.com)
Date: Tue Feb 17 2004 - 23:07:26 PST
The paper by Li and Hudak proposes a virtual memory system that spans
loosely-coupled microprocessors, which the authors define as
microprocessors that each have their own private memory. It is not
clear whether this is a distinct classification from a set of
independent networked computers, but that kind of configuration does
appear to count as a loosely-coupled system, since it is the
configuration on which the authors implemented their working models. I
also had a little trouble understanding exactly the set of benefits that
this kind of "distributed virtual memory" would provide. The authors
sort of wave their hands and say, "of course you would want this,"
without really explaining exactly why. What does become clear, however,
in reading the entire paper, is how this idea of shared virtual memory
greatly simplifies the programming model for distributed applications.
Instead of explicitly setting up message passing or RPC mechanisms to
communicate between cooperating threads of execution running on separate
nodes, all threads of execution can now communicate through memory.
Thus, from a programming standpoint, talking to another thread is as
simple as writing to memory. I would be interested in understanding whether this
was the primary motivation behind designing the system or if there were
other goals that the authors were trying to achieve.
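A toy sketch of the simplification being claimed (all names here are my own, hypothetical): with shared virtual memory, "talking to another thread" is literally just a memory write plus some synchronization, rather than an explicit message-passing or RPC channel.

```python
# Two threads cooperating through a shared address space (modeled here
# by an ordinary dict) instead of an explicit messaging mechanism.
import threading

shared = {}                      # stands in for the shared address space
ready = threading.Event()

def producer():
    shared["result"] = 42        # talking to the other thread is just a write
    ready.set()                  # minimal synchronization

def consumer(out):
    ready.wait()
    out.append(shared["result"]) # ...and a read on the other side

out = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(out,))
t1.start(); t2.start(); t1.join(); t2.join()
print(out[0])   # 42
```

Of course, in the paper's system the "shared memory" spans separate machines, with the page-location algorithms below doing the work that a dict does trivially here.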
The bulk of the paper describes the algorithms used to locate pages in
the distributed virtual memory system. The algorithms are implemented in
two components: a page fault handler and a server - both of which exist
on each node. The paper builds on a series of increasingly complex ideas
beginning with a centralized manager system. With a centralized manager
the page fault handlers talk to a centralized node which maintains a
master list of who owns which page. This centralized manager then lets
the owner know that another node needs the page and the owner then takes
appropriate actions such as sending a copy of the page to the requestor,
invalidating copies of the page, and possibly even transferring
ownership to the requestor. The alternative to the centralized manager
is a distributed manager. The simplest distributed manager gives to each
node a subset of the manager duties previously held by the centralized
manager. This reduces the bottleneck at the single manager. A more
sophisticated approach is to essentially not have a manager at all, but
rather let nodes coordinate amongst themselves. The simple approach in
this case is to broadcast from the fault handlers when trying to find
the node owning a page. The more complex approach (but with better
performance) is for each page-table entry on a node to hold a pointer
to the node most likely to own that page. If that node doesn't own the
page, then at least it will have a pointer to another node which is
likely to own the page. The paper proves that, using this sort of
traversal, the owner will be found in a number of forwarding steps
bounded by the number of nodes.
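The progression of manager schemes above can be sketched roughly as follows (all names and data structures are my own, hypothetical; this models only the ownership bookkeeping, not the actual page transfer or invalidation messages):

```python
# 1. Centralized manager: one node keeps the master owner list. On a
#    write fault it has the owner ship the page, invalidates any
#    outstanding read copies, and transfers ownership to the requester.
class CentralManager:
    def __init__(self, num_pages):
        self.owner = {p: 0 for p in range(num_pages)}       # page -> owning node
        self.copies = {p: set() for p in range(num_pages)}  # nodes with read copies

    def write_fault(self, page, requester):
        # (Page transfer and invalidation messages would happen here.)
        self.copies[page].clear()        # read copies are now stale
        self.owner[page] = requester     # ownership transfers

# 2. Fixed distributed manager: manager duties are statically
#    partitioned across the nodes (mod-N is one obvious mapping),
#    so no single node handles every fault.
def manager_for(page, num_nodes):
    return page % num_nodes

# 3. Dynamic (probable-owner) scheme: no manager at all. Each node's
#    page-table entry holds a hint naming the node most likely to own
#    the page; a fault follows the chain of hints until it reaches a
#    node whose hint names itself (modeled here as the true owner).
def find_owner(page, start, prob_owner):
    node, hops = start, 0
    while prob_owner[node][page] != node:
        node = prob_owner[node][page]   # forward the request along the hint
        hops += 1                       # the paper bounds this by the node count
    return node, hops

# Example: node 0 thinks node 1 owns page 5; node 1 points at node 2,
# whose hint names itself, so node 2 is the owner, reached in 2 hops.
prob_owner = {0: {5: 1}, 1: {5: 2}, 2: {5: 2}}
print(find_owner(5, 0, prob_owner))   # (2, 2)
```

In the real algorithm the hints are also updated as requests are forwarded, so later faults take shorter chains; the sketch omits that.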
One of the things that stood out in this paper from some of the others
we've read was its focus on algorithms. Further, the paper goes to some
length to formalize the concepts it is discussing - reminding me in some
ways of Dijkstra's claims of correctness made in the THE paper, except
that in this case the correctness proofs are actually included in the
paper. :-)
This archive was generated by hypermail 2.1.6 : Tue Feb 17 2004 - 23:07:32 PST