Shared Virtual Memory Paper Review

From: Reid Wilkes (reidw_at_microsoft.com)
Date: Tue Feb 17 2004 - 23:07:26 PST

  • Next message: Brian Milnes: "Li and Hudak Review"

The paper by Li and Hudak proposes a virtual memory system which spans
loosely-coupled multiprocessors, defined by the authors as systems in
which each processor has its own private memory. It is not clear
whether this is a distinct classification from a set of independent
networked computers, but such a configuration does appear to qualify as
loosely coupled, since that is the sort of setup the authors
implemented their working prototypes on. I
    also had a little trouble understanding exactly the set of benefits that
    this kind of "distributed virtual memory" would provide. The authors
    sort of wave their hands and say, "of course you would want this,"
    without really explaining exactly why. What does become clear, however,
    in reading the entire paper, is how this idea of shared virtual memory
    greatly simplifies the programming model for distributed applications.
    Instead of explicitly setting up message passing or RPC mechanisms to
    communicate between cooperating threads of execution running on separate
    nodes, all threads of execution can now communicate through memory. Thus
    from a programming standpoint talking to another thread is as simple as
    writing to memory. I would be interested in really understanding if this
    was the primary motivation behind designing the system or if there were
    other goals that the authors were trying to achieve.
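The simplification claimed above can be illustrated in miniature. This is not code from the paper - it is a single-process Python sketch in which two threads share an ordinary dictionary, standing in for how, under shared virtual memory, cooperating threads on separate nodes could communicate by plain reads and writes instead of explicit messages or RPC:

```python
import threading

# Illustrative sketch only: `shared` stands in for a region of shared
# virtual memory; the producer "talks" to the consumer just by writing.
shared = {}
done = threading.Event()

def producer():
    shared["result"] = 42  # communicating is as simple as writing to memory
    done.set()

def consumer(out):
    done.wait()
    out.append(shared["result"])  # and receiving is just a memory read

out = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(out,))
t2.start()
t1.start()
t1.join()
t2.join()
print(out[0])  # -> 42
```

The point of the contrast is that no send/receive pairing or marshalling appears anywhere; the coherence machinery described later in the paper is what would make this safe across real nodes.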

    The bulk of the paper describes the algorithms used to locate pages in
    the distributed virtual memory system. The algorithms are implemented in
    two components: a page fault handler and a server - both of which exist
    on each node. The paper builds on a series of increasingly complex ideas
    beginning with a centralized manager system. With a centralized manager
    the page fault handlers talk to a centralized node which maintains a
    master list of who owns which page. This centralized manager then lets
    the owner know that another node needs the page and the owner then takes
    appropriate actions such as sending a copy of the page to the requestor,
    invalidating copies of the page, and possibly even transferring
    ownership to the requestor. The alternative to the centralized manager
is a distributed manager. The simplest distributed manager gives each
node a subset of the manager duties previously held by the centralized
manager, reducing the bottleneck at the single manager. A more
    sophisticated approach is to essentially not have a manager at all, but
    rather let nodes coordinate amongst themselves. The simple approach in
    this case is to broadcast from the fault handlers when trying to find
the node owning a page. The more complex approach (but with better
performance) is for each page-table entry on a node to hold a pointer
to the node most likely to own that page. If that node does not own the
page, it will at least have a pointer to another node likely to own it.
The paper proves that this traversal finds the owner in a number of
forwarding steps bounded by the number of nodes.
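The probable-owner traversal can be sketched roughly as follows. Everything here - the function name, the dictionary layout, and the three-node setup - is invented for illustration; the paper's actual implementation lives in the fault handler and server, not in code like this:

```python
# Hedged sketch of the dynamic-distributed-manager lookup: each node
# records a guess at which node owns each page; a faulting node follows
# the chain of guesses until the real owner is reached.

def find_owner(page, prob_owner, true_owner, start):
    """Follow prob_owner pointers from `start` until the real owner is found."""
    node, hops = start, 0
    while node != true_owner[page]:
        node = prob_owner[node][page]  # forward the request to the next guess
        hops += 1
        # The paper proves such a chain terminates within the node count.
        assert hops <= len(prob_owner)
    return node, hops

# Three nodes; page 0 is really owned by node 2. Node 0's guess points
# at node 1, whose guess points at the true owner, node 2.
prob_owner = {0: {0: 1}, 1: {0: 2}, 2: {0: 2}}
true_owner = {0: 2}
owner, hops = find_owner(0, prob_owner, true_owner, start=0)
print(owner, hops)  # -> 2 2
```

A nice property of this scheme is that each forwarded request can also shortcut the chain (updating stale guesses to point at the discovered owner), which is what keeps later lookups cheap; the sketch above omits that update for brevity.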

    One of the things that stood out in this paper from some of the others
    we've read was its focus on algorithms. Further, the paper goes to some
length to formalize the concepts it is discussing - reminding me in
some ways of Dijkstra's claims of correctness in the THE paper, except
that here the correctness proofs are actually included in the
paper. :-)



    This archive was generated by hypermail 2.1.6 : Tue Feb 17 2004 - 23:07:32 PST