Review of "Memory Coherence in Shared Virtual Memory Systems"

From: Jeff Duzak (jduzak_at_exchange.microsoft.com)
Date: Wed Feb 18 2004 - 09:41:11 PST


    This paper discusses strategy options for implementing a shared virtual memory on a loosely coupled multiprocessor. A loosely coupled multiprocessor is defined as a system in which each processor has its own physical memory, which maps into the shared virtual address space. The central problem is memory coherence: ensuring that the value read at a given address is the value most recently written to that address.
     
    The paper briefly discusses the granularity of a coherence scheme, and decides that the physical page size is a reasonable choice for the smallest unit of memory coherence. The paper then talks briefly about memory coherence strategies, and quickly rules out the write-broadcast approach to page synchronization as well as static ownership of pages. Our options are therefore limited to using the invalidation method with dynamic ownership of pages.
     
    Within that space, there are a few options related to the method by which the owner of a page is found. The first option is to have a central manager keep track of ownership. The alternative is to distribute ownership information among processors. Within the distributed solution, there is the further choice between having each processor responsible for tracking ownership of a fixed subset of the memory space, or having ownership information distributed dynamically between processors.
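
    As a rough illustration of the three choices above, the sketch below shows how a faulting node might decide which node to contact first for a page. The function names, the constant N_NODES, the choice of node 0 as central manager, and the page-mod-N mapping are all assumptions made for this example, not code taken from the paper.

```python
# Hypothetical sketch: whom does a faulting node ask about page `page`?
# All names here are illustrative assumptions, not the paper's code.

N_NODES = 4
CENTRAL_MANAGER = 0  # assumption: node 0 acts as the single manager


def central_manager_for(page):
    """Centralized scheme: one node tracks ownership of every page."""
    return CENTRAL_MANAGER


def fixed_manager_for(page, n_nodes=N_NODES):
    """Fixed distributed scheme: a static mapping picks the manager.

    A simple modulo mapping is assumed here for illustration.
    """
    return page % n_nodes


def dynamic_first_hop(prob_owner, page):
    """Dynamic distributed scheme: ask whichever node this processor
    currently believes to be the owner (a hint that may be stale)."""
    return prob_owner[page]
```

    In the first two schemes the first hop depends only on the page number, while in the dynamic scheme it depends on per-node state that changes as pages move.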
     
    Pseudo-source code is given for two versions of the central manager method and for three versions of the dynamic distributed ownership method. This code confused me on the question of whether ownership was transferred upon a read fault. I would assume that it is not, and the following code in the read fault handler would seem to confirm that assumption:
     
    ptable[p].prob_owner := reply_node;
     
    That code indicates that the node from which the page is received is still the owner. However, the read server has the following code:
     
    ptable[p].copy_set := {};
    ptable[p].prob_owner := request_node;
     
    This code would seem to indicate that ownership is being transferred to the processor that requested to read the page.
     
    Performance of the various solutions was analyzed, and, not surprisingly, the dynamic distributed system had the best performance. Clearly, lock and communication contention are the biggest bottlenecks in this virtual memory system, just as in other multiprocessor systems such as the multiprocessor thread systems described by Anderson, Lazowska, and Levy.
     
    No discussion is given of recovery after a node failure. I believe both the prob_owner method of finding ownership and the tree method of distributing copy_sets would break with a single node failure.
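
    To make the prob_owner concern concrete, here is a minimal sketch of a request following a chain of ownership hints, assuming no recovery mechanism exists. The function find_owner, the hints table, and the alive set are hypothetical names for this example; the real algorithms also update hints and copy sets as requests are forwarded, which is omitted here.

```python
# Hypothetical model: each node holds only a hint (prob_owner) for one page.
# Names and structure are illustrative assumptions, not the paper's code.

def find_owner(start, prob_owner, alive):
    """Follow prob_owner hints from `start` until a node points to itself.

    Returns the owner's id, or None if the chain reaches a failed node.
    """
    node = start
    visited = set()
    while node not in visited:
        if node not in alive:
            return None      # the request is stranded at a dead node
        visited.add(node)
        nxt = prob_owner[node]
        if nxt == node:
            return node      # a node whose hint is itself is the owner
        node = nxt
    return None              # cycle in the hints; treated as failure here


# Node 3 owns the page; nodes 0..2 hold progressively fresher hints.
hints = {0: 1, 1: 2, 2: 3, 3: 3}

reachable = find_owner(0, hints, alive={0, 1, 2, 3})
stranded = find_owner(0, hints, alive={0, 1, 3})  # node 2 has failed
```

    With every node up, the request from node 0 reaches the owner. With node 2 down, the same request has no way forward, even though the owner itself is still alive.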
     
    The system described in this paper seems very similar to Opal, in that a single address space is shared among many machines.

