From: shearerje_at_comcast.net
Date: Sat Feb 14 2004 - 16:40:36 PST
This paper addresses implementation of a virtual memory that is shared by a set of loosely coupled processors. The underlying problem of coherence is conceptually similar to the problem of cache coherence discussed in the Thread Management Alternatives paper (Anderson et. al) except at the page or segment level. One difference is that the Thread Management Alternatives paper discussed an architecture where each processor had a small cache, but they all shared a common main memory that defined “truth” as to the contents of any memory value (any disagreeing cache was wrong and needed to be invalidated), where-as this paper discusses an architecture where all of the physical memory is distributed to the processing nodes and the common main memory is purely a virtual construct. Thus, a major issue addressed by this paper is how to determine which processor’s memory contains “truth” for any one page or segment.
The paper starts with two key definitions: (1) A multiprocessor is “loosely coupled” if the physical memory is distributed, and (2) Memory is coherent if the value returned by a read operation is always the same as the value written by the most recent write operation. The paper briefly touches on the impact of the granularity of the coherence object (settling on a segment of between one-fourth and two kilobytes) and then delves into the nuances of different coherency paradigms.
Coherency itself must address two issues: page synchronization between all of the physical memory elements associated with a specific virtual segment, and page ownership. That is, which physical page defines “truth” at any given time. The paper considered page synchronization via an invalidation approach and a write-back approach, and showed convincingly that the latter would result in too much bus traffic. The problem of page ownership is much more complex. “Truth” cannot just be mapped statically to the various nodes in the system because any such mapping is sub-optimal for any distributed computations other than the one (if any) that it is specifically tailored for. But dynamic allocation opens up the whole problem of tracking who owns “truth” for any segment at any given time.
The paper considers a centralized manager (which has problems similar to a centralized “ready-queue” in the thread domain), fixed distributed management (which is similar to static allocation except it is the tracking of the ownership of truth for a segment that is fixed rather than the ownership itself), and dynamic distributed management ( now each node has to track which node is tracking the node that actually has the “truth” for the segment). Several variations on this theme were discussed, but I particularly liked the concept of tracking the “probable owner”. This is a very cool way of reducing the amount of information that needs to be distributed every time ownership of a segment changes. However, I’m not sure I buy lemma 4.1, that there are no “broken” prob_owner chains that are orphaned from the real owner, and that there are no cyclic graphs other than from the real owner to itself.
There are several issues the paper does not address at all. How loose is “loosely connected”? Do the nodes need to be within nanoseconds of each other like in the Cray 1? How about half a second apart as in a network of communication satellites in geosynchronous orbit? What happens to the “truth” held by a particular node when that node fails? Does segment ownership pass to some other node holding a read copy? How do we recover the resulting broken prob_owner graphs? This paper is only the starting point for a significant net of research ideas.
This archive was generated by hypermail 2.1.6 : Sat Feb 14 2004 - 16:40:43 PST