From: Praveen Rao (psrao_at_windows.microsoft.com)
Date: Wed Feb 18 2004 - 17:45:56 PST
In this paper the authors discuss shared memory coherence in loosely
coupled multiprocessor systems.
The authors talk about two possible strategies for page synchronization -
write-broadcast and invalidation. Write-broadcast is not considered
further, as a write to a shared page causes all copies of the page to be
updated, which can become very expensive. As for page ownership, the
authors talk about static and dynamic ownership. Static ownership is
discarded as it is expensive: every write to a page has to go through its
fixed owner.
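To make the cost difference between the two synchronization strategies concrete, here is a toy sketch. It is my own simplified model, not code from the paper: the function name count_messages, the policy labels 'invalidate' and 'broadcast', and the one-message-per-copy cost are all assumptions made for illustration. It counts coherence messages for a single page under each policy.

```python
def count_messages(trace, policy):
    """Count coherence messages for one shared page.

    trace  : list of (processor_id, op) pairs, op is 'r' or 'w'
    policy : 'invalidate' or 'broadcast' (hypothetical labels)

    Assumed cost model: fetching a copy costs one message, and each
    write costs one message per *other* processor holding a copy.
    """
    holders = {0}      # processor 0 is assumed to start with the page
    messages = 0
    for proc, op in trace:
        if proc not in holders:
            messages += 1              # fetch a copy of the page
            holders.add(proc)
        if op == 'w':
            others = holders - {proc}
            messages += len(others)    # invalidate or update each other copy
            if policy == 'invalidate':
                holders = {proc}       # other copies are now gone
            # under 'broadcast' the others keep an updated copy
    return messages


# Three readers pick up a copy, then processor 0 writes ten times.
trace = [(1, 'r'), (2, 'r'), (3, 'r')] + [(0, 'w')] * 10
inval = count_messages(trace, 'invalidate')
bcast = count_messages(trace, 'broadcast')
print(inval, bcast)
```

In this model invalidation pays once for the first write and nothing for the following ones, while updating every copy pays on every write, which is roughly why the update-all-copies strategy gets expensive.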
For dynamic ownership the authors discuss various strategies -
* centralized manager,
* an improved centralized manager, where synchronization of page
ownership is done by the individual owners (thereby eliminating the
confirmation operation to the manager)
* Fixed distributed manager (where responsibility for managing pages is
statically divided amongst processors)
* Broadcast distributed manager (where a processor manages the pages
that it owns and faulting processors send broadcast messages to find the
true owner of the page; this approach can potentially make the
communication subsystem the bottleneck)
* A dynamic distributed manager (where the ownership of each page is
tracked in each processor's local ptable; this reduces network traffic
as each processor knows a probable owner of the page, which is the
actual owner in most cases)
* A dynamic distributed manager with fewer broadcasts (this algorithm
improves upon the previous one by enforcing a broadcast message after K
faults to the page)
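The probable-owner idea behind the dynamic distributed manager can be sketched as follows. This is a minimal illustration under my own assumptions, not the paper's code: the list prob_owner stands in for the per-processor hint for a single page, the function name write_fault is hypothetical, and I assume that every processor that forwards a write-fault request then points its hint at the requester.

```python
def write_fault(prob_owner, requester):
    """Handle a write fault on one page by chasing probable-owner hints.

    prob_owner[p] is processor p's guess at who owns the page; the true
    owner is the processor whose hint points at itself.
    Returns the number of forwarding hops needed to reach the true owner.
    """
    hops = 0
    node = requester
    path = []
    while prob_owner[node] != node:    # not the true owner yet
        path.append(node)
        node = prob_owner[node]        # forward the request along the hint
        hops += 1
    # node is now the true owner; ownership moves to the requester
    # (assumption: everyone on the path updates its hint as well).
    for p in path:
        prob_owner[p] = requester
    prob_owner[node] = requester
    prob_owner[requester] = requester
    return hops


# Processor 0 owns the page; the hints form the chain 3 -> 2 -> 1 -> 0.
hints = [0, 0, 1, 2]
first = write_fault(hints, 3)    # has to walk the whole chain
second = write_fault(hints, 1)   # hint now points straight at the owner
print(first, second, hints)
```

After the first fault the stale chain has collapsed, so the next fault reaches the owner in a single hop; this is the sense in which the probable owner is the actual owner in most cases.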
I liked the paper's approach of describing each algorithm and then
improving on it incrementally.
In the end the authors discuss experimental results which verify that
the dynamic distributed manager algorithm generates the lowest number of
faults and far fewer broadcast messages (since the probable owner is
mostly the actual owner of the page). With this algorithm the 3-D PDE
problem gets a linear (actually better than linear, due to reduced disk
paging) speedup, and matrix multiplication gets a near linear speedup as
the number of processors increases. Merge-split sort puts this in
perspective though, as it does not get such a linear speedup. The authors
blame that on the nature of the problem (details of which I don't
understand and they haven't clarified).
The authors conclude by reaffirming the practicality of such a loosely
coupled shared memory system.