•Architectures differ, but row and column broadcasts are often fast
•Transfer only the segment of row stored locally to the processors in the column
–For 1 block, P_uv is a sender
–For P^(1/2) - 1 blocks, P_uv is a receiver
–Space required is only 4t elements: 2t for the segments being processed and 2t for the segments arriving
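
The sender/receiver counts and the 4t space bound above can be checked with a small sketch. It is illustrative only and rests on assumptions not stated on the slide: the processors form a square q x q grid with q = P^(1/2), in block step k the processor in grid row k of each column is the one that broadcasts its local row segment, and each "2t" consists of one row segment and one column segment of t elements each. The function names are hypothetical.

```python
import math


def column_broadcast_roles(num_procs, u, v):
    """Count how often processor P_uv sends and receives in column broadcasts.

    Assumption: a square q x q grid (q = sqrt(num_procs)); in block step k the
    processor in grid row k of every column broadcasts its locally stored row
    segment to the other processors in that column.
    """
    q = math.isqrt(num_procs)
    if q * q != num_procs:
        raise ValueError("num_procs must be a perfect square")
    if not (0 <= u < q and 0 <= v < q):
        raise ValueError("(u, v) must lie inside the q x q grid")

    # P_uv sends only in the step whose broadcasting grid row is its own row u.
    sends = sum(1 for k in range(q) if k == u)
    # In every other step it receives a segment from another row of column v.
    receives = q - sends
    return sends, receives


def buffer_elements(t):
    """Space per processor under the assumed double-buffering scheme.

    Assumption: one row segment and one column segment of t elements each are
    being processed (2t) while the next pair is arriving (2t).
    """
    working = 2 * t
    arriving = 2 * t
    return working + arriving
```

For example, with P = 16 processors (q = 4), `column_broadcast_roles(16, 2, 3)` gives one send and three receives, matching the 1 and P^(1/2) - 1 counts, and `buffer_elements(8)` gives 32 elements.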