Consider The Product Loops
What does ZPL’s performance model tell us?
Cannon's algorithm:
[Res] C := 0.0;                 -- Initialize C
for k := 1 to n do              -- Thru common dim
  [Res] C := C + A*B;           -- Product & accumulate
  [right of Lop] wrap A;        -- Send first col right
  [Lop] A := A@right;           -- Shift array left
  [below of Rop] wrap B;        -- Send top row down
  [Rop] B := B@below;           -- Shift array up
end;
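
To make the data movement in the wrap-and-shift loop concrete, here is a minimal sequential sketch in Python. It is not ZPL and says nothing about communication cost; it only mimics which values meet at each step. It assumes square n-by-n operands and assumes the usual initial skew (row i of A rotated left by i, column j of B rotated up by j), which the loop above does not show. The function name `cannon_matmul` is hypothetical.

```python
def cannon_matmul(a, b):
    """Sequentially mimic the wrap-and-shift product loop for square matrices.

    Assumption: a and b are n x n lists of lists. The initial skew is assumed
    to have happened before the loop shown on the slide.
    """
    n = len(a)
    # Assumed pre-skew: row i of A rotates left by i, column j of B rotates up by j.
    a = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]

    c = [[0.0] * n for _ in range(n)]            # [Res] C := 0.0
    for _ in range(n):                           # for k := 1 to n do
        for i in range(n):
            for j in range(n):
                c[i][j] += a[i][j] * b[i][j]     # C := C + A*B (elementwise)
        # wrap A; A := A@right  -> every row rotates left by one position
        a = [row[1:] + row[:1] for row in a]
        # wrap B; B := B@below  -> every column rotates up by one position
        b = b[1:] + b[:1]
    return c


if __name__ == "__main__":
    print(cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each pass does one elementwise multiply-accumulate and then moves every element of both operands by one position, which in the ZPL version corresponds to the wrap and @ operations on whole arrays.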

SUMMA:
[Res] C := 0.0;                 -- Initialize C
for k := 1 to n do              -- Thru common dim
  [ ,*] Col := >>[,k] A;        -- Flood kth col of A
  [*, ] Row := >>[k,] B;        -- Flood kth row of B
  C := C + Col*Row;             -- Accumulate product
end;
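
The flood-based loop can be read as a sum of n outer products: step k replicates column k of A across the result's columns and row k of B down the result's rows, then accumulates their elementwise product. The sequential Python sketch below illustrates only that arithmetic, assuming A is m-by-n and B is n-by-p and that k ranges over the common dimension; the function name `summa_matmul` is hypothetical.

```python
def summa_matmul(a, b):
    """Sequentially mimic the flood-based product as a sum of outer products.

    Assumption: a is m x n and b is n x p, both lists of lists.
    """
    m, n, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(m)]            # [Res] C := 0.0
    for k in range(n):                           # k runs over the common dimension
        col = [a[i][k] for i in range(m)]        # Col := >>[,k] A (kth column of A)
        row = b[k]                               # Row := >>[k,] B (kth row of B)
        for i in range(m):
            for j in range(p):
                # After flooding, position (i, j) sees col[i] and row[j].
                c[i][j] += col[i] * row[j]       # C := C + Col*Row
    return c


if __name__ == "__main__":
    print(summa_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]))
```

Unlike the wrap-and-shift version, the operands never move between steps; only one column and one row are replicated per iteration, and no square-shape or pre-skew assumption is needed.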