Locality Can Be Improved
• Put operands in registers, “strip mine”
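[Figure: matrices A, B, and C. The first column of A (a11, a21) combined with the first row of B (b11, b12) gives the first terms a11b11, a11b12, a21b11, a21b12 of the entries of C.]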
Switch Orientation -- Using a column of A and a row of B, compute all of the “1” terms of the dot products, i.e., use 2t inputs to produce t^2 first terms
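
The orientation switch can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the slide's own code: the function names `first_terms` and `matmul_outer` are hypothetical, matrices are plain Python lists of rows, and the "register" is modeled by a local variable. Each step k takes column k of A and row k of B and adds their outer product into C, so the k = 0 step yields exactly the "1" terms.

```python
def first_terms(A, B):
    """Return only the first terms a_i1 * b_1j of every entry of C = A * B.

    Uses the first column of A (t values) and the first row of B (t values),
    i.e. 2t inputs, to produce t*t products.
    """
    col = [row[0] for row in A]   # first column of A
    row = B[0]                    # first row of B
    return [[a * b for b in row] for a in col]


def matmul_outer(A, B):
    """Compute C = A * B as a sum of outer products (column of A times row of B)."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for k in range(m):
        row = B[k]                # row k of B, reused for every i
        for i in range(n):
            a = A[i][k]           # one element of column k, held across the j loop
            for j in range(p):
                C[i][j] += a * row[j]
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(first_terms(A, B))
    print(matmul_outer(A, B))
```

With t = 2, the call to `first_terms` reads 4 inputs (a11, a21, b11, b12) and produces 4 products. The ratio of work to operands fetched grows with t, which is the locality gain the slide points to.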