Locality Can Be Improved
• Put operands in registers, “strip mine”
[Figure: a column of A (a11, a21) multiplied by a row of B (b11, b12) yields the first term of each entry of C: a11b11, a11b12, a21b11, a21b12]
Switch orientation: using a column of A and a row of B, compute all of the “1” terms of the dot products, i.e., use 2t inputs to produce t^2 first terms
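The switched orientation can be sketched as a sum of outer products. This is a minimal illustration, not the slide's own code: the function names `first_terms` and `matmul_outer` are hypothetical, and plain Python lists stand in for register-resident operands.

```python
def first_terms(A, B):
    """Outer product of column 0 of A with row 0 of B.

    For a t x t block this reads 2t inputs and yields the t^2 "1" terms.
    """
    col = [row[0] for row in A]           # t inputs from A
    return [[a * b for b in B[0]] for a in col]  # t inputs from B, t^2 products


def matmul_outer(A, B):
    """C = A * B accumulated as one outer-product update per index k."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for k in range(m):
        row_k = B[k]                      # row k of B, reused for every i
        for i in range(n):
            a_ik = A[i][k]                # one operand held across the inner loop
            for j in range(p):
                C[i][j] += a_ik * row_k[j]
    return C
```

Each pass of the outer `k` loop touches one column of A and one row of B and updates every entry of C, which is where the reuse comes from: every loaded operand takes part in t multiplications instead of one.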