Computing a t × t block
•What is the logic for computing a t × t block?
for (r = 0; r < t; r++) {
    for (s = 0; s < t; s++) {
        c[r][s] = 0.0;
        for (k = 0; k < n; k++) {
            c[r][s] += a[r][k] * b[k][s];
        }
    }
}
•Loop is easy to analyze and “unroll”
•Branch prediction should work well
•This code may be near “optimal”
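
To make "unroll" concrete, here is a minimal sketch of the same block computation with the inner k loop unrolled by hand. The constants T and N, the function names block_ref and block_unrolled, and the unroll factor of 4 are assumptions chosen for illustration and do not come from the slide. An optimizing compiler may well perform this transformation on its own.

```c
#define T 2   /* block size t (assumed value for illustration) */
#define N 5   /* inner dimension n (assumed; deliberately not a multiple of 4) */

/* Reference version: the same loop nest as on the slide. */
void block_ref(double a[T][N], double b[N][T], double c[T][T])
{
    for (int r = 0; r < T; r++) {
        for (int s = 0; s < T; s++) {
            c[r][s] = 0.0;
            for (int k = 0; k < N; k++) {
                c[r][s] += a[r][k] * b[k][s];
            }
        }
    }
}

/* Hand-unrolled version: the k loop processes 4 products per iteration,
 * accumulating in a local variable, then a cleanup loop handles the
 * remaining N % 4 iterations. */
void block_unrolled(double a[T][N], double b[N][T], double c[T][T])
{
    for (int r = 0; r < T; r++) {
        for (int s = 0; s < T; s++) {
            double sum = 0.0;
            int k = 0;
            for (; k + 3 < N; k += 4) {
                sum += a[r][k]     * b[k][s]
                     + a[r][k + 1] * b[k + 1][s]
                     + a[r][k + 2] * b[k + 2][s]
                     + a[r][k + 3] * b[k + 3][s];
            }
            for (; k < N; k++) {   /* cleanup for leftover iterations */
                sum += a[r][k] * b[k][s];
            }
            c[r][s] = sum;
        }
    }
}
```

The unrolled loop executes about a quarter as many loop branches. Because floating-point addition is not associative, the two versions can differ in the last bits for general inputs, since the products are summed in a different order.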