Computing a t × t block
• What is the logic for computing a t × t block?
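/* c (t x t block) = t rows of a (each of length n) times t columns of b */
for (r = 0; r < t; r++) {
    for (s = 0; s < t; s++) {
        c[r][s] = 0.0;                     /* start the dot product at zero */
        for (k = 0; k < n; k++) {
            c[r][s] += a[r][k] * b[k][s];  /* row r of a times column s of b */
        }
    }
}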
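The loop above fills one t × t block of c from t rows of a and t columns of b. To see how such blocks can cover a whole product, here is a minimal C sketch. It assumes square n × n matrices stored row-major in flat arrays and that t divides n; the names compute_block and matmul_blocked are hypothetical and not taken from the original.

```c
#include <stddef.h>

/* Hypothetical helper: computes the t x t block of C = A * B whose
   top-left corner is at (i0, j0). A, B, C are n x n, row-major, flat.
   Assumes i0 + t <= n and j0 + t <= n. Same loop logic as the slide. */
static void compute_block(double *c, const double *a, const double *b,
                          size_t n, size_t t, size_t i0, size_t j0)
{
    for (size_t r = 0; r < t; r++) {
        for (size_t s = 0; s < t; s++) {
            double *cij = &c[(i0 + r) * n + (j0 + s)];
            *cij = 0.0;                          /* start at zero */
            for (size_t k = 0; k < n; k++) {
                /* row (i0 + r) of A times column (j0 + s) of B */
                *cij += a[(i0 + r) * n + k] * b[k * n + (j0 + s)];
            }
        }
    }
}

/* Hypothetical driver: tiles C into (n / t) x (n / t) blocks of size
   t x t and computes each one. Assumes t divides n. */
static void matmul_blocked(double *c, const double *a, const double *b,
                           size_t n, size_t t)
{
    for (size_t i0 = 0; i0 < n; i0 += t)
        for (size_t j0 = 0; j0 < n; j0 += t)
            compute_block(c, a, b, n, t, i0, j0);
}
```

Each block is independent of the others, so the order in which blocks are visited does not change the result.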
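• The loop nest is simple, with fixed bounds, so it is easy to analyze and “unroll”
• Branch prediction should work well: each loop branch goes the same way on every iteration except the last
• This code may be near “optimal”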
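To illustrate what “unrolling” the innermost loop might look like, here is a hedged C sketch that unrolls the k loop by a factor of 4 (an arbitrary choice for illustration), with a cleanup loop for leftover iterations. It assumes a is t × n, b is n × t, and c is t × t, all stored row-major in flat arrays; block_unrolled is a hypothetical name, and accumulating into a local variable is a choice made here, not in the original.

```c
#include <stddef.h>

/* Hypothetical: computes the t x t block c = a * b with the k loop
   unrolled by 4. a is t x n, b is n x t, c is t x t (row-major, flat). */
static void block_unrolled(double *c, const double *a, const double *b,
                           size_t n, size_t t)
{
    for (size_t r = 0; r < t; r++) {
        for (size_t s = 0; s < t; s++) {
            const double *ar = &a[r * n];  /* row r of a */
            double sum = 0.0;              /* local accumulator */
            size_t k = 0;
            /* Main loop: four multiply-adds per loop-branch test. */
            for (; k + 4 <= n; k += 4) {
                sum += ar[k]     * b[k * t + s];
                sum += ar[k + 1] * b[(k + 1) * t + s];
                sum += ar[k + 2] * b[(k + 2) * t + s];
                sum += ar[k + 3] * b[(k + 3) * t + s];
            }
            /* Cleanup: the remaining 0-3 terms when n % 4 != 0. */
            for (; k < n; k++) {
                sum += ar[k] * b[k * t + s];
            }
            c[r * t + s] = sum;
        }
    }
}
```

Because the terms are still added in the same order, the result matches the rolled loop exactly; any speedup comes from executing fewer loop-branch tests per multiply-add.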