A Very Parallel Solution ...
Each c[i][j] is computed in parallel such that
- One processor dedicated to each a[i][k]*b[k][j]
- Addition tree computes sum of those products
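
The scheme above can be sketched as a sequential simulation. This is only an illustration: the names tree_sum and very_parallel_matmul are hypothetical, square n x n matrices with n >= 1 are assumed, and each "processor" is modelled as one entry in a dictionary rather than real hardware.

```python
def tree_sum(values):
    """Pairwise (tree) reduction. Returns (total, number_of_levels)."""
    level = list(values)
    depth = 0
    while len(level) > 1:
        # All additions on one level are independent and could run concurrently.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through to the next level
        level = nxt
        depth += 1
    return level[0], depth


def very_parallel_matmul(a, b):
    """Simulate the scheme for square matrices a and b (assumed n x n, n >= 1)."""
    n = len(a)
    # Step 1: one conceptual processor per product a[i][k]*b[k][j].
    products = {
        (i, j, k): a[i][k] * b[k][j]
        for i in range(n) for j in range(n) for k in range(n)
    }
    # Step 2: one addition tree per c[i][j], summing over k.
    c = [[0] * n for _ in range(n)]
    depth = 0
    for i in range(n):
        for j in range(n):
            c[i][j], depth = tree_sum([products[(i, j, k)] for k in range(n)])
    return c, len(products), depth
```

For a 2 x 2 input the sketch uses 8 products and a tree one level deep; in a real machine, step 1 takes one multiplication time and step 2 takes one addition time per tree level, ignoring communication.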
How many processors running concurrently?
Is this solution even remotely practical?
- Data access -- conflicts/transit time/resources
- Computation time vs communication time
- Processor demands -- n^3 procs for n^2 results
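
To put numbers on the processor demand, a small helper can tabulate the counts for a given n. The function name resource_counts is hypothetical, and the figures assume the scheme exactly as described on this slide: one multiplier per product and a balanced binary addition tree per result.

```python
def resource_counts(n):
    """Resource counts for the fully parallel n x n multiply (n >= 1)."""
    return {
        "multiply_processors": n ** 3,       # one per a[i][k]*b[k][j]
        "results": n ** 2,                   # one per c[i][j]
        "additions": n ** 2 * (n - 1),       # n-1 additions per result
        "tree_depth": (n - 1).bit_length(),  # ceil(log2(n)) levels per tree
    }
```

For example, resource_counts(1000) reports a billion multiply processors producing a million results through trees 10 levels deep, which is why the processor demand is listed as a practical concern.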