Last Week (continued)
•Different techniques illustrated:
–Decompose into independent tasks
–Pipelining
–Overlapping computation and communication
•Optimizations
–Enlarge task size, e.g. several rows/columns at once
–Improve caching by blocking
–Reorder computation to “use data once”
–Exploit broadcast communication
The SUMMA algorithm used all of these ideas
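To make the recap concrete, here is a minimal sequential sketch of the SUMMA structure. It assumes a square p x p process grid and an n x n matrix with n divisible by p. The function names (`matmul_add`, `split_blocks`, `summa_simulated`) are hypothetical, chosen only for illustration. Real SUMMA runs on distributed memory with actual row and column broadcasts (for example via MPI) and often uses panels narrower than a full block; here the broadcasts are simulated by plain list lookups, and the pipelining and overlap of communication with computation are not shown.

```python
def matmul_add(C, A, B):
    """Local update C += A * B on small dense blocks (lists of lists)."""
    for i in range(len(A)):
        for k in range(len(B)):
            a = A[i][k]
            for j in range(len(B[0])):
                C[i][j] += a * B[k][j]


def split_blocks(M, p):
    """Split an n x n matrix into a p x p grid of (n/p) x (n/p) blocks."""
    b = len(M) // p
    return [[[row[J * b:(J + 1) * b] for row in M[I * b:(I + 1) * b]]
             for J in range(p)]
            for I in range(p)]


def summa_simulated(A, B, p):
    """Sequentially simulate SUMMA on a p x p grid (illustrative sketch)."""
    n = len(A)
    assert n % p == 0, "this sketch assumes n is divisible by p"
    b = n // p
    Ab, Bb = split_blocks(A, p), split_blocks(B, p)
    # Cb[i][j] is the block of C owned by simulated process (i, j).
    Cb = [[[[0] * b for _ in range(b)] for _ in range(p)] for _ in range(p)]

    for k in range(p):
        # Simulated row broadcast: process (i, k) sends its A block along row i.
        a_panel = [Ab[i][k] for i in range(p)]
        # Simulated column broadcast: process (k, j) sends its B block down column j.
        b_panel = [Bb[k][j] for j in range(p)]
        # Independent tasks: each process updates only its own block of C,
        # working on a whole block at once rather than one row or column.
        for i in range(p):
            for j in range(p):
                matmul_add(Cb[i][j], a_panel[i], b_panel[j])

    # Reassemble the blocks of C into a single matrix.
    C = []
    for I in range(p):
        for r in range(b):
            row = []
            for J in range(p):
                row.extend(Cb[I][J][r])
            C.append(row)
    return C


print(summa_simulated([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2))  # [[19, 22], [43, 50]]
```

In this sketch each broadcast panel is used by every process in its row or column during the same step and then discarded, which is the "use data once" reordering; the block-at-a-time update is the enlarged task size.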