Does locality of reference (or cache obliviousness) demand a new species of program? Does a balanced schedule require the programmer's effort to choreograph independent, parallel processes? Or should we seek these attributes in a programming language, or in the body of software-engineering knowledge? Certainly they are important to high performance, but how do we get them? And how does asymptotic analysis help such design?
And why do these diverse concepts gather under one umbrella?
This talk outlines a paradigm for programming solutions to some traditional matrix problems that support the supercomputing industry. Tools like Morton-order representation of matrices, block-recursive programming, divide-and-conquer scheduling, dovetailed function calls, and parameterized inline expansion combine to deliver high performance from high-level source code. And writing it is fun.
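To make two of these ingredients concrete, here is a minimal sketch in C (not the Opie or Arcee code; all names are illustrative) of Morton-order indexing via dilated integers, paired with a block-recursive, divide-and-conquer traversal of the resulting layout:

    #include <stdint.h>
    #include <stdio.h>

    /* Dilation: spread the low 16 bits of x to the even bit positions. */
    static uint32_t dilate(uint32_t x) {
        x &= 0x0000FFFF;
        x = (x | (x << 8)) & 0x00FF00FF;
        x = (x | (x << 4)) & 0x0F0F0F0F;
        x = (x | (x << 2)) & 0x33333333;
        x = (x | (x << 1)) & 0x55555555;
        return x;
    }

    /* Morton (Z-order) offset of element (i, j): interleave the bits of
     * the row and column indices, so neighboring elements share cache
     * blocks at every level of the memory hierarchy. */
    static uint32_t morton(uint32_t i, uint32_t j) {
        return (dilate(i) << 1) | dilate(j);
    }

    /* Block-recursive addition over n2 = n*n elements (n a power of two).
     * In Morton order each quadrant of the matrix occupies one contiguous
     * quarter of the array, so divide-and-conquer becomes simple offset
     * arithmetic, and the four recursive calls are independent -- ready
     * for a balanced parallel schedule. */
    static void add_blocks(const double *a, const double *b, double *c,
                           size_t n2) {
        if (n2 == 1) { *c = *a + *b; return; }
        size_t q = n2 / 4;              /* elements per quadrant */
        for (int k = 0; k < 4; k++)     /* NW, NE, SW, SE quadrants */
            add_blocks(a + k*q, b + k*q, c + k*q, q);
    }

    int main(void) {
        enum { N = 4 };
        double a[N*N], b[N*N], c[N*N];
        for (uint32_t i = 0; i < N; i++)
            for (uint32_t j = 0; j < N; j++) {
                a[morton(i, j)] = i;    /* store through the Morton index */
                b[morton(i, j)] = j;
            }
        add_blocks(a, b, c, N*N);
        printf("c[2][3] = %g\n", c[morton(2, 3)]);   /* prints 5 */
        return 0;
    }

Because every quadrant is a contiguous quarter of the array, the recursion enjoys locality at every scale without any cache-size parameter; in a fuller treatment, the base case would be widened by something like the parameterized inline expansion mentioned above rather than stopping at single elements.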
The results presented are from the Opie Transformer (row-major source to Morton order) and the Arcee Paradigm projects at Indiana University.