The reasoning bringing us to this point is:
Model of computation: We cannot write fast programs without having some idea of how they will perform when they execute => CTA
Shared memory (PRAM) seems like a natural programming generalization of sequential computation, but
    It hides performance-critical info (= locality) at log cost
    Concurrency on shared memory is complex
    Coherent shared memory OK for SMP, but beyond???
Only a global view of the computation is required
Invent new abstractions for a global view => ZPL
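
The point about shared memory hiding locality can be made concrete with a rough cost sketch. The Python below is illustrative only and is not taken from the slides: the function names, the workload sizes, and the latency value lam = 100 are all assumptions. It compares a PRAM-style estimate, where every memory reference costs one unit, with a CTA-style estimate, where a non-local reference costs lam units.

```python
def pram_cost(n_refs: int) -> int:
    """PRAM-style estimate: every memory reference costs 1 unit."""
    return n_refs


def cta_cost(local_refs: int, nonlocal_refs: int, lam: int) -> int:
    """CTA-style estimate: local references cost 1, non-local ones cost lam."""
    return local_refs + lam * nonlocal_refs


# Hypothetical workload: 1000 references per processor, assumed latency lam = 100.
total_refs = 1000
lam = 100

# Same number of references, two different data layouts.
mostly_local = cta_cost(local_refs=990, nonlocal_refs=10, lam=lam)
mostly_remote = cta_cost(local_refs=10, nonlocal_refs=990, lam=lam)

print("PRAM estimate (either layout):", pram_cost(total_refs))
print("CTA estimate, mostly local:   ", mostly_local)
print("CTA estimate, mostly remote:  ", mostly_remote)
```

Under these assumed numbers the PRAM estimate is identical for both layouts, while the CTA-style estimate differs by roughly a factor of 50. That difference is the information a flat shared-memory view leaves out.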