Parallel Performance
The Goal
Recall The CTA Parallel Machine Model
Allocating Processors To a Computation
ZPL Assumes Many Pts/Proc
Implications For Array Allocation
Fundamental Fact of ZPL Allocation
1Pt/Proc vs nPt/Proc
Does It Make Any Real Difference?
Knowing How ZPL Performs
Rules Of Operation II
Rules Of Operation III
Analyzing Jacobi Iteration
Analysis
WYSIWYG Performance
Reconsider Details of @ Communication
@ Comm In The CTA
Is This Simplistic Model Accurate?
Contrary View: Model Accurately
Analyzing The Bounding Box
Compiler Optimizations
Recall The 8-Connected Components
Compiler Basics
Annotate According to Rules
Revised Solution
Tale Of Two Multiplies
Consider The Product Loops
Conclusions From Analysis ...
Preparing For Algorithm Design
Flooding Is A Powerful Abstraction
Improvement I
Improvement II
Performance of Modes
A General Idea
Sorting By PSP
Applying PSP to MM ...
Matrix Multiplication Performance
Recall VQ Compression Loop
Very Parallel VQ Solution
Compiling A Portable/Efficient Language
Msg Passing: Lowest Common Denominator
Ironman: Compiler Comm Interface
HW Customize: Binding Ironman Calls
Ironman Summary ...
ZPL In Serious Computations
Summary
Email: snyder@cs.washington.edu
Home Page: http://www.cs.washington.edu/education/courses/596/CurrentQtr/
Other information: CSE 596: Parallel Computation