Parallel Performance

1/21/99


Table of Contents

Parallel Performance

The Goal

Recall The CTA Parallel Machine Model

Allocating Processors To a Computation

ZPL Assumes Many Pts/Proc

Implications For Array Allocation

Fundamental Fact of ZPL Allocation

1Pt/Proc vs nPt/Proc

Does It Make Any Real Difference?

Knowing How ZPL Performs

Rules Of Operation II

Rules Of Operation III

Analyzing Jacobi Iteration

Analysis

WYSIWYG Performance

Reconsider Details of @ Communication

@ Comm In The CTA

Is This Simplistic Model Accurate?

Contrary View: Model Accurately

Analyzing The Bounding Box

Compiler Optimizations

Recall The 8-Connected Components

Compiler Basics

Annotate According to Rules

Revised Solution

Tale Of Two Multiplies

Consider The Product Loops

Conclusions From Analysis ...

Preparing For Algorithm Design

Flooding Is A Powerful Abstraction

Improvement I

Improvement II

Performance of Modes

A General Idea

Sorting By PSP

Applying PSP to MM ...

Matrix Multiplication Performance

Recall VQ Compression Loop

Very Parallel VQ Solution

Compiling A Portable/Efficient Language

Msg Passing: Lowest Common Denominator

Ironman: Compiler Comm Interface

HW Customize: Binding Ironman Calls

Ironman Summary ...

ZPL In Serious Computations

Summary

Author: Snyder

Email: snyder@cs.washington.edu

Home Page: http://www.cs.washington.edu/education/courses/596/CurrentQtr/

Other information: CSE 596: Parallel Computation
