- Museum (this Wednesday)
- No class *next* Wednesday
- Data center visit on the 24th (more details to come)
- Midterm II May 31st
- 2 more class days :(

## GPUs



IBM monochrome adapter (1981) w/parallel printer port :)

Characters and raster mode



ATI (now AMD) EGA Wonder (1987) 2D Graphics

ATI VGA Wonder 1988



S3 Trio (1995) 3D Graphics



3Dfx 1996



Nvidia 1997



1998 Note fans!



3Dfx Voodoo 5 1999 Note fans & auxiliary power connector



## Using Modern Graphics Architectures for General-Purpose Computing: A Framework and Analysis

Chris J. Thompson Sahngyun Hahn Mark Oskin

Department of Computer Science and Engineering

University of Washington

{cthomp, syhahn, oskin}@cs.washington.edu



#### Winter 2002, published Fall 2002

**Table 3.** The vertex program instruction set.

| Opcode | Description               |
|--------|---------------------------|
| ARL    | Address register load     |
| MOV    | Move                      |
| MUL    | Multiply                  |
| ADD    | Add                       |
| SUB    | Subtract                  |
| MAD    | Multiply and add          |
| ABS    | Absolute value            |
| RCP    | Reciprocal                |
| RCC    | Reciprocal (clamped)      |
| RSQ    | Reciprocal square root    |
| DP3    | 3-component dot product   |
| DP4    | 4-component dot product   |
| DPH    | Homogeneous dot product   |
| DST    | Cartesian distance        |
| MIN    | Minimum                   |
| MAX    | Maximum                   |
| SLT    | Set on less than          |
| SGE    | Set on greater or equal   |
| EXP    | Exponential base 2        |
| LOG    | Logarithm base 2          |
| LIT    | Light coefficient formula |
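
To make the table concrete, here is a minimal Python sketch of a few of these opcodes acting on 4-component (x, y, z, w) vectors. The function names are hypothetical and the semantics are inferred only from the descriptions in the table; the dot products return a plain scalar for simplicity. Note that the table lists no branch opcode, so the last helper shows one way a conditional can be expressed with SLT, MUL, and MAD alone.

```python
# Sketch only: models a few Table 3 opcodes on 4-component float vectors.
# All function names are hypothetical, chosen to mirror the opcode names.

def mul(a, b):
    """MUL: component-wise multiply."""
    return tuple(x * y for x, y in zip(a, b))

def add(a, b):
    """ADD: component-wise add."""
    return tuple(x + y for x, y in zip(a, b))

def mad(a, b, c):
    """MAD: component-wise a * b + c."""
    return add(mul(a, b), c)

def dp3(a, b):
    """DP3: dot product of the first three components."""
    return sum(x * y for x, y in zip(a[:3], b[:3]))

def dp4(a, b):
    """DP4: dot product of all four components."""
    return sum(x * y for x, y in zip(a, b))

def slt(a, b):
    """SLT: 1.0 where a < b, else 0.0, per component."""
    return tuple(1.0 if x < y else 0.0 for x, y in zip(a, b))

def select(mask, a, b):
    """Branch-free select: mask * a + (1 - mask) * b, per component."""
    inverse = tuple(1.0 - m for m in mask)
    return mad(mask, a, mul(inverse, b))

if __name__ == "__main__":
    a = (1.0, 5.0, 3.0, 0.0)
    b = (2.0, 4.0, 3.0, -1.0)
    print(mad(a, b, (1.0, 1.0, 1.0, 1.0)))
    print(dp3(a, b), dp4(a, b))
    # Component-wise maximum without a branch: pick b where a < b, else a.
    print(select(slt(a, b), b, a))
```

The `select` helper is the interesting part for general-purpose computing: a comparison produces a 0.0/1.0 mask, and arithmetic on the mask replaces the `if`.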



Figure 2. A programmable graphics pipeline.









A6 A9X



Apple A9X





(the poorly named) Intel MIC (2010)



## What are the key enabling technologies behind GPUs?

- Programmable pipeline
- Abstract ISA / API
- It's all about the memory
- High-bandwidth PCIe is key for GPGPU
- Why can they use SIMD?
  - The computation is data parallel
  - Control flow is largely the same across data elements
- Need a lot of parallel tasks
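
The SIMD point can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not a description of any particular GPU: all lanes step through a branch in lockstep, a per-lane mask decides which lanes keep each result, and the group pays for every path that at least one lane takes. The function name and the `passes` counter are hypothetical.

```python
# Toy model of lockstep SIMD execution with a per-lane mask.
# Hypothetical names; real hardware does this per instruction, not per branch.

def run_branch_simd(values, cond, then_op, else_op):
    """Run `then_op(v) if cond(v) else else_op(v)` over all lanes together.

    Returns (results, passes), where passes counts how many branch paths
    the whole group had to step through.
    """
    mask = [cond(v) for v in values]
    results = list(values)
    passes = 0
    if any(mask):        # at least one lane takes the 'then' path
        passes += 1
        results = [then_op(v) if m else r
                   for v, m, r in zip(values, mask, results)]
    if not all(mask):    # at least one lane takes the 'else' path
        passes += 1
        results = [r if m else else_op(v)
                   for v, m, r in zip(values, mask, results)]
    return results, passes

def is_even(v):
    return v % 2 == 0

def halve(v):
    return v // 2

def triple_plus_one(v):
    return 3 * v + 1

if __name__ == "__main__":
    # Uniform control flow: every lane takes the same path, one pass.
    print(run_branch_simd([2, 4, 6, 8], is_even, halve, triple_plus_one))
    # Divergent control flow: both paths are issued, two passes.
    print(run_branch_simd([1, 2, 3, 4], is_even, halve, triple_plus_one))
```

When control flow is largely the same across lanes, the group usually issues one path and every lane does useful work; divergence costs extra passes with some lanes masked off.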

Figure 3: GCN Compute Unit



4 CU Shared 32KB Instruction L1 Cache

L2 Cache

Figure 4: Local Data Share (LDS)



Figure 6: Cache Hierarchy



Figure 7: AMD Radeon™ HD 7970







### What next?