University of Washington Computer Science & Engineering
 CSE P 548: Computer Architecture - Autumn 2006


CSE P548 Schedule (Autumn 2006)

This course schedule will be updated, so check it often.
The dates for the readings indicate the class by which each reading should be completed.

Basics of Computer Architecture
Architecture overview Let your eyes float over chapter 1. We won't cover this in class, but it is good for your general background in computer architecture.
Instruction set design Speedread chapter 2 (3rd edition) or Appendix B (4th edition). This is a good summary of background instruction set design material, though more detailed than what we will cover in class. Omit 2.4, 2.6, 2.11, 2.13, and 2.16 (3rd edition) or B.8 and B.12 (4th edition).
Instruction-level parallelism Read section 3.1 (3rd edition) or section 2.1 (4th edition).
Basics of pipelining Sections A.1 - A.2 (both editions) are a review of the basics of pipelining and could replace reading in the undergraduate text. Only read them if you need the review.
Dynamic branch prediction Read section 4.2, section 3.4, section 3.5, pp. A-24 to A-26, and pp. 245 to 249 (3rd edition) or section 2.3, section 2.9 up to p. 127, pp. A-25 to A-26, and pp. 160 to 162 (4th edition).
Predicated execution Read pp. 340-344, 356, and 358 (3rd edition) or section G-4, p. G-38, and p. G-40 (4th edition).
Exceptions & pipelining Read pp. A-37 to A-45 and A-54 to A-56 (both editions).
No class
NAE induction
Dynamic Execution Cores
Overview of multiple issue processors & static scheduling Read pp. 215-220 (3rd edition) or section 2.7 (4th edition). See pp. 304-312 (3rd edition) or section 2.2 (4th edition) for a discussion on loop unrolling.
Overview of dynamic scheduling Read pp. 181-184 and 220-224 (3rd edition) or pp. 89-92 and section 2.8 (4th edition).
Tomasulo's algorithm Read pp. 184-196 (3rd edition) or pp. 92-104 (4th edition).
Midterm starts at 6:30
Class starts at 7:30
Dynamic Execution Cores
R10000-style dynamic scheduling The Smith/Sohi article on superscalars.
The R10000 article. Read from "Register mapping" (p. 32) through "Register files" (p. 35).
Static Execution Cores
Software techniques to exploit ILP
We have already discussed loop unrolling. We'll briefly touch upon two other techniques on pp. 329-340 (3rd edition) or G-12 to G-21 (4th edition).
VLIW machines Read pp. 315-319 (3rd edition) or pp. 114-118 (4th edition).
I've also included two supplementary papers on the IA-64. In the HP/Intel architecture paper, omit the memory model, software pipelining, & floating point. In the Intel implementation paper, omit floating point again, IA-32 compatibility, & machine resources per port. Both of these articles contain too much detail, but they are better than the text (section 4.7 (3rd edition) or G-6 (4th edition)). Let my lecture be your guide for what is important for us. There is also a critique by a rival, which should give you a sense of how and why architects can disagree.
Basics of caches This is standard undergraduate material. You might skip the reading and just look at the slides for a review. But read pp. 390-410, 423-430 (3rd edition) or section 5.1, pp. C-1 to C-19, C-22 to C-29 (4th edition) if the slides seem incomprehensible.
Advanced caching techniques Read pp. 410-413, section 5.4, pp. 430-435, sections 5.6, 5.7 (3rd edition) or C-19 to C-21, C-29 to C-38, pp. 293-309 (4th edition).
Main memory Read sections 5.8, 5.9 to p. 457 (3rd edition) or pp. 310-312 (4th edition).
Overview of multiprocessing Read section 6.1 (3rd edition) or section 4.1 (4th edition).
Cache coherence, snooping and directory protocols Read sections 6.3 - 6.6 (3rd edition) or sections 4.2 - 4.4 (4th edition).
Synchronization Read section 6.7 (3rd edition) or section 4.5 & H.4 (4th edition). This includes slightly more than we will cover in class, so let the class notes be your guide as to what is important for us.
No class
Tera-style multithreading Read the Tera paper. Tera's runtime system (not required - this is just in case the OS/RT students are interested).
Simultaneous multithreading Read section 6.9 (3rd edition) or section 3.5 (4th edition) and the SMT paper.
Dataflow Machines
Content of the final.
Dataflow machines and WaveScalar.
After looking them over, I don't like any of the papers on the early dataflow machines. Just listen to the lecture. Read The WaveScalar Architecture for an overview of WaveScalar and Area-Performance Trade-offs in Tiled Dataflow Architectures for an implementation.
Course evaluations.
Final from 8:30pm to 10:30pm.
