CSE 599o Autumn 2025
Course Calendar

September
Monday Tuesday Wednesday Thursday Friday
22
23
24
10:00-11:20 Lecture
MGH 295
Intro and Transformers
Paper (no discussion): Attention Is All You Need
Slides
Recording
25
16:00-17:00 OH (Stephanie)
CSE1 580
26
10:00-11:20 Lecture
MGH 295
Autodifferentiation
Slides
Recording
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
23:59 HW0 due
29
30
01
10:00-11:20 Lecture
MGH 295
ML frameworks
Paper 1: TensorFlow
Paper 2: PyTorch 2
Optional: PyTorch
02
16:00-17:00 OH (Stephanie)
CSE1 580
03
Guest lecture: Kan Zhu
10:00-11:20 Lecture
MGH 295
GPU architecture and programming
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
October
Monday Tuesday Wednesday Thursday Friday
06
07
08
10:00-11:20 Lecture
MGH 295
ML compilers
Paper 1: TVM
Paper 2: Triton
23:59 Groups and paper signup due
09
16:00-17:00 OH (Stephanie)
CSE1 580
10
10:00-11:20 Lecture
MGH 295
Scaling I: N-D parallelism
Paper 1: PyTorch DDP
Paper 2: FlexFlow
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
13
14
15
Guest lecture: Frank Zhao
Guest lecture: Aashaka Shah, Roshan Dathathri
10:00-11:20 Lecture
MGH 295
Scaling I: GPU communication
Paper: MSCCL++
23:59 Project plan due
16
16:00-17:00 OH (Stephanie)
CSE1 580
17
HW2 out
10:00-11:20 Lecture
MGH 295
Scaling I: Memory optimizations and ZeRO
Paper 1: ZeRO
Paper 2: PyTorch FSDP
Optional: Activation checkpointing
Optional: ZeRO-Infinity
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
23:59 HW1 due
20
21
22
10:00-11:20 Lecture
MGH 295
Scaling I: Model parallelism
Paper 1: Megatron-LM
Paper 2: Scaling Megatron-LM
Optional: GPipe
Optional: Ring attention
Optional: Zero Bubble Pipeline Parallelism
23
16:00-17:00 OH (Stephanie)
CSE1 580
24
10:00-11:20 Lecture
MGH 295
Scaling I: Mixture-of-experts
Paper 1: Sparsely-Gated MoE
Paper 2: GShard
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
27
28
29
10:00-11:20 Lecture
MGH 295
Scaling I: Foundation model case studies
Paper 1: PaLM, sections 1-5
Paper 2: DeepSeek-V3, sections 1-4
30
16:00-17:00 OH (Stephanie)
CSE1 580
31
10:00-11:20 Lecture
MGH 295
Scaling I: Foundation model case studies
Paper 1: Llama3
Paper 2: TorchTitan
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
November
Monday Tuesday Wednesday Thursday Friday
03
04
05
10:00-11:20 Lecture
MGH 295
Post-training: Intro
Paper 1: Tulu 3, sections 1-5
Paper 2: Tulu 3, sections 6-10
06
16:00-17:00 OH (Stephanie)
CSE1 580
07
Guest lecture: Eric Liang
HW3 out
10:00-11:20 Lecture
MGH 295
Post-training: Systems for RL for LLMs
Paper 1: RLlib
Paper 2: OpenRLHF
Optional: Ray
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
23:59 HW2 due
10
11
Veterans Day
12
Guest lecture: Shishir Patil
10:00-11:20 Lecture
MGH 295
Post-training: Systems for RL for LLMs
Paper 1: HybridFlow (veRL)
13
16:00-17:00 OH (Stephanie)
CSE1 580
14
10:00-11:20 Lecture
MGH 295
Scaling II: Distributed frameworks
Paper 1: GSPMD
Paper 2: Pathways
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
17
18
19
10:00-11:20 Lecture
MGH 295
Scaling II: Data loading
Paper 1: tf.data
Paper 2: Ray Data (link TBA)
20
16:00-17:00 OH (Stephanie)
CSE1 580
21
10:00-11:20 Lecture
MGH 295
Scaling II: Hardware trends
Paper 1: Power stabilization
Paper 2: Semianalysis: H100 vs GB200 (link TBA)
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
24
25
26
Project time, NO CLASS
27
Thanksgiving
28
Native American Heritage Day
December
Monday Tuesday Wednesday Thursday Friday
01
02
03
10:00-11:20 Lecture
MGH 295
Scaling II: Multimodal systems
Paper 1: Diffusion transformers
Paper 2: Chameleon
04
16:00-17:00 OH (Stephanie)
CSE1 580
05
10:00-11:20 Lecture
MGH 295
Final project presentations
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
23:59 HW3 due
08
09
10
23:59 Final project writeup due
23:59 All assignments due (no grace period)
11
12