CSE 599o Autumn 2025
Course Calendar

September
Monday Tuesday Wednesday Thursday Friday
22 23 24
10:00-11:20 Lecture
MGH 295
Intro and Transformers
Paper (no discussion): Attention Is All You Need

Slides
Recording
25
15:30-16:30 OH (Stephanie)
CSE1 580
26
10:00-11:20 Lecture
MGH 295
Autodifferentiation

Slides
Recording
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
23:59 HW0 due
29 30 01
10:00-11:20 Lecture
MGH 295
ML frameworks
Paper 1: TensorFlow
Paper 2: PyTorch 2
Optional: PyTorch

Slides
Recording
02
15:30-16:30 OH (Stephanie)
CSE1 580
03
Guest lecture: Kan Zhu
10:00-11:20 Lecture
MGH 295
GPU architecture and programming

Slides
Recording
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
October
Monday Tuesday Wednesday Thursday Friday
06 07 08
10:00-11:20 Lecture
MGH 295
ML compilers
Paper 1: TVM
Paper 2: Triton

Slides
Recording
23:59 Groups and paper signup due
09
15:30-16:30 OH (Stephanie)
CSE1 580
10
10:00-11:20 Lecture
MGH 295
Scaling I: N-D parallelism
Paper 1: PyTorch DDP
Paper 2: FlexFlow

Slides
Recording
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
13 14 15
Guest lecture: Frank Zhao
Guest lecture: Aashaka Shah, Roshan Dathathri
10:00-11:20 Lecture
MGH 295
Scaling I: GPU communication
Paper: MSCCL++

Slides
Recording
16
15:30-16:30 OH (Stephanie)
CSE1 580
17
10:00-11:20 Lecture
MGH 295
Scaling I: Memory optimizations and ZeRO
Paper 1: ZeRO
Paper 2: PyTorch FSDP
Optional: Activation checkpointing
Optional: ZeRO-Infinity

Slides
Recording
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
20 21 22
10:00-11:20 Lecture
MGH 295
Scaling I: Model parallelism
Paper 1: Megatron-LM
Paper 2: Scaling Megatron-LM
Optional: GPipe
Optional: Ring attention
Optional: Zero Bubble Pipeline Parallelism

Slides
Recording
23
15:30-16:30 OH (Stephanie)
CSE1 580
24
10:00-11:20 Lecture
MGH 295
Scaling I: Mixture-of-experts
Paper 1: Sparsely-Gated MoE
Paper 2: GShard

Slides
Recording
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
23:59 HW1 due
27 28 29
10:00-11:20 Lecture
MGH 295
Scaling I: Foundation model case studies
Paper 1: PaLM, sections 1-5
Paper 2: DeepSeek-V3, sections 1-4

Slides
Recording
30
15:30-16:30 OH (Stephanie)
CSE1 580
31
10:00-11:20 Lecture
MGH 295
Scaling I: Foundation model case studies
Paper 1: Llama 3
Paper 2: TorchTitan

Slides
Recording
12:30-13:30 OH (Frank)
Allen 3rd floor breakout
November
Monday Tuesday Wednesday Thursday Friday
03 04 05
Guest lecture: Banghua Zhu
10:00-11:20 Lecture
MGH 295
Post-training: Intro
Paper: Tulu 3

Slides - Tulu 3
Recording
06
15:30-16:30 OH (Stephanie)
CSE1 580
07
Guest lecture: Eric Liang
10:00-11:20 Lecture
MGH 295
Post-training: Systems for RL for LLMs
Paper 1: RLlib
Paper 2: OpenRLHF
Optional: Ray

Slides - RLlib
Slides - OpenRLHF
Recording
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
10
23:59 HW2 due
11
Veterans Day
12
Guest lecture: Shishir Patil
10:00-11:20 Lecture
MGH 295
Post-training: Systems for RL for LLMs
Paper 1: HybridFlow (veRL)

Slides - HybridFlow
Recording
13
16:30-17:30 OH (Stephanie)
CSE1 580
14
10:00-11:20 Lecture
MGH 295
Scaling II: Distributed frameworks
Paper 1: GSPMD
Paper 2: Pathways
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
17 18 19
10:00-11:20 Lecture
MGH 295
Scaling II: Data loading
Paper 1: tf.data
Paper 2: Ray Data
20
15:30-16:30 OH (Stephanie)
CSE1 580
21
10:00-11:20 Lecture
MGH 295
Scaling II: Deployment
Paper 1: Power stabilization
Paper 2: ByteRobust
Optional: SemiAnalysis: H100 vs GB200 (link TBA)
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
24 25 26
Project time, NO CLASS
27
Thanksgiving
28
Native American Heritage Day
December
Monday Tuesday Wednesday Thursday Friday
01 02 03
10:00-11:20 Lecture
MGH 295
Scaling II: Multimodal systems
Paper 1: Diffusion transformers
Paper 2: Chameleon
04
15:30-16:30 OH (Stephanie)
CSE1 580
05
10:00-11:20 Lecture
MGH 295
Final project presentations
11:30-12:30 OH (Frank)
Allen 3rd floor breakout
23:59 HW3 due
08 09 10
23:59 Final project writeup due
23:59 All assignments due (no grace period)
11 12