| September | ||||
|---|---|---|---|---|
| Monday | Tuesday | Wednesday | Thursday | Friday |
| 22 | 23 | 24<br>10:00-11:20 Lecture<br>MGH 295 Intro and Transformers Paper (no discussion): Attention Is All You Need Slides Recording | 25<br>15:30-16:30 OH (Stephanie)<br>CSE1 580 | 26<br>11:30-12:30 OH (Frank)<br>Allen 3rd floor breakout<br>23:59 HW0 due |
| 29 | 30 | 01<br>10:00-11:20 Lecture<br>MGH 295 ML frameworks Paper 1: TensorFlow Paper 2: PyTorch 2 Optional: PyTorch Slides Recording | 02<br>15:30-16:30 OH (Stephanie)<br>CSE1 580 | 03<br>Guest lecture: Kan Zhu<br>11:30-12:30 OH (Frank)<br>Allen 3rd floor breakout |
| October | ||||
|---|---|---|---|---|
| Monday | Tuesday | Wednesday | Thursday | Friday |
| 06 | 07 | 08<br>23:59 Groups and paper signup due | 09<br>15:30-16:30 OH (Stephanie)<br>CSE1 580 | 10<br>10:00-11:20 Lecture<br>MGH 295 Scaling I: N-D parallelism Paper 1: PyTorch DDP Paper 2: FlexFlow Slides Recording<br>11:30-12:30 OH (Frank)<br>Allen 3rd floor breakout |
| 13 | 14 | 15<br>Guest lecture: Frank Zhao<br>Guest lecture: Aashaka Shah, Roshan Dathathri | 16<br>15:30-16:30 OH (Stephanie)<br>CSE1 580 | 17<br>10:00-11:20 Lecture<br>MGH 295 Scaling I: Memory optimizations and ZeRO Paper 1: ZeRO Paper 2: PyTorch FSDP Optional: Activation checkpointing Optional: ZeRO-Infinity Slides Recording<br>11:30-12:30 OH (Frank)<br>Allen 3rd floor breakout<br>23:59 Project plan due |
| 20 | 21 | 22<br>10:00-11:20 Lecture<br>MGH 295 Scaling I: Model parallelism Paper 1: Megatron-LM Paper 2: Scaling Megatron-LM Optional: GPipe Optional: Ring attention Optional: Zero Bubble Pipeline Parallelism Slides Recording | 23<br>15:30-16:30 OH (Stephanie)<br>CSE1 580 | 24<br>10:00-11:20 Lecture<br>MGH 295 Scaling I: Mixture-of-experts Paper 1: Sparsely-Gated MoE Paper 2: GShard Slides Recording<br>11:30-12:30 OH (Frank)<br>Allen 3rd floor breakout<br>23:59 HW1 due |
| 27 | 28 | 29<br>10:00-11:20 Lecture<br>MGH 295 Scaling I: Foundation model case studies Paper 1: PaLM, sections 1-5 Paper 2: DeepSeek-V3, sections 1-4 Slides Recording | 30<br>15:30-16:30 OH (Stephanie)<br>CSE1 580 | 31<br>10:00-11:20 Lecture<br>MGH 295 Scaling I: Foundation model case studies Paper 1: Llama3 Paper 2: TorchTitan Slides Recording<br>12:30-13:30 OH (Frank)<br>Allen 3rd floor breakout |
| November | ||||
|---|---|---|---|---|
| Monday | Tuesday | Wednesday | Thursday | Friday |
| 03 | 04 | 05<br>Guest lecture: Banghua Zhu | 06<br>15:30-16:30 OH (Stephanie)<br>CSE1 580 | 07<br>Guest lecture: Eric Liang<br>10:00-11:20 Lecture<br>MGH 295 Post-training: Systems for RL for LLMs Paper 1: RLlib Paper 2: OpenRLHF Optional: Ray Slides - RLlib Slides - OpenRLHF Recording<br>11:30-12:30 OH (Frank)<br>Allen 3rd floor breakout |
| 10<br>23:59 HW2 due | 11<br>Veterans Day | 12<br>Guest lecture: Shishir Patil<br>10:00-11:20 Lecture<br>MGH 295 Post-training: Systems for RL for LLMs Paper 1: HybridFlow (veRL) Slides - HybridFlow Recording | 13<br>16:30-17:30 OH (Stephanie)<br>CSE1 580 | 14<br>11:30-12:30 OH (Frank)<br>Allen 3rd floor breakout |
| 17 | 18 | 19 | 20<br>15:30-16:30 OH (Stephanie)<br>CSE1 580 | 21<br>10:00-11:20 Lecture<br>MGH 295 Scaling II: Deployment Paper 1: Power stabilization Paper 2: ByteRobust Optional: SemiAnalysis: H100 vs GB200 (link TBA)<br>11:30-12:30 OH (Frank)<br>Allen 3rd floor breakout |
| 24 | 25 | 26<br>Project time, NO CLASS | 27<br>Thanksgiving | 28<br>Native American Heritage Day |
| December | ||||
|---|---|---|---|---|
| Monday | Tuesday | Wednesday | Thursday | Friday |
| 01 | 02 | 03<br>10:00-11:20 Lecture<br>MGH 295 Scaling II: Multimodal systems Paper 1: Diffusion transformers Paper 2: Chameleon | 04<br>15:30-16:30 OH (Stephanie)<br>CSE1 580 | 05<br>10:00-11:20 Lecture<br>MGH 295 Final project presentations<br>11:30-12:30 OH (Frank)<br>Allen 3rd floor breakout<br>23:59 HW3 due |
| 08 | 09 | 10<br>23:59 Final project writeup due<br>23:59 All assignments due (no grace period) | 11 | 12 |