Transformers have revolutionized ML. But to understand how and whether current models can continue to scale, we need to study both how models are designed and the infrastructure that supports them. This class will cover state-of-the-art techniques for pretraining and post-training, with a focus on deployed case studies.
Links
Course staff (email both for course-related issues):
Course materials
- Ed: for paper discussion threads and general discussion
- Gradescope
Papers
This course will involve a lot of paper reading and discussion. To get you used to reading and presenting papers, we'll ask you to sign up for two presentations during the quarter. One is the "Main" presentation, which you can think of as covering the paper itself. The other is the "Impact" presentation, which you can think of as analyzing the impact of the paper, especially in the context of what happened after the paper was published. We'll post the signup sheet and a slides template on Ed that you can use to make these presentations.
In addition to these paper presentations, you must also participate in discussion threads on Ed for each paper that we cover in class. This includes any paper that isn't listed as optional on the calendar, excluding the first lecture's paper. The staff will create a discussion thread per paper. For each paper, write 1-2 sentences about your biggest takeaway from the paper. These comments are due before the class in which we discuss the paper.
Here is the general outline for the two presentations:
Main presentation (20 min):
- What problem are they trying to solve?
- Background: What was the status quo and why is it not enough?
- Impact: What would solving this problem enable?
- Challenge: What makes the problem challenging?
- What is the solution?
- Key insight: Why does the solution work?
- Design: What are the technical details?
- Challenge: What makes the solution challenging?
- Evaluation: How well does the system meet its claims?
Impact presentation (5 min): What was the actual impact of this work?
- Advocate: What worked? What stood the test of time?
- Skeptic: What didn't work or didn't last? What are the limitations?
- Context: What has changed since this work was published?
- Future: What are the open questions?
Assignments
We will have three assignments, plus a Homework 0 to get you used to the submission process.
The goal of each assignment is to build the corresponding system from the ground up.
Note that each assignment depends on the previous one!
- Homework 1: Transformers and performance profiling. Implementation and profiling for a transformer training loop, including the model architecture and the optimizer.
- Homework 2: Scaling model training with fully sharded data parallelism (FSDP).
- Homework 3: Post-training with distributed reinforcement learning (RL).
Projects
You will work in teams of 2-3 students on an open-ended project for most of the quarter, with a final writeup and presentation due at the end of the quarter.
The project must involve some kind of ML systems work: building or profiling a system for ML, or applying ML to systems.
If you don't already have an idea, here are some starting points:
- Evaluate a system from one of the papers that we read in class, replicating its results and expanding the evaluation
- Replicate a system and results from one of the papers that we read in class
- Add some significant feature to one of the homework assignments
- Use the infrastructure that we build in the homework assignments to train or fine-tune a different model
Team assignments are due on 10/8 and a one-page project plan is due on 10/15.
More details on the project plan will be announced soon.
If you're not sure about the scope of your idea or need help fleshing it out, come to office hours!
Grading
- 30%: 3 individual homeworks, 10% per homework
- 40%: Open-ended project in groups of 2-3. 5% initial plan + 25% final writeup + 10% final presentation.
- 30%: Papers. 10% "Main" presentation + 10% "Impact" presentation + 10% participation on Ed.
We will allow 3 flex days in total across the homework assignments. You can allocate the flex days to the homework assignments as you wish. Once the flex days are used up, we will apply a 5% penalty to the individual homework score for each additional late day. The homework assignments depend on each other, so it's important to finish them on time.
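As a rough illustration of the late policy, here is a minimal Python sketch. It assumes the 5% penalty means 5 percentage points off a homework scored out of 100, and that scores do not go below zero; the function name and parameters are hypothetical, and the staff's actual calculation may differ.

```python
def homework_score(raw_score, days_late, flex_days_used, penalty_per_day=5.0):
    """Score for one homework after the late penalty (hypothetical helper).

    Assumptions, not confirmed by the course policy:
    - raw_score is out of 100 and the penalty is in percentage points
    - the score is floored at 0
    """
    # Late days not covered by flex days are the ones that get penalized.
    penalized_days = max(0, days_late - flex_days_used)
    return max(0.0, raw_score - penalty_per_day * penalized_days)


# Two days late, one flex day spent: one penalized day.
print(homework_score(90.0, days_late=2, flex_days_used=1))  # 85.0
# Two days late, two flex days spent: no penalty.
print(homework_score(90.0, days_late=2, flex_days_used=2))  # 90.0
```

Under these assumptions, the flex days spent across all three homeworks must add up to at most 3.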
Please email the staff if you are facing difficult circumstances! We can likely make accommodations for you, so try to let us know as early as possible.