Welcome to ML for ML Systems, taught by Prof. Luis Ceze with Zihao Ye as TA.
ML models are quickly becoming an integral component of how applications are built. Yet they differ from most software: they are performance-hungry, bandwidth-hungry, and very fast-evolving. This leads to the need to build systems to support them — abstractions and frameworks to tame complexity and adapt quickly; compilers, programming languages, and runtime systems to make efficient use of hardware resources; better communication approaches for distributed systems; and so on. One important twist in this fast-moving systems development is that the optimization spaces for ML systems themselves (codegen for ML models, systems parameter tuning, resource allocation, etc.) are very large, so these systems use machine learning itself to provide effective solutions. So you read the name of the class right: it is “ML for ML systems” ;).
In this special topics class we will explore the state of the art and current research on ML systems, including: ML model compilers, ML training systems, ML serving systems, support for serving large language models, ML systems that span cloud and edge, and resource management for ML, among others. The format is participatory, focused on paper reading, presentation, and discussion, plus a class project scoped and chosen by the participants.
This website will be updated throughout the quarter, so check back for the latest.
Course Overview
- Lectures: Monday and Wednesday 3:00pm-4:20pm (Location: CSE2 271)
- Luis’ Office Hours: By appointment.
- TA Office Hours: Friday 9:30am - 10:30am (Gates 374).
- Course canvas: Link
- Course materials: Google Drive Link
Assignments
- Read all papers (optional readings are not required) and submit one idea for extending or applying the core papers' contributions.
- Present and lead discussion of 2 papers in pairs.
- A research project on ML systems; possible ideas include:
- Cost predictor for training and serving over model lifetime.
- Optimizing a new workload (e.g. AI for science).
- Resource provisioning for serving.
- On-device training/inference.
- Deploy models in new backend (e.g. browser).
Schedule
March 27: No class (ASPLOS 2023)

March 29: Introduction

April 3: Model Compilation/Optimization - ML Compilers
- Required Readings:
- Presenter: Vishal Canumalla
- Optional Readings:

April 5: Model Compilation/Optimization - Neural Architecture Search
- Required Readings:
- Presenters: Chloe Yang, Yifang Chen
- Optional Readings:

April 10: Model Compilation/Optimization - LLM Quantization
- Required Readings:
- Guest: Tim Dettmers on 4-bit fine-tuning
- Presenters: Sam Kaufman, Rosario Scalise
- Optional Readings:

April 12: Model Compilation/Optimization - Transformers & Beyond
- Required Readings:
- Presenters: Huong Ngo, Nicholas Boren, Jaehong Min
- Optional Readings:

April 17: Model Compilation/Optimization - Sparsification
- Required Readings:
- Presenters: Alan Fan, Rohith Leeladharan
- Optional Readings:

April 19: Project Proposal Presentation

April 24: Training Optimization - Parallelism (1)
- Required Readings:
- Presenters: Bohan Liu, Mike Merrill, Aditya K Kamath
- Optional Readings:

April 26: Training Optimization - On-Device Training
- Required Readings:
- Presenters: Jason Zhang, Anoop Mysore
- Optional Readings:

May 1: Training Optimization - Memory Optimizations
- Required Readings:
- Presenters: Sam Kaufman, Bohan Liu
- Optional Readings:

May 3: Training Optimization - Parallelism (2)
- Required Readings:
- Presenters: Alan Fan, Nicholas Boren
- Optional Readings:

May 8: Model Inference & Serving - Model Serving
- Required Readings:
- Guest: Lequn Chen on Symphony, a new model serving system
- Presenters: Tapan Chugh, Vaibhav Mehrotra
- Optional Readings:

May 10: Model Inference & Serving - Large-Scale Inference/Serving
- Required Readings:
- Presenters: Khurshid Alam, Rashmika Reddy
- Optional Readings:

May 15: Model Inference & Serving - LLM Inference/Serving
- Required Readings:
- Guest: Lequn Chen on batching effects in GPT models
- Presenters: Daksh Sinha, Huong Ngo
- Optional Readings:

May 17: AI Hardware - TPU
- Required Readings:
- Presenters: Tapan Chugh, Jaehong Min
- Optional Readings:

May 22: AI Hardware - GPU & Reconfigurable Architectures
- Required Readings:
- Guest: Ying Sheng on FlexGen, an LLM inference system
- Presenters: Fengqing Jiang, Yun-Chang Teng, Aditya K Kamath
- Optional Readings:

May 24: Project Presentation

May 29: No class (Memorial Day)

May 31: Project Presentation