Welcome to ML for ML Systems, taught by Prof. Luis Ceze with Zihao Ye as TA.
ML models are quickly becoming an integral component of how applications are built. Yet they are different from most software: performance hungry, bandwidth hungry, and very fast-evolving. This leads to the need to build systems to support them: abstractions and frameworks to tame complexity and adapt quickly; compilers, programming languages, and runtime systems to make efficient use of hardware resources; better communication approaches for distributed systems; and more. One important twist in this fast systems development is that the optimization spaces for ML systems themselves (codegen for ML models, system parameter tuning, resource allocation, etc.) are very large, so these systems use machine learning itself to provide effective solutions. So yes, you read the name of the class right: it is “ML for ML Systems” ;).
In this special topics class we will explore the state of the art and research in ML systems, including: ML model compilers, ML training systems, ML serving systems, support for serving large language models, ML systems that span cloud and edge, resource management for ML, among others. The format is participatory, focused on paper reading, presenting, and discussion, plus a class project scoped and chosen by the participants.
This website will be updated throughout the quarter, so check back for the latest.
Course Overview
- Lectures: Monday and Wednesday 3:00pm-4:20pm (Location: CSE2 271)
- Luis’ Office Hours: By appointment.
- TA Office Hours: Friday 9:30am - 10:30am (Gates 374).
- Course canvas: Link
- Course materials: Google Drive Link
Assignments
- Read all papers (optional readings are not required) and submit one idea for extending or applying the core papers' contributions.
- Present and lead the discussion of 2 papers, in pairs.
- A research project on ML systems; possible ideas include:
  - Cost predictor for training and serving over a model's lifetime.
  - Optimizing a new workload (e.g., AI for science).
  - Resource provisioning for serving.
  - On-device training/inference.
  - Deploying models in a new backend (e.g., the browser).
Schedule
  
    
| Date | Topic & Readings | HW/Notes/Slides |
| --- | --- | --- |
| March 27 | No class (ASPLOS 2023) | |
| March 29 | Introduction | |
| April 3 | Model Compilation/Optimization: ML Compilers<br>Required Readings:<br>Presenter: Vishal Canumalla<br>Optional Readings: | |
| April 5 | Model Compilation/Optimization: Neural Architecture Search<br>Required Readings:<br>Presenter: Chloe Yang, Yifang Chen<br>Optional Readings: | |
| April 10 | Model Compilation/Optimization: LLM Quantization<br>Required Readings:<br>Guest: Tim Dettmers on 4-bit fine-tuning<br>Presenter: Sam Kaufman, Rosario Scalise<br>Optional Readings: | |
| April 12 | Model Compilation/Optimization: Transformers & Beyond<br>Required Readings:<br>Presenter: Huong Ngo, Nicholas Boren, Jaehong Min<br>Optional Readings: | |
| April 17 | Model Compilation/Optimization: Sparsification<br>Required Readings:<br>Presenter: Alan Fan, Rohith Leeladharan<br>Optional Readings: | |
| April 19 | Project Proposal Presentation | |
| April 24 | Training Optimization: Parallelism (1)<br>Required Readings:<br>Presenter: Bohan Liu, Mike Merrill, Aditya K Kamath<br>Optional Readings: | |
| April 26 | Training Optimization: On-Device Training<br>Required Readings:<br>Presenter: Jason Zhang, Anoop Mysore<br>Optional Readings: | |
| May 1 | Training Optimization: Memory Optimizations<br>Required Readings:<br>Presenter: Sam Kaufman, Bohan Liu<br>Optional Readings: | |
| May 3 | Training Optimization: Parallelism (2)<br>Required Readings:<br>Presenter: Alan Fan, Nicholas Boren<br>Optional Readings: | |
| May 8 | Model Inference & Serving: Model Serving<br>Required Readings:<br>Guest: Lequn Chen on Symphony, a new model serving system<br>Presenter: Tapan Chugh, Vaibhav Mehrotra<br>Optional Readings: | |
| May 10 | Model Inference & Serving: Large-Scale Inference/Serving<br>Required Readings:<br>Presenter: Khurshid Alam, Rashmika Reddy<br>Optional Readings: | |
| May 15 | Model Inference & Serving: LLM Inference/Serving<br>Required Readings:<br>Guest: Lequn Chen on batching effects in GPT models<br>Presenter: Daksh Sinha, Huong Ngo<br>Optional Readings: | |
| May 17 | AI Hardware: TPU<br>Required Readings:<br>Presenter: Tapan Chugh, Jaehong Min<br>Optional Readings: | |
| May 22 | AI Hardware: GPU & Reconfigurable Architectures<br>Required Readings:<br>Guest: Ying Sheng on FlexGen, an LLM inference system<br>Presenter: Fengqing Jiang, Yun-Chang Teng, Aditya K Kamath<br>Optional Readings: | |
| May 24 | Project Presentation | |
| May 29 | No class (Memorial Day) | |
| May 31 | Project Presentation | |