Welcome to ML for ML Systems, taught by Prof. Luis Ceze with Zihao Ye as TA.

ML models are quickly becoming an integral component of how applications are built. Yet they differ from most software: they are performance hungry, bandwidth hungry, and evolving very fast. This has led to the need to build systems to support them: abstractions and frameworks to tame complexity and adapt quickly; compilers, programming languages, and runtime systems to make efficient use of hardware resources; better communication approaches for distributed systems; and so on. One important twist to this fast-paced systems development is that the optimization spaces for ML systems themselves (codegen for ML models, system parameter tuning, resource allocation, etc.) are very large, so these systems use machine learning itself to provide effective solutions. So you read the name of the class right, it is “ML for ML systems” ;).

In this special topics class we will explore the state of the art and current research on ML systems, including ML model compilers, ML training systems, ML serving systems, support for large language model serving, ML systems that span cloud and edge, and resource management for ML, among others. The format is participatory, focused on paper reading, presentation, and discussion, along with a class project scoped and chosen by the participants.

This website will be updated throughout the quarter, so check back for the latest.

Course Overview

Assignments

Schedule

Date Topic & Readings HW/Notes/Slides
March 27

No class (ASPLOS 2023)

March 29

Introduction

April 3

Model Compilation/Optimization - ML Compilers

Required Readings:

Presenter: Vishal Canumalla

Optional Readings:
April 5

Model Compilation/Optimization - Neural Architecture Search

Required Readings:

Presenter: Chloe Yang, Yifang Chen

Optional Readings:
April 10

Model Compilation/Optimization - LLM Quantization

Required Readings:

Guest: Tim Dettmers on 4-bit fine-tuning

Presenter: Sam Kaufman, Rosario Scalise

Optional Readings:
April 12

Model Compilation/Optimization - Transformers & Beyond

Required Readings:

Presenter: Huong Ngo, Nicholas Boren, Jaehong Min

Optional Readings:
April 17

Model Compilation/Optimization - Sparsification

Required Readings:

Presenter: Alan Fan, Rohith Leeladharan

Optional Readings:
April 19

Project Proposal Presentation

April 24

Training Optimization - Parallelism (1)

Required Readings:

Presenter: Bohan Liu, Mike Merrill, Aditya K Kamath

Optional Readings:
April 26

Training Optimization - On Device Training

Required Readings:

Presenter: Jason Zhang, Anoop Mysore

Optional Readings:
May 1

Training Optimization - Memory Optimizations

Required Readings:

Presenter: Sam Kaufman, Bohan Liu

Optional Readings:
May 3

Training Optimization - Parallelism (2)

Required Readings:

Presenter: Alan Fan, Nicholas Boren

Optional Readings:
May 8

Model Inference & Serving - Model Serving

Required Readings:

Guest: Lequn Chen on Symphony: a new model serving system

Presenter: Tapan Chugh, Vaibhav Mehrotra

Optional Readings:
May 10

Model Inference & Serving - Large Scale Inference/Serving

Required Readings:

Presenter: Khurshid Alam, Rashmika Reddy

Optional Readings:
May 15

Model Inference & Serving - LLM Inference/Serving

Required Readings:

Guest: Lequn Chen on batching effects in GPT models

Presenter: Daksh Sinha, Huong Ngo

Optional Readings:
May 17

AI Hardware - TPU

Required Readings:

Presenter: Tapan Chugh, Jaehong Min

Optional Readings:
May 22

AI Hardware - GPU & Reconfigurable Architectures

Required Readings:

Guest: Ying Sheng on FlexGen: an LLM inference system

Presenter: Fengqing Jiang, Yun-Chang Teng, Aditya K Kamath

Optional Readings:
May 24

Project Presentation

May 29

No class (Memorial Day)

May 31

Project Presentation