Over the past decade, there have been significant advances in machine learning (ML) algorithms and models. Aided by advances in computational power, ML algorithms have evolved to process and analyze enormous datasets efficiently. However, these advances have placed considerable strain on our computing infrastructure: training and inference of machine learning models incur significant costs and substantial processing delays. Understanding and optimizing the systems used for machine learning is thus crucial to unlocking its true potential.

In this course, we will provide students with an in-depth understanding of the elements of modern ML systems, ranging from the performance characteristics of ML models such as transformers, to languages and compilers for machine learning, architectural support for ML computations, and the distributed computing required for training and inference of large ML models. We will learn about the design rationale behind state-of-the-art machine learning frameworks and advanced system techniques for scaling models and reducing computing, memory, and communication needs. We will focus on case studies of modern large language model (LLM) training and serving systems used in practice today.

Course Staff

Person               | Email     | Office Hours
Arvind Krishnamurthy | arvind@cs | Monday 11:30-12:30 (CSE 592), except on 11/4/2024
Tapan Chugh          | tapanc@cs | Thursday 4-5pm (Allen Center: 2nd Floor Breakout Area)
Chien-Yu Lin         | cyulin@cs | Thursday 4-5pm (Allen Center: 2nd Floor Breakout Area)

Location and Time

CSE2 271, MW 3-4:20

Feedback

We would love to hear from you! You can reach us anonymously via feedback.cs.washington.edu.