Over the past decade, there have been significant advances in machine learning (ML) algorithms and models. Aided by advances in computational power, ML algorithms have evolved to process and analyze enormous datasets efficiently. However, these advances have placed considerable strain on our computing infrastructure: training and inference of machine learning models incur significant costs and substantial processing delays. Understanding and optimizing the systems used for machine learning is thus crucial to unlocking its true potential.

In this course, we will provide students with an in-depth understanding of the elements of modern ML systems, ranging from the performance characteristics of ML models such as transformers, to languages and compilers for machine learning, architectural support for ML computations, and the distributed computing required for training and inference of large ML models. We will learn about the design rationale behind state-of-the-art machine learning frameworks and advanced system techniques for scaling models and reducing computing, memory, and communication needs. We will focus on case studies of modern large language model (LLM) training and serving systems used in practice today.

Course Staff

Person               | Email     | Office Hours
Arvind Krishnamurthy | arvind@cs | Monday 11:30-12:30 (CSE 592), except on 11/4/2024
Tapan Chugh          | tapanc@cs | Thursday 4-5pm (Allen Center: 2nd Floor Breakout Area)
Chien-Yu Lin         | cyulin@cs | Thursday 4-5pm (Allen Center: 2nd Floor Breakout Area)

Location and Time

CSE2 271, MW 3-4:20

Feedback

We would love to hear from you! You can reach us anonymously via feedback.cs.washington.edu.