This course explores a variety of modern techniques for learning to sample from an unknown probability distribution given examples. Generative models are an active area of research: most of the techniques we discuss in this course have been developed in the last 10 years. This course is integrated tightly with the current research literature, and will provide the context needed to read papers on the most recent developments in the field. The lectures will focus on the theoretical and mathematical foundations of generative modeling techniques. The homeworks will consist of a mix of analytical and computational exercises. The course project is intended to offer an opportunity to apply these ideas to your own research, or to investigate one of the topics discussed in the course more deeply.

**Prerequisites**: This course builds upon fundamental concepts in machine learning, as presented in e.g. CSE 546.

**List of topics**:

- Autoregressive Models
- The NADE Framework
- RNN/LSTM and Transformers
- Variational Autoencoders
- The Gaussian VAE
- ConvNets and ResNets
- Posterior Collapse
- Discrete VAEs
- Generative Adversarial Nets
- f-GANs
- Wasserstein GANs
- Generative Sinkhorn Modeling
- Generative Flow
- Autoregressive Flows
- Invertible Networks
- Neural Ordinary Differential Equations
- Energy-Based Models
- Stein's Method and Score Matching
- Langevin Dynamics and Diffusions

**Course material covering similar topics from other institutions**:

- [Stanford CS 236]: Deep Generative Models
- [Berkeley CS 294-158]: Deep Unsupervised Learning

Discussion will take place on Ed. For private or confidential questions, email the instructor. You may also send messages to the instructor through anonymous course feedback.

There will be 3 homeworks (each worth 20%) and a project (worth 40%).

- Homework 0: (No submission)
- Familiarize yourself with Google Colab and PyTorch
- [This notebook] contains a simple example of autodiff in PyTorch.
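
The core idea behind autodiff can also be illustrated without PyTorch. Below is a toy forward-mode sketch using dual numbers (PyTorch's autograd is reverse-mode and far more general; this is only a warm-up illustration, and all names here are made up):

```python
# Minimal forward-mode automatic differentiation via dual numbers.
class Dual:
    """A number a + b*eps with eps**2 == 0; b carries the derivative."""
    def __init__(self, val, grad=0.0):
        self.val, self.grad = val, grad

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.grad + other.grad)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.grad * other.val + self.val * other.grad)

def f(x):
    return x * x * x + x   # f(x) = x^3 + x, so f'(x) = 3x^2 + 1

x = Dual(2.0, 1.0)         # seed the derivative dx/dx = 1
y = f(x)
print(y.val, y.grad)       # f(2) = 10, f'(2) = 13
```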

- [Homework 1], [Git Repo]: Due on October 26
- Sampling Transformations, Gaussian Mixture Models, Autoregressive Modeling (WikiText2)

- [Homework 2], [Git Repo]: Due on November 16
- Variational Autoencoders, PixelCNN, Normalizing Flows (MNIST)

- [Homework 3], [Git Repo]: Due on December 7
- Generative Adversarial Nets, Wasserstein GAN (CIFAR-10)

- Final Project: Due on December 18th
- Partner with up to 4 people
- Examples of possible projects:
- An application of generative models to your own research
- Reproduction of empirical results reported in a recent paper
- Exposition or extension of a technical theoretical result in a recent paper
- Application of generative modeling techniques to a novel dataset
- Consider what computing resources you might need and plan ahead

- Lecture 1: Sept. 30
- Welcome, logistics, overview of the course
- Pushforward distributions and simulation of random variables
- Discrete versus Continuous Modeling
- [Lecture Notes], [Slides]
- Supplementary Reading:
- [Simulation of Random Variables]: a statistician's perspective on generative modeling.
- [Deep Generative Models]: a broad introduction to modern generative models.
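
The pushforward idea from this lecture is already enough to simulate nontrivial random variables. Here is a minimal inverse-transform sketch, using the exponential distribution as a made-up example:

```python
# Inverse-CDF (inverse-transform) sampling: pushing Uniform(0,1)
# forward through F^{-1} yields samples with CDF F.
import math
import random

def sample_exponential(rate, n, seed=0):
    """Exponential(rate) samples via F^{-1}(u) = -ln(1 - u) / rate."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]

samples = sample_exponential(rate=2.0, n=100_000)
mean = sum(samples) / len(samples)
print(mean)   # should be close to 1/rate = 0.5
```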
- Lecture 2: Oct. 5
- Parametric Modeling
- Gaussian Mixture Models, Expectation Maximization
- The Evidence Lower Bound (ELBO)
- [Lecture Notes], [Slides]
- Supplementary Reading:
- [David McAllester's Notes on EM]: a thorough discussion and comparison of hard EM and soft EM.
- [Justin Domke's Notes on EM]: a similar perspective to our lecture notes, with a different exposition.
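
The EM updates for a Gaussian mixture can be written in a few lines. This is a minimal NumPy sketch on synthetic 1-D data (the component means, variances, and initializations are made up for illustration):

```python
# A minimal EM loop for a two-component 1-D Gaussian mixture.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: two well-separated components
x = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(3.0, 0.5, 500)])

pi = np.array([0.5, 0.5])     # mixing weights
mu = np.array([-1.0, 1.0])    # initial means
var = np.array([1.0, 1.0])    # initial variances

for _ in range(50):
    # E-step: posterior responsibilities q(z = k | x) via Bayes' rule
    dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    resp = pi * dens
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted maximum-likelihood updates
    nk = resp.sum(axis=0)
    pi = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print(np.sort(mu))   # should approach the true means [-2, 3]
```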
- Lecture 3: Oct. 7
- Sequence Modeling, Text Modeling
- Linear Autoregressive Models, n-gram Models
- [Lecture Notes], [Slides]
- Supplementary Reading:
- [A Course in Time Series Analysis]: a statistical treatment of time series.
- [Jurafsky and Martin (Chapter 3)]: the standard exposition of n-gram models.
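
An n-gram model is fit by counting. Here is a toy character-level bigram model with add-one smoothing (the corpus string is made up; real n-gram models use far more data and better smoothing):

```python
# A character-level bigram model with add-one (Laplace) smoothing.
from collections import Counter

corpus = "the theory of the thing"   # made-up toy corpus
chars = sorted(set(corpus))
V = len(chars)

counts = Counter(zip(corpus, corpus[1:]))   # bigram counts
unigrams = Counter(corpus[:-1])             # context counts

def p_next(c, prev):
    """P(c | prev) with add-one smoothing over the vocabulary."""
    return (counts[(prev, c)] + 1) / (unigrams[prev] + V)

# The conditional probabilities sum to one for any context.
total = sum(p_next(c, "t") for c in chars)
print(total)   # ~1.0
```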
- Lecture 4: Oct. 12
- Fully-Visible Sigmoid Belief Networks (FVSBN)
- Neural Autoregressive Distribution Estimation (NADE)
- Recurrent Neural Networks (RNN)
- Exposure Bias
- [Lecture Notes], [Slides]
- Supplementary Reading:
- [The Recurrent Neural Networks cheatsheet]: Stanford CS 230's discussion of RNNs.
- [The unreasonable effectiveness of Character-level Language Models]: a sober comparison of n-gram and RNN language models.
- Lecture 5: Oct. 14
- Transformers and LayerNorm
- Long Short-Term Memory (LSTM)
- [Lecture Notes], [Slides]
- Supplementary Reading:
- [The Annotated Transformer]: annotated notes on the original transformer paper, with code.
- [Understanding LSTM Networks]: a popular exposition of LSTM networks with intuitive motivation and visualizations.
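
The heart of the transformer is scaled dot-product attention. A minimal NumPy sketch (shapes and values are made up, and this omits masking, multiple heads, and batching):

```python
# Scaled dot-product attention:
#   Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # numerically stable softmax over the key dimension
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # 4 queries
K = rng.normal(size=(6, 8))    # 6 keys
V = rng.normal(size=(6, 8))    # 6 values
out, w = attention(Q, K, V)
print(out.shape)               # (4, 8): one output per query
print(w.sum(axis=-1))          # each row of weights sums to 1
```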
- Lecture 6: Oct. 19
- The Variational Autoencoder (VAE)
- Monte-Carlo Gradient Estimation
- The ELBO for Gaussian VAEs
- [Lecture Notes], [Slides]
- Supplementary Reading:
- [The Evidence Lower Bound]: an excellent informal discussion of the ELBO.
- [Monte Carlo Integration]: a succinct introduction to Monte Carlo and importance sampling.
- [Monte Carlo Gradient Estimation]: a thorough reference on Monte Carlo gradient estimators.
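
The reparameterization trick behind the Gaussian VAE can be checked on a one-line example. Below is a sketch estimating the gradient of E[z^2] with respect to the mean of z ~ N(mu, 1), where the analytic answer is 2*mu (the objective f(z) = z^2 is a made-up stand-in for the ELBO):

```python
# Monte-Carlo gradient of E_{z ~ N(mu, 1)}[z^2] w.r.t. mu via the
# reparameterization trick: z = mu + eps with eps ~ N(0, 1).
import numpy as np

rng = np.random.default_rng(0)
mu = 0.5
eps = rng.normal(size=100_000)
z = mu + eps
# d/dmu f(z) = f'(z) * dz/dmu = 2z * 1, averaged over samples
grad_est = np.mean(2 * z)
print(grad_est)   # analytic gradient: d/dmu (mu^2 + 1) = 2*mu = 1.0
```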
- Lecture 7: Oct. 21
- Image Modeling
- Convolutional Neural Networks (CNNs)
- Residual Networks and BatchNorm
- [Lecture Notes], [Slides]
- Supplementary Reading:
- [A Guide to Convolutional Arithmetic]: a careful treatment of discrete convolutions, with illustrations.
- [Convolutional Neural Networks]: Stanford's CS 231n introduction to convnets.
- Lecture 8: Oct. 26
- Importance-Weighted Autoencoders (IWAE)
- PixelCNN, PixelVAE, and Posterior Collapse
- Normalizing Flows
- [Lecture Notes], [Slides]
- Supplementary Reading:
- [Normalizing Flows]: an extended informal discussion of normalizing flows.
- [Posterior Collapse]: an interesting mathematical discussion of posterior collapse.
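
The change-of-variables formula that underlies normalizing flows can be verified on the simplest possible flow, a 1-D affine map (this toy example stands in for the deep, learned flows in lecture):

```python
# Change of variables for a one-dimensional affine flow x = a*z + b:
#   log p_X(x) = log p_Z(z) - log|a|,  with z = (x - b) / a.
import math

def affine_flow_logpdf(x, a, b):
    z = (x - b) / a
    log_pz = -0.5 * z * z - 0.5 * math.log(2 * math.pi)   # N(0,1) base
    return log_pz - math.log(abs(a))

# Pushing N(0, 1) through x = 2z + 1 gives N(1, 4); compare against
# the analytic N(1, 4) log-density.
x, a, b = 0.3, 2.0, 1.0
flow = affine_flow_logpdf(x, a, b)
analytic = -0.5 * (x - 1.0) ** 2 / 4.0 - 0.5 * math.log(2 * math.pi * 4.0)
print(flow, analytic)   # identical up to floating-point error
```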
- Lecture 9: Oct. 28
- Inverse Autoregressive Flows (IAF)
- Discrete VAEs and the Vector-Quantized VAE (VQ-VAE)
- Discrete Gradient Estimators: REINFORCE, Gumbel-Softmax, Straight-Through (ST)
- [Lecture Notes], [Slides]
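
The Gumbel-softmax estimator builds on the Gumbel-max trick, which is easy to verify empirically (the categorical distribution below is made up):

```python
# The Gumbel-max trick: argmax_k (log p_k + G_k), with G_k ~ Gumbel(0,1)
# i.i.d., is an exact sample from Categorical(p).
import math
import random
from collections import Counter

rng = random.Random(0)
p = [0.2, 0.5, 0.3]   # made-up categorical distribution

def gumbel_max_sample():
    gumbels = [-math.log(-math.log(rng.random())) for _ in p]
    scores = [math.log(pk) + g for pk, g in zip(p, gumbels)]
    return max(range(len(p)), key=scores.__getitem__)

counts = Counter(gumbel_max_sample() for _ in range(100_000))
freqs = [counts[k] / 100_000 for k in range(len(p))]
print(freqs)   # should be close to [0.2, 0.5, 0.3]
```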
- Lecture 10: Nov. 2
- Generative Adversarial Networks (GAN)
- f-Divergences and the f-GAN
- The Goodfellow GAN
- [Lecture Notes], [Slides]
- Supplementary Reading:
- [DC-GAN]: A PyTorch implementation of the popular DC-GAN architecture.
- [Checkerboard Artifacts]: Some practical observations about GAN architectures.
- Lecture 11: Nov. 4
- The Wasserstein GAN
- Gradient Penalty Methods
- [Slides]
- Supplementary Reading:
- [Wasserstein GAN]: The original Wasserstein GAN paper.
- [Gradient Penalty]: Enforcing the Lipschitz constraint with gradient penalties.
- Lecture 12: Nov. 9
- Kantorovich-Rubinstein Duality
- [Lecture Notes], [Slides]
- Lecture 13: Nov. 16
- GAN Evaluation: Inception Score and FID
- Optimal Transport
- [Lecture Notes], [Slides]
- Supplementary Reading:
- [Computational Optimal Transport]: An introductory textbook on optimal transport.
- [Inception Scores]: A thoughtful discussion of evaluation using Inception scores.
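
In one dimension, optimal transport has a closed form: the optimal coupling is monotone, so the 1-Wasserstein distance between two equal-size empirical distributions is just the mean absolute difference of sorted samples. A quick sanity check (the Gaussians below are made up):

```python
# Empirical 1-Wasserstein distance in one dimension via sorting.
import numpy as np

def wasserstein_1d(xs, ys):
    """W1 between equal-size empirical distributions."""
    return np.mean(np.abs(np.sort(xs) - np.sort(ys)))

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, 100_000)
ys = rng.normal(2.0, 1.0, 100_000)
print(wasserstein_1d(xs, ys))   # ~2.0: shifting by the mean gap is optimal
```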
- Lecture 14: Nov. 18
- Sinkhorn's algorithm
- Generative Sinkhorn Modeling
- [Slides]
- Supplementary Reading:
- [Matrix Scaling]: The Sinkhorn matrix-scaling problem in a very different context.
- [Generative Sinkhorn Modeling]: Back-prop through Sinkhorn's algorithm for generative modeling.
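
Sinkhorn's algorithm itself is only a few lines: alternately rescale the rows and columns of a positive kernel until the marginals match. A minimal NumPy sketch (the cost matrix, marginals, and regularization strength are made up):

```python
# Sinkhorn iterations for entropy-regularized optimal transport.
import numpy as np

def sinkhorn(C, r, c, eps=0.1, iters=200):
    """Approximate OT plan between marginals r and c for cost matrix C."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(r)
    for _ in range(iters):
        v = c / (K.T @ u)         # rescale to match column marginals
        u = r / (K @ v)           # rescale to match row marginals
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
C = rng.random((5, 5))
r = np.full(5, 0.2)
c = np.full(5, 0.2)
P = sinkhorn(C, r, c)
print(P.sum(axis=1))   # rows sum to r
print(P.sum(axis=0))   # columns sum (approximately) to c
```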
- Lecture 15: Nov. 23
- Generative Flow: NICE, RealNVP, Glow
- Neural ODEs and FFJORD
- [Lecture Notes], [Slides]
- Supplementary Reading:
- [Feistel Networks]: A cryptographic analog to invertible neural networks.
- Lecture 16: Nov. 30
- Energy-based Models
- Langevin Dynamics
- Implicit Score Matching
- [Lecture Notes], [Slides]
- Supplementary Reading:
- [How to Train Your Energy-Based Models]: A recent survey of energy-based models.
- [Theory of Optimization and Sampling]: A course on sampling with excellent notes on Langevin dynamics.
- [Sampling Methods]: A broad overview of various sampling techniques.
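
Langevin dynamics needs only the score function to sample. With the standard normal's score, grad log p(x) = -x, the unadjusted Langevin iteration recovers N(0, 1) (step size and chain count below are made up; real applications use a learned score network):

```python
# Unadjusted Langevin dynamics:
#   x_{t+1} = x_t + (step/2) * score(x_t) + sqrt(step) * noise.
import numpy as np

rng = np.random.default_rng(0)

def score(x):
    return -x   # score of the standard normal target

step = 0.01
x = np.zeros(10_000)   # many independent chains run in parallel
for _ in range(2_000):
    x = x + 0.5 * step * score(x) + np.sqrt(step) * rng.normal(size=x.shape)

print(x.mean(), x.std())   # should be close to 0 and 1
```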
- Lecture 17: Dec. 2
- Sliced Score Matching
- Denoising Autoencoders
- [Lecture Notes], [Slides]
- Supplementary Reading:
- [Sliced Score Matching: A Scalable Approach to Density and Score Estimation]: Yang Song's paper on sliced score matching.
- Lecture 18: Dec. 7
- Simulated Annealing
- Denoising Diffusion Probabilistic Models
- [Slides]
- Supplementary Reading:
- [Denoising Diffusion Probabilistic Models]: Jonathan Ho's paper introducing Denoising Diffusion models.
- [Score-Based Generative Modeling through Stochastic Differential Equations]: An SDE formulation of score-based generative models.
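
The DDPM forward (noising) process has a closed form, q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I), which makes it cheap to sample any timestep directly. A sketch using the linear beta schedule from the DDPM paper (the constant "data" vector is made up):

```python
# Closed-form sampling from the DDPM forward process.
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear schedule from the paper
alpha_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t):
    """Sample x_t given x_0 by mixing in Gaussian noise."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = np.ones(100_000)        # toy "data": a constant signal
xT = q_sample(x0, T - 1)
print(alpha_bar[-1])         # ~0: nearly all signal destroyed by t = T
print(xT.mean(), xT.std())   # close to N(0, 1)
```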