CSE 599, Autumn 2020
Generative Models

Lecture: Monday, Wednesday 1:30-2:50

Contact: thickstn@cs.washington.edu

Office hours: Friday 3:00-4:00 (the same Zoom room as class)

Course Overview

This course explores a variety of modern techniques for learning to sample from an unknown probability distribution given examples. Generative models are an active area of research: most of the techniques we discuss in this course have been developed in the last 10 years. This course is integrated tightly with the current research literature, and will provide the context needed to read papers on the most recent developments in the field. The lectures will focus on the theoretical and mathematical foundations of generative modeling techniques. The homeworks will consist a mix of analytical and computational exercises. The course project is intended to offer an opportunity to apply these ideas to your own research, or to more deeply investigate one of the topics discussed in the course.

Prerequisites: This course builds upon fundamental concepts in machine learning, as presented in e.g. CSE 546.

List of topics:

Autoregressive Models

The NADE Framework
RNN/LSTM and Transformers

Variational Autoencoders

The Gaussian VAE
ConvNets and ResNets
Posterior Collapse
Discrete VAEs

Generative Adversarial Nets

f-GANs
Wasserstein GANs
Generative Sinkhorn Modeling

Generative Flow

Autoregressive Flows
Invertible Networks
Neural Ordinary Differential Equations

Energy-Based Models

Stein's Method and Score Matching
Langevin Dynamics and Diffusions

Course material covering similar topics from other institutions:

[Stanford CS 236]: Deep Generative Models
[Berkeley CS 294-158]: Deep Unsupervised Learning

Discussion Forum and Email Communication

Discussion will take place on Ed. For private or confidential questions email the instructor. You may also get messages to the instructor through anonymous course feedback.

Coursework

There will be 3 homeworks (each worth 20%) and a project (worth 40%).

Homework 0: (No submission)
- Familiarize yourself with Google Colab and PyTorch
- [This notebook] contains a simple example of autodiff in Pytorch.
[Homework 1], [Git Repo]: Due on October 26
- Sampling Transformations, Gaussian Mixture Models, Autoregressive Modeling (WikiText2)
[Homework 2], [Git Repo]: Due on November 16
- Variational Autoencoders, PixelCNN, Normalizing Flows (MNIST)
[Homework 3], [Git Repo]: Due on December 7
- Generative Adversarial Nets, Wasserstein GAN (CIFAR-10)
Final Project: Due on December 18th
- Partner with up to 4 people
- Examples of possible projects:
- Consider what computing resources you might need and plan ahead

Schedule

Lecture 1: Sept. 30

Welcome, logistics, overview of the course
Pushforward distributions and simulation of random variables
Discrete versus Continuous Modeling
[Lecture Notes], [Slides]
Supplementary Reading:

[Simulation of Random Variables]: a statistician's perspective on generative modeling.
[Deep Generative Models]: a broad introduction to modern generative models.

Lecture 2: Oct. 5

Parametric Modeling
Gaussian Mixture Models, Expectation Maximization
The Evidence Lower Bound (ELBO)
[Lecture Notes], [Slides]
Supplementary Reading:

[David McAllester's Notes on EM]: a thorough discussion and comparison of hard EM and soft EM.
[Justin Domke's Notes on EM]: a similar perspective to our lecture notes, with a different exposition.

Lecture 3: Oct. 7

Sequence Modeling, Text Modeling
Linear Autoregressive Models, n-gram Models
[Lecture Notes], [Slides]
Supplementary Reading:

[A Course in Time Series Analysis]: a statistical treatment of time series.
[Jurafsky and Martin (Chapter 3)]: the standard exposition of n-gram models.

Lecture 4: Oct. 12

Fully-Visible Sigmoid Belief Networks (FVSBN)
Neural Autoregressive Distribution Estimation (NADE)
Recurrent Neural Networks (RNN)
Exposure Bias
[Lecture Notes], [Slides]
Supplementary Reading:

[The Recurrent Neural Networks cheatsheet]: Stanford CS 230 discussion of RNN's.
[The unreasonable effectiveness of Character-level Language Models]: a sober comparison of n-gram and NADE language models.

Lecture 5: Oct. 14

Transformers and LayerNorm
Long Short-Term Memory (LSTM)
[Lecture Notes], [Slides]
Supplementary Reading:

[The Annotated Transformer]: annotated notes on the original transformer paper, with code.
[Understanding LSTM Networks]: a popular exposition of LSTM networks with intuitive motivation and visualizations.

Lecture 6: Oct. 19

The Variational Autoencoder (VAE)
Monte-Carlo Gradient Estimation
The ELBO for Gaussian VAEs
[Lecture Notes], [Slides]
Supplementary Reading:

[The Evidence Lower Bound]: an excellent informal discussion of the ELBO.
[Monte Carlo Integration]: a succinct introduction to Monte Carlo and importance sampling.
[Monte Carlo Gradient Estimation]: a thorough reference on Monte Carlo gradient estimators.

Lecture 7: Oct. 21

Image Modeling
Convolutional Neural Networks (CNNs)
Residual Networks and BatchNorm
[Lecture Notes], [Slides]
Supplementary Reading:

[A Guide to Convolutional Arithmetic]: a careful treatment of discrete convolutions, with illustrations.
[Convolutional Neural Networks]: Stanford's CS 231n introduction to convnets.

Lecture 8: Oct. 26

Importance-Weighted Autoencoders (IWAE)
PixelCNN, PixelVAE, and Posterior Collapse
Normalizing Flows
[Lecture Notes], [Slides]
Supplementary Reading:

[Normalizing Flows]: an extended informal discussion of normalizing flows.
[Posterior Collapse]: an interesting mathematical discussion of posterior collapse.

Lecture 9: Oct. 28

Inverse Autoregressive Flows (IAF)
Discrete VAE's and the Vector-Quantized VAE's (VQ-VAE)
Discrete Gradient Estimators: REINFORCE, Gumbel, ST
[Lecture Notes], [Slides]
Supplementary Reading:

[NVAE]: High-resolution image modeling with VAE+IAF.
[Jukebox]: a VQ-VAE for generative modeling of audio.

Lecture 10: Nov. 2

Generative Adversarial Networks (GAN)
f-Divergences and the f-GAN
The Goodfellow GAN
[Lecture Notes], [Slides]

[DC-GAN]: A Pytorch implementation of the popular DC-GAN Architecture.
[Checkerboard Artifacts]: Some practical observations about GAN architectures.

Lecture 11: Nov. 4

The Wasserstein GAN
Gradient Penalty Methods
[Slides]
Supplementary Reading:

[Wasserstein GAN]: The original Wasserstein GAN paper.
[Gradient Penalty]: Enforcing the Lipschitz constraint with gradient penalties.

Lecture 12: Nov. 9

Kantorovich-Rubinstein Duality
[Lecture Notes], [Slides]

Lecture 13: Nov. 16

GAN Evaluation: Inception Score and FID
Optimal transport
[Lecture Notes], [Slides]
Supplementary Reading:

[Computational Optimal Transport]: An introductory textbook on optimal transport.
[Inception Scores]: A thoughtful discussion of evaluation using Inception scores.

Lecture 14: Nov. 18

Sinkhorn's algorithm
Generative Sinkhorn Modeling
[Slides]
Supplementary Reading:

[Matrix Scaling]: The Sinkhorn matrix-scaling problem in a very different context.
[Generative Sinkhorn Modeling]: Back-prop through Sinkhorn's algorithm for generative modeling.

Lecture 15: Nov. 23

Generative Flow: NICE, RealNVP, Glow
Neural ODE's, Ffjord
[Lecture Notes], [Slides]
Supplementary Reading:

[Feistel Networks]: A cryptographic analog to invertible neural networks.

Lecture 16: Nov. 30

Energy-based Models
Langevin Dynamics
Implicit Score Matching
[Lecture Notes], [Slides]
Supplementary Reading:

[How to Train Your Energy-Based Models]: A recent survey of energy-based models.
[Theory of Optimization and Sampling]: A course on sampling with excellent notes on Langevin dynamics.
[Sampling Methods]: A broad overview of various sampling techniques.

Lecture 17: Dec. 2

Sliced Score Matching
Denoising Autoencoders
[Lecture Notes], [Slides]
Supplementary Reading:

[Sliced Score Matching: A Scalable Approach to Density and Score Estimation]: Yang Song's paper on sliced score matching.

Lecture 18: Dec. 7

Simulated Annealing
Denoising Diffusion Probabilistic Models
[Slides]
Supplementary Reading:

[Denoising Diffusion Probabilistic Models]: Jonathan Ho's paper introducing Denoising Diffusion models.
[Score-Based Generative Modeling through Stochastic Differential Equations]: An SDE formulation of score-based generative models.

CSE 599, Autumn 2020 Generative Models

Course Overview

Discussion Forum and Email Communication

Coursework

Schedule

CSE 599, Autumn 2020
Generative Models