Intro

| 9/25 Th | Introduction, neural network basics | Chapters 1–4 of Dive into Deep Learning, Zhang et al.; https://playground.tensorflow.org/ | Lecture 1, Lecture 1 (annotated) |
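The first meeting covers neural network basics. As a companion to the playground demo, here is a minimal sketch of the forward pass of a one-hidden-layer ReLU network (the weights below are made-up toy values, not learned ones):

```python
# Forward pass of a one-hidden-layer ReLU network: y = w2 . relu(W1 x + b1) + b2.
# Weights are hand-picked toy values; a real network learns them by gradient descent.

def relu(z):
    return max(0.0, z)

def forward(x, W1, b1, w2, b2):
    hidden = [relu(sum(wij * xj for wij, xj in zip(row, x)) + bi)
              for row, bi in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

W1 = [[1.0, -1.0], [0.5, 0.5]]   # 2 hidden units, 2 inputs
b1 = [0.0, -0.25]
w2 = [1.0, 2.0]
b2 = 0.1

print(forward([1.0, 0.5], W1, b1, w2, b2))
```

Training would adjust W1, b1, w2, b2 by gradient descent on a loss, which is the subject of the optimization lectures below.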
Approximation Theory

| 9/30 Tu | 1D and multivariate approximation (Zoom) | Chapters 1 and 2 of Matus Telgarsky's notes | Lecture 2, Lecture 2 (annotated), scribed notes on 1D and multivariate approximation and Barron's theory |
| 10/2 Th | Barron's theory (Zoom) | Chapters 3 and 5 of Matus Telgarsky's notes | Lecture 3, Lecture 3 (annotated) |
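The 1D approximation results covered in these lectures express continuous piecewise-linear functions as sums of ReLUs. A small self-contained sketch (the target f(x) = x² and the grid are arbitrary choices) that interpolates on an evenly spaced grid:

```python
# Any continuous piecewise-linear function with breakpoints x_0 < ... < x_n
# equals f(x_0) + sum_i c_i * relu(x - x_i), where c_0 is the first slope
# and c_i (i >= 1) is the slope change at breakpoint x_i.

def relu(z):
    return max(0.0, z)

f = lambda x: x * x
xs = [i / 10 for i in range(11)]          # breakpoints on [0, 1], spacing h = 0.1
slopes = [(f(b) - f(a)) / (b - a) for a, b in zip(xs, xs[1:])]

# coefficients: first slope, then slope changes at interior breakpoints
coefs = [slopes[0]] + [s1 - s0 for s0, s1 in zip(slopes, slopes[1:])]

def g(x):
    return f(xs[0]) + sum(c * relu(x - xi) for c, xi in zip(coefs, xs))

err = max(abs(f(x) - g(x)) for x in [i / 1000 for i in range(1001)])
print(err)  # worst-case gap scales like h^2 for a twice-differentiable target
```

Refining the grid from spacing h to h/2 cuts the worst-case error by roughly a factor of 4, matching the h² rate for twice-differentiable targets.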

Optimization

| 10/7 Tu | Depth separation, backpropagation, auto-differentiation | Chapter 4 of Dive into Deep Learning, Zhang et al.; Chapter 9 of Matus Telgarsky's notes | Lecture 4, Lecture 4 (annotated) |
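Backpropagation is the chain rule applied in reverse over the computation graph, reusing intermediates from the forward pass. A hand-rolled sketch for a toy scalar network (the composition sigmoid(w2 · tanh(w1 · x)) is just an illustrative choice), checked against central finite differences:

```python
import math

# Tiny computation: y = sigmoid(w2 * tanh(w1 * x)). The backward pass walks
# the chain rule from the output back to each parameter.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_backward(x, w1, w2):
    a = w1 * x          # pre-activation
    h = math.tanh(a)    # hidden activation
    b = w2 * h
    y = sigmoid(b)
    # reverse pass
    dy_db = y * (1.0 - y)          # sigmoid'(b)
    dy_dw2 = dy_db * h
    dy_dh = dy_db * w2
    dy_da = dy_dh * (1.0 - h * h)  # tanh'(a)
    dy_dw1 = dy_da * x
    return y, dy_dw1, dy_dw2

x, w1, w2 = 0.7, 1.3, -0.9
y, g1, g2 = forward_backward(x, w1, w2)

# finite-difference check on w1: analytic and numeric gradients should agree
eps = 1e-6
y_plus, _, _ = forward_backward(x, w1 + eps, w2)
y_minus, _, _ = forward_backward(x, w1 - eps, w2)
print(abs(g1 - (y_plus - y_minus) / (2 * eps)))
```

Auto-differentiation frameworks mechanize exactly this bookkeeping for arbitrary computation graphs.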
| 10/9 Th | Clarke differential, auto-balancing | Chapter 9 of Matus Telgarsky's notes; Chapter 12 of Dive into Deep Learning, Zhang et al.; Du et al. on auto-balancing; Optimizer visualization | Lecture 5, Lecture 5 (annotated), scribed notes on the Clarke differential, positive homogeneity, and auto-balancing |
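The auto-balancing phenomenon of Du et al. shows up already in the smallest factorized example: for a toy objective such as L(u, v) = (uv − 1)², gradient flow exactly conserves u² − v², and small-step gradient descent conserves it approximately. A quick sketch:

```python
# Gradient descent on L(u, v) = (u*v - 1)^2. For this factorized,
# positively homogeneous model, gradient flow exactly conserves u^2 - v^2
# ("auto-balancing"); small-step gradient descent nearly conserves it.

u, v = 2.0, 1.0
print(u * u - v * v)    # imbalance at initialization: 3.0

lr = 0.01
for _ in range(500):
    r = u * v - 1.0
    gu, gv = 2 * r * v, 2 * r * u   # partial derivatives of L
    u, v = u - lr * gu, v - lr * gv

print(u * v)            # the product has converged close to 1
print(u * u - v * v)    # the imbalance is still close to 3
```

The conservation is easy to verify by hand: d/dt (u² − v²) = 2u·(−∂L/∂u) − 2v·(−∂L/∂v) = −4r·uv + 4r·uv = 0 along the gradient flow.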
| 10/14 Tu | Advanced optimizers | Chapter 12 of Dive into Deep Learning, Zhang et al. | Lecture 6, Lecture 6 (annotated) |
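As a concrete instance of the advanced optimizers discussed here, a minimal sketch of the Adam update on a 1D quadratic (the hyperparameters are the commonly used defaults; the objective is a toy choice):

```python
import math

# Adam on f(w) = (w - 3)^2: keeps exponential moving averages of the
# gradient (m) and squared gradient (v), with bias correction, and takes
# steps scaled by the ratio of the two.

w, m, v = 0.0, 0.0, 0.0
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 501):
    g = 2 * (w - 3.0)                 # gradient of the quadratic
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)      # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)      # bias-corrected second moment
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)

print(w)  # close to the minimizer w* = 3
```

Setting beta1 = 0 recovers a per-coordinate RMSProp-style update, and dropping the second-moment scaling recovers plain momentum.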
| 10/16 Th | Important techniques for improving optimization, optimization landscape | He et al. on Kaiming initialization; blog on escaping saddle points; blog on how to escape saddle points efficiently | Lecture 7, Lecture 7 (annotated), scribed notes on Kaiming initialization |
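Kaiming (He) initialization draws weights with variance 2/fan_in so that the mean-squared activation is preserved across ReLU layers. A quick simulation (the width, depth, and seed are arbitrary choices):

```python
import math, random

random.seed(0)

def relu(z):
    return max(0.0, z)

def layer(x, fan_in, fan_out):
    # He initialization: w ~ N(0, 2 / fan_in); the factor 2 compensates
    # for ReLU zeroing out half of a symmetric pre-activation.
    std = math.sqrt(2.0 / fan_in)
    return [relu(sum(random.gauss(0.0, std) * xj for xj in x))
            for _ in range(fan_out)]

width, depth = 256, 10
x = [random.gauss(0.0, 1.0) for _ in range(width)]
for _ in range(depth):
    x = layer(x, width, width)

rms = math.sqrt(sum(v * v for v in x) / width)
print(rms)  # stays O(1) even after 10 ReLU layers
```

Replacing the variance 2/fan_in with, say, 1/fan_in makes the root-mean-square activation shrink by roughly √2 per layer, which is exactly the vanishing effect this initialization is designed to avoid.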
| 10/21 Tu | Global convergence of gradient descent for over-parameterized neural networks | Du et al. on global convergence of gradient descent | Lecture 8, Lecture 8 (annotated), scribed notes on global convergence of gradient descent |

Generalization

| 10/23 Th | Neural tangent kernel, measures of generalization, techniques for improving generalization, generalization theory for deep learning | Jacot et al. on the Neural Tangent Kernel; Arora et al. on the Neural Tangent Kernel; Zhang et al. on rethinking generalization in deep learning | Lecture 9, Lecture 9 (annotated) |
| 10/28 Tu | Generalization theory for deep learning, separation between neural networks and kernels | Chapters 10–14 of Matus Telgarsky's notes; Jiang et al. on different generalization measures; Belkin et al. on double descent; Allen-Zhu and Li on separation between neural networks and kernels | Lecture 10, Lecture 10 (annotated), scribed notes on separation between NN and kernel |

Neural Network Architecture

| 10/30 Th | Double descent, implicit bias, introduction to convolutional neural networks, advanced convolutional neural networks | Chapters 7 and 8 of Dive into Deep Learning, Zhang et al. | Lecture 11, Lecture 11 (annotated) |
| 11/4 Tu | Recurrent neural networks, LSTM | Chapters 9 and 10 of Dive into Deep Learning, Zhang et al. | Lecture 12, Lecture 12 (annotated) |
| 11/6 Th | Attention mechanism | Chapter 11 of Dive into Deep Learning, Zhang et al. | Lecture 13, Lecture 13 (annotated) |
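Scaled dot-product attention, the core operation in this lecture, computes softmax(QKᵀ/√d)V. A pure-Python sketch for a single head with toy matrices:

```python
import math

# Scaled dot-product attention for one head:
# out[i] = sum_j softmax_j(q_i . k_j / sqrt(d)) * v_j

def softmax(scores):
    mx = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)          # one distribution over the keys
        out.append([sum(w * vj[c] for w, vj in zip(weights, V))
                    for c in range(len(V[0]))])
    return out

Q = [[1.0, 0.0], [0.0, 1.0]]               # 2 queries
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 keys
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 values
print(attention(Q, K, V))
```

Each output row is a convex combination of the rows of V, with weights given by the softmax of the query-key similarities.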

Representation Learning and Generative Models

| 11/11 Tu | Veterans Day (no class) | | |
| 11/13 Th | Desiderata for representation learning, self-supervised learning, contrastive learning | Bengio et al. on representation learning | Lecture 14, Lecture 14 (annotated) |
| 11/18 Tu | CLIP, desiderata for generative models, variational autoencoders | Chapter 20 of Dive into Deep Learning, Zhang et al.; CLIP paper | Lecture 15, Lecture 15 (annotated) |
| 11/20 Th | GANs, energy-based models | Chapter 20 of Dive into Deep Learning, Zhang et al. | Lecture 16, Lecture 16 (annotated) |
| 11/25 Tu | Normalizing flows, score-based models, diffusion models | Yang Song's blog on score-based models; Lilian Weng's blog on diffusion models | Lecture 17, Lecture 17 (annotated) |
| 11/27 Th | Thanksgiving (no class) | | |

Course Presentations

| 12/2 Tu | Project Presentation (on Zoom) | | |
| 12/4 Th | Project Presentation (on Zoom) | | |