|
Intro |
|
|
9/26 Th |
Introduction, neural network basics (Zoom Recording)
|
chapter 1-4 of Dive into Deep Learning, Zhang et al
https://playground.tensorflow.org/
|
Lecture 1,
Lecture 1 (annotated)
|
|
Approximation Theory |
|
|
10/1 Tu |
1D and multivariate approximation
|
Chapter 1,2 of Matus Telgarsky's notes |
Lecture 2,
Lecture 2 (annotated),
scribed notes on 1D, multivariate approximation, and Barron's theory.
|
10/3 Th |
Barron's theory
|
Chapter 3,5 of Matus Telgarsky's notes
|
Lecture 3,
Lecture 3 (annotated)
|
|
Optimization |
|
|
10/8 Tu |
Depth separation, backpropagation, auto-differentiation
|
Chapter 4 of Dive into Deep Learning, Zhang et al., Chapter 9 of Matus Telgarsky's notes |
Lecture 4,
Lecture 4 (annotated)
|
10/10 Th |
Clarke differential, auto-balancing
|
Chapter 9 of Matus Telgarsky's notes,
Chapter 12 of Dive into Deep Learning, Zhang et al.,
Du et al. on auto-balancing, Optimizer visualization
|
Lecture 5,
Lecture 5 (annotated),
scribed notes on Clarke differential, positive homogeneity and auto-balancing
|
10/15 Tu |
Advanced optimizers
|
Chapter 12 of Dive into Deep Learning, Zhang et al.
|
Lecture 6, Lecture 6 (annotated)
|
10/17 Th |
Important techniques for improving optimization,
optimization landscape
|
He et al. on Kaiming initialization, blog on escaping saddle points, blog on how to escape saddle points efficiently
|
Lecture 7,
Lecture 7 (annotated),
scribed notes on Kaiming initialization
|
10/22 Tu |
Global convergence of gradient descent for over-parameterized neural networks
|
Du et al. on global convergence of gradient descent
|
Lecture 8,
Lecture 8 (annotated),
scribed notes on global convergence of gradient descent
|
|
Generalization |
|
|
10/24 Th |
Neural tangent kernel, measures of generalization, techniques for improving generalization, generalization theory for deep learning
|
Jacot et al. on Neural Tangent Kernel, Arora et al. on Neural Tangent Kernel,
Zhang et al. on rethinking generalization in deep learning
|
Lecture 9,
Lecture 9 (annotated)
|
10/29 Tu |
Generalization theory for deep learning, separation between neural network and kernel, double descent, implicit bias
|
Chapter 10 - 14 of Matus Telgarsky's notes,
Jiang et al. on different generalization measures, Belkin et al. on double descent,
Allen-Zhu and Li on separation between neural networks and kernels
|
Lecture 10,
Lecture 10 (annotated), scribed notes on separation between NN and kernel
|
|
Neural Network Architecture |
|
|
10/31 Th |
Introduction to convolutional neural networks, advanced convolutional neural networks
|
Chapter 7,8 of Dive into Deep Learning, Zhang et al.
|
Lecture 11,
Lecture 11 (annotated)
|
11/5 Tu |
Recurrent neural networks, LSTM
|
Chapter 9, 10 of Dive into Deep Learning, Zhang et al.
|
Lecture 12,
Lecture 12 (annotated)
|
11/7 Th |
Attention mechanism
|
Chapter 11 of Dive into Deep Learning, Zhang et al.
|
Lecture 13,
Lecture 13 (annotated)
|
|
Representation learning and generative models |
|
|
11/12 Tu |
Desiderata for representation learning, self-supervised learning
|
Bengio et al. on representation learning |
Lecture 14,
Lecture 14 (annotated)
|
11/14 Th |
Contrastive learning, CLIP, desiderata for generative models
|
Chapter 20 of Dive into Deep Learning, Zhang et al., CLIP paper
|
Lecture 15,
Lecture 15 (annotated)
|
11/19 Tu |
GAN, variational autoencoder
|
Chapter 20 of Dive into Deep Learning, Zhang et al.
|
Lecture 16,
Lecture 16 (annotated)
|
11/21 Th |
Energy-based models, normalizing flows, score-based models, diffusion models
|
Yang Song's blog on score-based models, Lilian Weng's blog on diffusion models.
|
Lecture 17,
Lecture 17 (annotated)
|
11/26 Tu |
Deep reinforcement learning, decision transformer
|
Chapter 17 of Dive into Deep Learning, Zhang et al.
|
Lecture 18, Lecture 18 (annotated)
|
11/28 Th |
Thanksgiving
|
|
|
|
Course Presentations |
|
|
12/3 Tu |
Project Presentation (on Zoom)
|
|
|
12/5 Th |
Project Presentation (on Zoom)
|
|
|