Date | Content | Reading | Slides and Notes |
---|---|---|---|
Intro | |||
9/26 Th | Introduction, neural network basics (Zoom Recording) | Chapter 1-4 of Dive into Deep Learning, Zhang et al., https://playground.tensorflow.org/ | Lecture 1, Lecture 1 (annotated) |
Approximation Theory | |||
10/1 Tu | 1D and multivariate approximation | Chapter 1,2 of Matus Telgarsky's notes | Lecture 2, Lecture 2 (annotated), scribed notes on 1D, multivariate approximation, and Barron's theory |
10/3 Th | Barron's theory | Chapter 3,5 of Matus Telgarsky's notes | Lecture 3, Lecture 3 (annotated) |
Optimization | |||
10/8 Tu | Depth separation, backpropagation, auto-differentiation | Chapter 4 of Dive into Deep Learning, Zhang et al., Chapter 9 of Matus Telgarsky's notes | Lecture 4, Lecture 4 (annotated) |
10/10 Th | Clarke differential, auto-balancing | Chapter 9 of Matus Telgarsky's notes, Chapter 12 of Dive into Deep Learning, Zhang et al., Du et al. on auto-balancing, Optimizer visualization | Lecture 5, Lecture 5 (annotated), scribed notes on Clarke differential, positive homogeneity and auto-balancing |
10/15 Tu | Advanced optimizers | Chapter 12 of Dive into Deep Learning, Zhang et al. | Lecture 6, Lecture 6 (annotated) |
10/17 Th | Important techniques for improving optimization, optimization landscape | He et al. on Kaiming initialization, blog on escaping saddle points, blog on how to escape saddle points efficiently | Lecture 7 |
10/22 Tu | Global convergence of gradient descent for over-parameterized neural networks | Du et al. on global convergence of gradient descent | |
Generalization | |||
10/24 Th | Neural tangent kernel, measures of generalization, techniques for improving generalization | Jacot et al. on Neural Tangent Kernel, Arora et al. on Neural Tangent Kernel, Zhang et al. on rethinking generalization in deep learning | |
10/29 Tu | Generalization theory for deep learning, separation between neural networks and kernels | Chapter 10-14 of Matus Telgarsky's notes, Jiang et al. on different generalization measures, Belkin et al. on double descent, Allen-Zhu and Li on separation between neural networks and kernels | |
Neural Network Architecture | |||
10/31 Th | Double descent, implicit bias, introduction to convolutional neural networks, advanced convolutional neural networks | Chapter 7,8 of Dive into Deep Learning, Zhang et al. | |
11/5 Tu | Recurrent neural networks, LSTM | Chapter 9, 10 of Dive into Deep Learning, Zhang et al. | |
11/7 Th | Attention mechanism, desiderata for representation learning | Chapter 11 of Dive into Deep Learning, Zhang et al., Bengio et al. on representation learning | |
Representation learning, Pre-training, Fine-tuning | |||
11/12 Tu | Self-supervised learning, contrastive learning | Chapter 11 of Dive into Deep Learning, Zhang et al. | |
11/14 Th | Deep reinforcement learning, decision transformer | ||
Generative models | |||
11/19 Tu | Desiderata for generative models, GAN | Chapter 20 of Dive into Deep Learning, Zhang et al. | |
11/21 Th | Variational autoencoder, energy models | Chapter 20 of Dive into Deep Learning, Zhang et al. | |
11/26 Tu | Normalizing flows, score-based models, diffusion models | Yang Song's blog on score-based models, Lilian Weng's blog on diffusion models | |
11/28 Th | Thanksgiving | ||
Course Presentations | |||
12/3 Tu | Project Presentation (on Zoom) | ||
12/5 Th | Project Presentation (on Zoom) | ||