Tentative Schedule

Date	Content	Reading	Slides and Notes
	Intro
9/28 Th	Introduction, neural network basics	Chapters 1-4 of Dive into Deep Learning, Zhang et al., https://playground.tensorflow.org/	Lecture 1, Lecture 1 (annotated)
	Approximation Theory
10/3 Tu	1D and multivariate approximation (on Zoom)	Chapters 1, 2 of Matus Telgarsky's notes	Lecture 2, Lecture 2 (annotated), scribed notes on 1D, multivariate approximation, and Barron's theory
10/5 Th	Barron's theory, depth separation (on Zoom)	Chapters 3, 5 of Matus Telgarsky's notes	Lecture 3, Lecture 3 (annotated)
	Optimization
10/10 Tu	Backpropagation, auto-differentiation, Clarke differential	Chapter 4 of Dive into Deep Learning, Zhang et al., Chapter 9 of Matus Telgarsky's notes	Lecture 4, Lecture 4 (annotated)
10/12 Th	Auto-balancing, advanced optimizers	Chapter 9 of Matus Telgarsky's notes, Chapter 12 of Dive into Deep Learning, Zhang et al., Du et al. on auto-balancing, Optimizer visualization	Lecture 5, Lecture 5 (annotated), scribed notes on Clarke differential, positive homogeneity and auto-balancing
10/17 Tu	Advanced optimizers, initialization techniques for improving optimization	Chapter 12 of Dive into Deep Learning, Zhang et al., He et al. on Kaiming initialization	Lecture 6, Lecture 6 (annotated), scribed notes on Kaiming initialization
10/19 Th	Normalization techniques for improving optimization, optimization landscape, global convergence of gradient descent	blog on escaping saddle points, blog on how to escape saddle points efficiently, Du et al. on global convergence of gradient descent	Lecture 7, Lecture 7 (annotated), scribed notes on global convergence of gradient descent
10/24 Tu	Finish the proof of global convergence of gradient descent	Du et al. on global convergence of gradient descent	Lecture 8, Lecture 8 (annotated)
	Generalization
10/26 Th	Neural tangent kernel, measures of generalization, techniques for improving generalization	Jacot et al. on Neural Tangent Kernel, Arora et al. on Neural Tangent Kernel, Zhang et al. on rethinking generalization in deep learning	Lecture 9, Lecture 9 (annotated)
10/31 Tu	Generalization theory for deep learning, separation between neural networks and kernels	Chapters 10-14 of Matus Telgarsky's notes, Jiang et al. on different generalization measures, Belkin et al. on double descent, Allen-Zhu and Li on separation between neural networks and kernels	Lecture 10, Lecture 10 (annotated), scribed notes on separation between NN and kernel
	Neural Network Architecture
11/2 Th	Double descent, implicit bias, introduction to convolutional neural networks, advanced convolutional neural networks	Chapters 7, 8 of Dive into Deep Learning, Zhang et al.	Lecture 11, Lecture 11 (annotated)
11/7 Tu	Recurrent neural networks, LSTM	Chapters 9, 10 of Dive into Deep Learning, Zhang et al.	Lecture 12, Lecture 12 (annotated)
11/9 Th	Attention mechanism, desiderata for representation learning	Chapter 11 of Dive into Deep Learning, Zhang et al., Bengio et al. on representation learning	Lecture 13, Lecture 13 (annotated)
	Representation learning, Pre-training, Fine-tuning
11/14 Tu	Self-supervised learning, contrastive learning	Chapter 11 of Dive into Deep Learning, Zhang et al.	Lecture 14, Lecture 14 (annotated)
11/16 Th	Deep reinforcement learning, decision transformer (guest lecture by Qiwen Cui, Xinqi Wang, Vector Zhou, on Zoom)		Lecture 15
	Generative models
11/21 Tu	Desiderata for generative models, GAN	Chapter 20 of Dive into Deep Learning, Zhang et al.	Lecture 16, Lecture 16 (annotated)
11/23 Th	Thanksgiving
11/28 Tu	Variational autoencoder, energy models	Chapter 20 of Dive into Deep Learning, Zhang et al.	Lecture 17, Lecture 17 (annotated)
11/30 Th	Normalizing flows, score-based models, diffusion models	Yang Song's blog on score-based models, Lilian Weng's blog on diffusion models	Lecture 18, Lecture 18 (annotated)
	Course Presentations
12/5 Tu	Project Presentation (on Zoom)
12/7 Th	Project Presentation (on Zoom)