CSE 446, Winter 2019
Machine Learning
Quick links:
Lectures
Homework
Office Hours
Section Materials
TAs:
Kousuke Ariga,
Benjamin Evans,
Shobhit Hathi,
Alina Liokumovich,
Mathew (Xi) Liu,
Thomas Merth,
Patrick Spieker,
Yuchong (Yvonna) Xiang.
Class lectures: MWF 9:30-10:20am, Room: SIG 134
Contact: cse446-staff@cs.washington.edu
Please communicate with the instructor and TAs only through this account.
Please send all questions about homeworks, lectures, and policies to the Piazza discussion board. If you have a question about personal matters, please email the instructor mailing list: cse446-staff@cs.washington.edu.
Announcements:
Please make sure you monitor for (and receive) announcements from both the official UW class mailing list and from Piazza. Piazza is a convenient way to send out announcements such as homework corrections and clarifications, so it is important that you receive these announcements in a timely manner.
Office Hours:
***Please double check the website before you arrive for location changes/cancellations.***
Kousuke Ariga: Thursday 4:30-5:30, CSE 286 or 2nd floor breakout
Benjamin Evans: Tuesday 3:30-4:30, CSE 007
Shobhit Hathi: Tuesday 10:30-11:30, CSE 674
Sham Kakade: Monday 3:30-4:45, Gates 303
Alina Liokumovich: Monday 11:00-12:00, Gates 151
Mathew (Xi) Liu: Wednesday 10:30-11:30, CSE 007
Thomas Merth: Wednesday 4:00-5:00 CSE 021
Patrick Spieker: Friday 11:30-12:30, Gates 150
Yuchong (Yvonna) Xiang: Thursday 3:30-4:30, Gates 151
About the Course and Prerequisites
Machine learning explores the study and construction of algorithms
that learn from data in order to make inferences about future
outcomes. This study is a marriage of algorithms, computation, and
statistics, and the class will focus on concepts from all three areas.
The study of learning from data is playing an increasingly important role in numerous areas of science and technology, and the goal of this course is to provide a thorough grounding in the fundamental methodologies and algorithms of machine learning.
Prerequisites: Students entering the class should be comfortable with programming and should have a pre-existing working knowledge of probability and statistics (MATH 394, STAT 390, STAT 391, or CSE 312), linear algebra (MATH 308), vector calculus (MATH 324), and algorithms. Students who are weak in these areas should either take the course at a later date, when better prepared, or expect to put in substantially more effort to catch up. For refreshers, there are statistics/probability and linear algebra reference materials below.
Textbooks and reference materials
The required reading assignments will be from the two books cited in the lecture schedule below: Murphy's Machine Learning: A Probabilistic Perspective and Hal Daumé III's A Course in Machine Learning (CIML).
Some of the following reference materials may be helpful throughout the quarter; others are more advanced and may be useful later for deepening your understanding of the topics.
- Machine Learning
- Understanding
Machine Learning: From Theory to Algorithms, Shai
Shalev-Shwartz, Shai Ben-David. A gentle introduction to
theoretical machine learning. I would recommend this book if you
are seeking a deeper understanding of ML.
- Pattern
Recognition and Machine Learning, Chris Bishop. A little
older and very good (for linear models, EM, neural nets, among
other things).
- The
Elements of Statistical Learning: Data Mining, Inference, and
Prediction, Trevor Hastie, Robert Tibshirani, Jerome
Friedman. More statistical.
- Computer
Age Statistical Inference: Algorithms, Evidence and Data
Science, Bradley Efron, Trevor Hastie. Similar to previous book.
- Probability and Statistics
- A
First Course in Probability, Sheldon Ross. Elementary concepts
(previous editions are inexpensive on Amazon)
- Pattern
Recognition and Machine Learning (linked to above). Section 1.2
- Iain Murray's crib-sheet
- All of Statistics, Larry Wasserman. Chapters 1-5 are a great probability refresher and the book is a good reference for statistics.
- Linear Algebra and Matrix Analysis
- Linear Algebra Review and Reference by Zico Kolter and Chuong Do (free). Light refresher for linear algebra and matrix calculus if you're a bit rusty.
- Linear Algebra, David Cherney, Tom Denton, Rohit Thomas and Andrew Waldron (free). Introductory linear algebra text.
- Matrix Analysis Horn and
Johnson. A great reference from elementary to advanced
material.
- Optimization
- Python
- Latex
Grading and Policies
Grades will be based on five
homework assignments (40%), a midterm (20%), and a final (40%). The
cumulative homework score will be the MAXIMUM of the following two
weighting schemes: scheme 1 weights the five assignments according to
10%, 22.5%, 22.5%, 22.5%, 22.5%, respectively; scheme 2 weights the
five assignments according to 0%, 25%, 25%, 25%, 25%,
respectively. (There may be minor reweighting of HW1-HW4 based on the difficulty of the HWs.) The first homework assignment carries less weight because it serves as a refresher of the prerequisite background knowledge in probability, statistics, and linear algebra. It is mandatory that you turn in the first homework.
NEW: we will also consider another weighting scheme of assignments
(60%), a midterm (15%), and a final (25%), and we will take the max of
these two schemes when determining the final grade.
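To make these weighting rules concrete, here is a small illustrative Python sketch (not official grading code; the function names and example scores are hypothetical) that computes the homework total as the maximum over the two homework schemes and the overall grade as the maximum over the two course weightings:

```python
# Illustrative sketch of the grading rules above; not official grading code.
# All scores are hypothetical fractions in [0, 1].

def homework_total(hw):
    """hw = [hw1, hw2, hw3, hw4, hw5]; returns the max over the two HW schemes."""
    scheme1 = 0.10 * hw[0] + 0.225 * sum(hw[1:])   # 10%, 22.5%, 22.5%, 22.5%, 22.5%
    scheme2 = 0.00 * hw[0] + 0.25 * sum(hw[1:])    #  0%, 25%,   25%,   25%,   25%
    return max(scheme1, scheme2)

def course_grade(hw, midterm, final):
    """Max over the two overall weightings (40/20/40 vs. 60/15/25)."""
    hw_total = homework_total(hw)
    weighting_a = 0.40 * hw_total + 0.20 * midterm + 0.40 * final
    weighting_b = 0.60 * hw_total + 0.15 * midterm + 0.25 * final
    return max(weighting_a, weighting_b)

# Example with made-up scores:
print(course_grade([0.9, 0.8, 0.85, 0.95, 0.7], midterm=0.75, final=0.80))
```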
The course will have a substantial amount of extra credit, which will at most impact a student's score by half a letter grade, e.g. B+ to A-. Extra credit is given to encourage a mastery of the subject matter.
In a small number of cases, grades may be adjusted after this breakdown for the following reasons: active participation in the discussion boards (both asking questions and helping others out); particularly remarkable exam scores; consistently remarkable homeworks or remarkable extra credit solutions. Grades will (significantly) drop for failure to submit all of the HWs.
Exams:
The midterm date will be announced during week 1 of class and will be posted below in the Lecture Notes section. The final exam will be at the university-scheduled time, on Wednesday, March 20th, from 8:30-10:20am.
If you are unable to make the exam dates (and do not have an exception
based on UW policies), then do not enroll in the course. Exams
will not be given on alternative dates.
Homeworks:
Homework must be done individually: each student must hand in their own answers. In addition, each student must submit their own code for the programming part of the assignment (we may run your code). It is acceptable for students to discuss problems with each other; it is not acceptable for students to look at another student's written answers. It is acceptable for students to discuss coding questions with others; it is not acceptable for students to look at another student's code. You must also indicate on each homework with whom you collaborated.
We expect students not to copy, refer to, or seek out solutions in published material on the web or from other textbooks (or solutions from previous years or other courses) when preparing their answers. Students are certainly encouraged to read extra material for a deeper understanding. If you happen to find an assignment's answer, it must be acknowledged clearly with an appropriate citation in the submitted solution.
HW LATE POLICY: Homeworks must be submitted by the posted due date. You are allowed up to 4 total LATE DAYS for the homeworks throughout the entire quarter and up to 2 late days PER HOMEWORK assignment; these will be automatically deducted if your assignment is late. For example, if an assignment is up to 24 hours late, one late day is used; if it is up to 48 hours late, two late days are used. After your late days are used up, late penalties will be applied: any assignment turned in late will incur a reduction in score of 33% for each late day. So an assignment up to 24 hours late incurs a penalty of 33%, an assignment up to 48 hours late incurs a penalty of 66%, and anything later receives no credit.
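For concreteness, a small illustrative Python sketch (not the course's actual grading script; the function name is hypothetical) of how the rules above map hours late and remaining late days to a score multiplier:

```python
import math

# Illustrative only: the score multiplier implied by the late policy above.
def late_multiplier(hours_late, late_days_remaining):
    days_late = math.ceil(max(hours_late, 0) / 24)     # each started 24-hour window is one late day
    covered = min(days_late, late_days_remaining, 2)   # at most 2 late days per homework
    uncovered = days_late - covered
    if uncovered == 0:
        return 1.0    # fully covered by late days: no penalty
    if uncovered == 1:
        return 0.67   # 33% penalty (up to 24 uncovered hours)
    if uncovered == 2:
        return 0.34   # 66% penalty (up to 48 uncovered hours)
    return 0.0        # later than that: no credit
```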
We will track all your late days
and any deductions will be applied in computing the final grades.
If you are unable to turn in HWs on time, aside from permitted days, then
do not enroll in the course.
Re-grading policy:
All re-grading requests (for the homework and the midterm) must be submitted on Gradescope within seven days after the grades are released. For example, if we return the grades on Monday, then you have until midnight the following Monday to submit any re-grade requests. If you feel that we have made an error in grading your homework, please let us know with a written explanation. This policy is to ensure that we can address any concerns in a timely and fair manner. Office hours and in-person discussions are limited solely to knowledge-related questions; grade-related questions must be submitted to the instructor mailing list.
Honor Code
The instructor expects (and believes) that each student will conduct
himself or herself with academic (and personal) integrity. While the TAs will follow the
course and university policies with regards to grading and
proctoring (see CSE
conduct policy), it is ultimately up to you to conduct yourself with
academic and personal integrity for a number of reasons that
go beyond the scope of just this class.
Diversity and Gender in STEM
While many academic disciplines have historically been dominated
by one cross section of society, the study of and participation in
STEM disciplines is a joy that the instructor hopes that everyone can
pursue, regardless of their socio-economic background, race, gender,
etc. The instructor encourages students to both be mindful of these
issues, and, in good faith, try to take steps to fix them. You are the
next generation here.
Readings
The required readings are for your benefit and cover material that you are required to understand. They will also help you follow the machine learning terminology used in the homeworks and exams; there are often alternative, equivalent ways to refer to the same object (a classifier, a hypothesis, a target function, etc.), and the required readings will ensure you are better versed in this terminology.
The extra reading is provided to give you additional background.
Homework
Homeworks should be submitted via Gradescope. Any changes to the HWs (e.g. typos fixed, clarifications, etc.) will be logged on the discussion board.
- Homework 2 + Extra Credit
- Due Thurs, Feb 7th.
- "milestone": (preliminary, no credit ) answer to Q2 due Weds, Jan 30th.
- for you to check your grasp of python, numpy, and PCA.
- hw2
- extra credit: due Thurs
Feb 7th.
- Data: mnist.pkl.gz
mnist_2_vs_9.gz
- See MNIST
for original data source.
- Homework 3 + EC
- HW3 due Thurs, Feb 28th. EC due, Sun, Mar 3.
- "milestone": (preliminary, no credit ) answer to Q3 due
Thurs, Feb 21. For you to check your progress on logistic regression.
- hw3
- Data for EC: mnist_all_50pca_dims.gz
- Homework 4 + EC
- HW4+EC due Thurs, Mar 14th.
- "milestone": (preliminary, no credit ) answer to Q3 due
Weds, Mar 6th. For you to check that you can install/use PyTorch
on a "warmup" problem.
- hw4
Section Materials
- Week 1 - Section 1: Python; Probability review
- Basics and Packages (numpy, pandas, matplotlib): [slides]
- Probability Review: [slides]
- Week 2 - Section 2: Linear algebra review I, Perceptron Review
- Linear algebra basics in Jupyter Notebook: [html]
- Assigned Perceptron Reading: [CIML: Ch. 4]
- Week 3 - Section 3: Principal Component Analysis overview
- Week 4 - Section 4: Linear Regression, Loss Functions, and more NumPy!
- a good set of numpy exercises: [link]
- simple NumPy worksheet: [worksheet]
- Week 5 - Section 5: Midterm Review
- Week 6 - No Section (Midterm Week)
- Week 7 - Section 7: Post-Midterm Review
- Week 8 - Section 8: Intro to PyTorch!
- PyTorch Intro Jupyter notebook: [notebook]
- PyTorch Intro Jupyter notebook SGD Example: [notebook]
- Binary Logistic Regression Example: [python script]
- Binary Linear Regression Example: [python script]
- PyTorch Official Tutorials (highly recommended): [link]
- Week 9 - Section 9: Auto-diff, Cross Entropy, Softmax, and more...
- Week 10 - Section 10: Final Review
- Pre-Final Practice Questions: [pdf]
Lecture Notes
- Week 1: [Jan 7] Introduction.
- Logistics; What is Machine Learning?; The supervised learning problem
- Lecture notes: [pdf]
- Reading:
- Murphy: 1.1 - 1.4
- Probability review: Murphy 2.1-2.3, 2.5.1, 2.5.2, 2.6.3
- Extra Readings:
- Probability review from the list of reference materials above
- Linear algebra review from the list of reference materials
- [Jan 9] Supervised Learning Example 1: Decision Trees
- Lecture notes: [slides]
- Reading:
- [Jan 11] Generalization and Overfitting
- Lecture notes: [slides]
- Reading:
- Extra Readings:
- Week 2: [Jan 14] Train, Dev, and Test Sets
- Lecture notes: [slides]
- Reading:
- Extra Readings:
- [Jan 16] Supervised Learning Example 2: The Perceptron Algorithm
- This begins our treatment of linear methods for supervised learning.
- Lecture notes: [slides]
- Reading:
- [Jan 18] Unsupervised Learning Example 1:
Clustering and K-means
- Lecture notes: [slides]
- Reading:
- CIML: Ch. 3
- Murphy: k-means 11.4.2.5
- Extra Readings:
- Murphy: more k-means 11.4.2.6, 11.4.2.7
- Week 3: [Jan 23] Unsupervised Learning Example: Principal components analysis
- Lecture notes: [slides]
- Reading:
- CIML: 15.2
- Murphy: PCA, Ch 12.2-12.2.3
- Bishop: Ch 12-12.1 (a
good alternative to Murphy)
- Extra Readings:
- [Jan 25] PCA (continued)
- Lecture notes: [slides]
- Reading:
- Week 4: [Jan 28] Learning as Loss Minimization; Least Squares
- Lecture notes: [slides]
- Reading:
- [Jan 30] Regression
- Lecture notes: [slides]
- Reading:
- Murphy: Ch. 8.1-8.3
- Matrix derivatives [cheat sheet]
- (note: in the derivatives section the matrix B is assumed to be symmetric; otherwise you would get a different expression. See the quick numerical check below.)
- Extra Readings:
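On the symmetric-B caveat above: the general identity is that the gradient of x^T B x with respect to x is (B + B^T) x, which reduces to 2 B x only when B is symmetric. A minimal NumPy sketch (illustrative, not part of any assignment) that checks this against a finite-difference gradient:

```python
# Check that d/dx (x^T B x) = (B + B^T) x, and that 2 B x matches only when B is symmetric.
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))     # deliberately NOT symmetric
x = rng.standard_normal(4)

analytic = (B + B.T) @ x

eps = 1e-6
numeric = np.zeros_like(x)
for i in range(len(x)):
    e = np.zeros_like(x)
    e[i] = eps
    numeric[i] = ((x + e) @ B @ (x + e) - (x - e) @ B @ (x - e)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-4))   # True: (B + B^T) x is the gradient
print(np.allclose(analytic, 2 * B @ x))            # almost surely False, since B is not symmetric
```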
- [Feb 1] Regularization and gradient descent
- Regression: from 1 dimension to infinite dimensions
- Lecture notes: [slides]
- Reading:
- Week 5: [Feb 4] Snow day.
- [Feb 6] Probabilistic estimation and the maximum likelihood
estimation principle
- Also: Bayes optimal prediction; binary classification
- Lecture notes: [slides]
- Reading:
- [Feb 8] Binary Classification and Gradient
Descent
- Lecture notes: [slides]
- Reading:
- Week 6: [Feb 11] Snow day
- [Feb 13] Midterm
- [Feb 15] Optimization: Gradient Descent and Stochastic Gradient Descent
- Week 7: [Feb 20] Variations: Mini-Batching; Multi-Class Classification
- [Feb 22] Neural Nets & Backpropagation
- Lecture Notes: [notes]
- Reading:
- Bishop: [Bishop] 5.1, 5.3, 5.5 (one of the best treatments of backprop, even now)
- Extra Readings:
- Week 8: [Feb 25] The backpropagation algorithm
- Lecture Notes: [notes]
- Reading:
- Bishop: [Bishop] 5.1, 5.3, 5.5 (one of the best treatments of backprop, even now)
- Extra Readings:
- [Feb 27] non-convex optimization tips: initialization;
stationary and saddle points; weight symmetries.
- Lecture notes: [slides] [annotated slides]
- Reading:
- A more modern (and also very good) backprop presentation [here]. This also discusses "saturation". (Note that "z" and "a" are swapped compared to Bishop and our notes.)
- Extra Readings:
- [Mar 1] Structured neural nets: Convolutions and
Convolutional Neural Nets (and maybe RNNs)
- Guest lecture: Kevin Jamieson
- Lecture Notes: [slides]
- Reading:
- [Conv
Nets]
- Also, if you are not familiar with convolutions, see [wiki].
- Extra Readings:
- Some representational issues [here]. Should
be taken with a grain of salt as what we can represent is not
necessarily what we can easily find with gradient descent.
- Week 9: [Mar 4] Auto-Differentiation
- The "cheap gradient" principle (i.e. the Baur-Strassen Theorem) is why modern ML is really possible!
- Lecture Notes: [notes]
- Reading:
- A more modern backprop presentation [here]. This
also discusses "saturation".
- Extra Readings:
- Notes on (dynamic+static) computational graphs: [notes]
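To make the "cheap gradient" remark concrete, here is a minimal illustrative PyTorch autograd sketch (the toy data is made up): a single backward() call over the recorded computation graph produces the gradient with respect to every parameter at roughly the cost of one forward evaluation.

```python
# Minimal reverse-mode AD illustration with PyTorch autograd (toy data, illustrative only).
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])
w = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

loss = ((w * x + b - y) ** 2).mean()   # forward pass records the computation graph
loss.backward()                        # one reverse sweep yields all the gradients

print(w.grad, b.grad)                  # d(loss)/dw and d(loss)/db
```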
- [Mar 6] AD, Computation Graphs, and Runtime. (+PyTorch/TensorFlow)
- Lecture Notes: [notes]
- Reading:
- Extra Readings:
- Notes on (dynamic+static) computational graphs: [notes]
- [Mar 8] Probabilistic graphical models (and
structured models)
- topics: the EM algorithm, Gaussian mixture models, topic mixture models, and hidden Markov models
- Lecture Notes: [notes]
- Reading:
- CIML: Ch. 16
- Bishop: Ch. 9
- Week 10: [Mar 11] The EM algorithm; The "topic" modeling problem
- Lecture Notes: [notes]
- Reading:
- CIML: Ch. 16
- Bishop: Ch. 9
- Extra Readings:
- [Mar 13] The EM algorithm (more generally), Topic Models, and broader issues.
- Generative models: building/fitting richer generative models
- Variational auto-encoders: why we use these.
- Lecture Notes: [notes]
- Reading:
- Extra Readings:
- [Mar 15] Generative Adversarial Networks