CSE 446, Winter 2019
Machine Learning
Quick links:
Lectures
Homework
Office Hours
Section Materials
TAs:
Kousuke Ariga,
Benjamin Evans,
Shobhit Hathi,
Alina Liokumovich,
Mathew (Xi) Liu,
Thomas Merth,
Patrick Spieker,
Yuchong (Yvonna) Xiang.
Class lectures: MWF 9:30-10:20am, Room: SIG 134
Contact: cse446-staff@cs.washington.edu
Please communicate with the instructor and TAs only through this account.
Please send all questions about homeworks, lectures, and policies to the Piazza discussion board. If you have a question about personal matters, please email the instructor mailing list: cse446-staff@cs.washington.edu.
Announcements:
Please make sure you monitor for (and receive) announcements from both the official UW class mailing list and from Piazza. Piazza is a convenient way to send out announcements such as homework corrections and clarifications, so it is important that you receive these announcements in a timely manner.
Office Hours:
***Please double check the website before you arrive for location changes/cancellations.***
Kousuke Ariga: Thursday 4:30-5:30, CSE 286 or 2nd floor breakout
Benjamin Evans: Tuesday 3:30-4:30, CSE 007
Shobhit Hathi: Tuesday 10:30-11:30, CSE 674
Sham Kakade: Monday 3:30-4:45, Gates 303
Alina Liokumovich: Monday 11:00-12:00, Gates 151
Mathew (Xi) Liu: Wednesday 10:30-11:30, CSE 007
Thomas Merth: Wednesday 4:00-5:00 CSE 021
Patrick Spieker: Friday 11:30-12:30, Gates 150
Yuchong (Yvonna) Xiang: Thursday 3:30-4:30, Gates 151
About the Course and Prerequisites
Machine learning explores the study and construction of algorithms
that learn from data in order to make inferences about future
outcomes. This study is a marriage of algorithms, computation, and
statistics, and the class will focus on concepts from all three areas.
The study of learning from data is playing an increasingly important role in numerous areas of science and technology, and the goal of this course is to provide a thorough grounding in the fundamental methodologies and algorithms of machine learning.
Prerequisites: Students entering the class should be comfortable with programming and should have a pre-existing working knowledge of probability and statistics (MATH 394, STAT 390, STAT 391, or CSE 312), linear algebra (MATH 308), vector calculus (MATH 324), and algorithms. Students who are weak in these areas should either take the course at a later date, when better prepared, or expect to put in substantially more effort to catch up. For refreshers, there are statistics/probability and linear algebra reference materials below.
Textbooks and reference materials
The required reading assignments will be from the two books cited in the lecture schedule below: Murphy's Machine Learning: A Probabilistic Perspective and Hal Daumé III's A Course in Machine Learning (CIML).
Some of the following reference materials may be helpful throughout the quarter; others are more advanced and may be useful later for deepening your understanding of the topics.
- Machine Learning
- Understanding
Machine Learning: From Theory to Algorithms, Shai
Shalev-Shwartz, Shai Ben-David. A gentle introduction to
theoretical machine learning. I would recommend this book if you
are seeking a deeper understanding of ML.
- Pattern
Recognition and Machine Learning, Chris Bishop. A little
older and very good (for linear models, EM, neural nets, among
other things).
- The
Elements of Statistical Learning: Data Mining, Inference, and
Prediction, Trevor Hastie, Robert Tibshirani, Jerome
Friedman. More statistical.
- Computer
Age Statistical Inference: Algorithms, Evidence and Data
Science, Bradley Efron, Trevor Hastie. Similar to previous book.
- Probability and Statistics
- A
First Course in Probability, Sheldon Ross. Elementary concepts
(previous editions are inexpensive on Amazon)
- Pattern
Recognition and Machine Learning (linked to above). Section 1.2
- Iain Murray's crib-sheet
- All of Statistics, Larry Wasserman. Chapters 1-5 are a great probability refresher and the book is a good reference for statistics.
- Linear Algebra and Matrix Analysis
- Linear Algebra Review and Reference by Zico Kolter and Chuong Do (free). Light refresher for linear algebra and matrix calculus if you're a bit rusty.
- Linear Algebra, David Cherney, Tom Denton, Rohit Thomas and Andrew Waldron (free). Introductory linear algebra text.
- Matrix Analysis Horn and
Johnson. A great reference from elementary to advanced
material.
- Optimization
- Python
- Latex
Grading and Policies
Grades will be based on five
homework assignments (40%), a midterm (20%), and a final (40%). The
cumulative homework score will be the MAXIMUM of the following two
weighting schemes: scheme 1 weights the five assignments according to
10%, 22.5%, 22.5%, 22.5%, 22.5%, respectively; scheme 2 weights the
five assignments according to 0%, 25%, 25%, 25%, 25%,
respectively. (There may be minor reweighting of HW1-HW4 based on the difficulty of the HWs.) The first homework assignment carries less weight because it serves as a refresher of the prerequisite background knowledge in probability, statistics, and linear algebra. It is mandatory that you turn in the first homework.
NEW: we will also consider another weighting scheme of assignments
(60%), a midterm (15%), and a final (25%), and we will take the max of
these two schemes when determining the final grade.
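To make these weighting rules concrete, here is a small illustrative Python sketch (not official grading code; the function names and example scores are hypothetical) that computes the homework total as the maximum over the two homework schemes and the overall grade as the maximum over the two course weightings:

```python
# Illustrative sketch of the grading rules above; not official grading code.
# All scores are hypothetical fractions in [0, 1].

def homework_total(hw):
    """hw = [hw1, hw2, hw3, hw4, hw5]; returns the max over the two HW schemes."""
    scheme1 = 0.10 * hw[0] + 0.225 * sum(hw[1:])   # 10%, 22.5%, 22.5%, 22.5%, 22.5%
    scheme2 = 0.00 * hw[0] + 0.25 * sum(hw[1:])    #  0%, 25%,   25%,   25%,   25%
    return max(scheme1, scheme2)

def course_grade(hw, midterm, final):
    """Max over the two overall weightings (40/20/40 vs. 60/15/25)."""
    hw_total = homework_total(hw)
    weighting_a = 0.40 * hw_total + 0.20 * midterm + 0.40 * final
    weighting_b = 0.60 * hw_total + 0.15 * midterm + 0.25 * final
    return max(weighting_a, weighting_b)

# Example with made-up scores:
print(course_grade([0.9, 0.8, 0.85, 0.95, 0.7], midterm=0.75, final=0.80))
```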
The course will have a substantial amount of extra credit, which will at most impact a student's score by half a letter grade, e.g. B+ to A-. Extra credit is given to encourage a mastery of the subject matter.
In a small number of cases, grades may be adjusted after this breakdown for the following reasons: active participation in the discussion boards (both asking questions and helping others out); particularly remarkable exam scores; consistently remarkable homeworks or remarkable extra credit solutions. Grades will (significantly) drop for failure to submit all of the HWs.
Exams:
The midterm date will be announced during week 1 of class and will be posted below in the Lecture Notes section. The final exam will be at the university-scheduled time, on Wednesday, March 20th, from 8:30-10:20am.
If you are unable to make the exam dates (and do not have an exception
based on UW policies), then do not enroll in the course. Exams
will not be given on alternative dates.
Homeworks:
Homework must be done individually: each student must hand in their own answers. In addition, each student must submit their own code for the programming part of the assignment (we may run your code). It is acceptable for students to discuss problems with each other; it is not acceptable for students to look at another student's written answers. It is acceptable for students to discuss coding questions with others; it is not acceptable for students to look at another student's code. You must also indicate on each homework with whom you collaborated.
We expect students not to copy, refer to, or seek out solutions in published material on the web or from other textbooks (or solutions from previous years or other courses) when preparing their answers. Students are certainly encouraged to read extra material for a deeper understanding. If you happen to find an assignment's answer, it must be acknowledged clearly with an appropriate citation in the submitted solution.
HW LATE POLICY: Homeworks must be submitted by the posted due date. You are allowed up to 4 total LATE DAYS for the homeworks throughout the entire quarter and up to 2 late days PER HOMEWORK assignment; these will be automatically deducted if your assignment is late. For example, if an assignment is up to 24 hours late, one late day is used; if it is up to 48 hours late, two late days are used. After your late days are used up, late penalties will be applied: any assignment turned in late will incur a reduction in score of 33% for each late day. So an assignment up to 24 hours late incurs a penalty of 33%, an assignment up to 48 hours late incurs a penalty of 66%, and anything later receives no credit.
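For concreteness, a small illustrative Python sketch (not the course's actual grading script; the function name is hypothetical) of how the rules above map hours late and remaining late days to a score multiplier:

```python
import math

# Illustrative only: the score multiplier implied by the late policy above.
def late_multiplier(hours_late, late_days_remaining):
    days_late = math.ceil(max(hours_late, 0) / 24)     # each started 24-hour window is one late day
    covered = min(days_late, late_days_remaining, 2)   # at most 2 late days per homework
    uncovered = days_late - covered
    if uncovered == 0:
        return 1.0    # fully covered by late days: no penalty
    if uncovered == 1:
        return 0.67   # 33% penalty (up to 24 uncovered hours)
    if uncovered == 2:
        return 0.34   # 66% penalty (up to 48 uncovered hours)
    return 0.0        # later than that: no credit
```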
We will track all your late days
and any deductions will be applied in computing the final grades.
If you are unable to turn in HWs on time, aside from permitted days, then
do not enroll in the course.
Re-grading policy:
All re-grading requests (for the homework and the midterm) must be submitted on Gradescope within seven days after the grades are released. For example, if we return the grades on Monday, then you have until midnight the following Monday to submit any re-grade requests. If you feel that we have made an error in grading your homework, please let us know with a written explanation. This policy is to ensure that we can address any concerns in a timely and fair manner. Office hours and in-person discussions are limited solely to knowledge-related questions; grade-related questions must be submitted to the instructor mailing list.
Honor Code
The instructor expects (and believes) that each student will conduct
himself or herself with academic (and personal) integrity. While the TAs will follow the
course and university policies with regards to grading and
proctoring (see CSE
conduct policy), it is ultimately up to you to conduct yourself with
academic and personal integrity for a number of reasons that
go beyond the scope of just this class.
Diversity and Gender in STEM
While many academic disciplines have historically been dominated
by one cross section of society, the study of and participation in
STEM disciplines is a joy that the instructor hopes that everyone can
pursue, regardless of their socio-economic background, race, gender,
etc. The instructor encourages students to both be mindful of these
issues, and, in good faith, try to take steps to fix them. You are the
next generation here.
Readings
The required readings are for your benefit and cover material that you are required to understand. They will also help you follow the machine learning terminology used in the homeworks and exams; there are often alternative, equivalent ways to refer to the same object (a classifier, a hypothesis, a target function, etc.), and the required readings will ensure you are better versed in this terminology.
The extra reading is provided to give you additional background.
Homework
Homeworks should be submitted via Gradescope. Any changes to the HWs (e.g. typos fixed, clarifications, etc.) will be logged on the discussion board.
- Homework 2 + Extra Credit
- Due Thurs, Feb 7th.
- "milestone": (preliminary, no credit ) answer to Q2 due Weds, Jan 30th.
- for you to check your grasp of python, numpy, and PCA.
- hw2
- extra credit: due Thurs
Feb 7th.
- Data: mnist.pkl.gz
mnist_2_vs_9.gz
- See MNIST
for original data source.
- Homework 3 + EC
- HW3 due Thurs, Feb 28th. EC due, Sun, Mar 3.
- "milestone": (preliminary, no credit ) answer to Q3 due
Thurs, Feb 21. For you to check your progress on logistic regression.
- hw3
- Data for EC: mnist_all_50pca_dims.gz
- Homework 4 + EC
- HW4+EC due Thurs, Mar 14th.
- "milestone": (preliminary, no credit ) answer to Q3 due
Weds, Mar 6th. For you to check that you can install/use PyTorch
on a "warmup" problem.
- hw4
Section Materials
- Week 1 - Section 1: Python; Probability review
- Basics and Packages (numpy, pandas, matplotlib): [slides]
- Probability Review: [slides]
- Week 2 - Section 2: Linear algebra review I, Perceptron Review
- Linear algebra basics in Jupyter Notebook: [html]
- Assigned Perceptron Reading: [CIML: Ch. 4]
- Week 3 - Section 3: Principal Component Analysis overview
- Week 4 - Section 4: Linear Regression, Loss Functions, and more NumPy!
- a good set of numpy exercises: [link]
- simple NumPy worksheet: [worksheet]
- Week 5 - Section 5: Midterm Review
- Week 6 - No Section (Midterm Week)
- Week 7 - Section 7: Post-Midterm Review
- Week 8 - Section 8: Intro to PyTorch!
- PyTorch Intro Jupyter notebook: [notebook]
- PyTorch Intro Jupyter notebook SGD Example: [notebook]
- Binary Logistic Regression Example: [python script]
- Binary Linear Regression Example: [python script]
- PyTorch Official Tutorials (highly recommended): [link]
- Week 9 - Section 9: Auto-diff, Cross Entropy, Softmax, and more...
- Week 10 - Section 10: Final Review
- Pre-Final Practice Questions: [pdf]
Lecture Notes
- Week 1: [Jan 7] Introduction.
- Logistics; What is Machine Learning?; The supervised learning problem
- Lecture notes: [pdf]
- Reading:
- Murphy: 1.1 - 1.4
- Probability review: Murphy 2.1-2.3, 2.5.1, 2.5.2, 2.6.3
- Extra Readings:
- Probability review from the list of reference materials above
- Linear algebra review from the list of reference materials
- [Jan 9] Supervised Learning Example 1: Decision Trees
- Lecture notes: [slides]
- Reading:
- [Jan 11] Generalization and Overfitting
- Lecture notes: [slides]
- Reading:
- Extra Readings:
- Week 2: [Jan 14] Train, Dev, and Test Sets
- Lecture notes: [slides]
- Reading:
- Extra Readings:
- [Jan 16] Supervised Learning Example 2: The Perceptron Algorithm
- This begins our treatment of linear methods for supervised learning.
- Lecture notes: [slides]
- Reading:
- [Jan 18] Unsupervised Learning Example 1:
Clustering and K-means
- Lecture notes: [slides]
- Reading:
- CIML: Ch. 3
- Murphy: k-means 11.4.2.5
- Extra Readings:
- Murphy: more k-means 11.4.2.6, 11.4.2.7
- Week 3: [Jan 23] Unsupervised Learning Example: Principal components analysis
- Lecture notes: [slides]
- Reading:
- CIML: 15.2
- Murphy: PCA, Ch 12.2-12.2.3
- Bishop: Ch 12-12.1 (a
good alternative to Murphy)
- Extra Readings:
- [Jan 25] PCA (continued)
- Lecture notes: [slides]
- Reading:
- Week 4: [Jan 28] Learning as Loss Minimization; Least Squares
- Lecture notes: [slides]
- Reading:
- [Jan 30] Regression
- Lecture notes: [slides]
- Reading:
- Murphy: Ch. 8.1-8.3
- Matrix derivatives [cheat sheet]
- (note: in the derivatives section the matrix B is assumed to be symmetric; otherwise you would get a different expression. See the quick numerical check below.)
- Extra Readings:
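On the symmetric-B caveat above: the general identity is that the gradient of x^T B x with respect to x is (B + B^T) x, which reduces to 2 B x only when B is symmetric. A minimal NumPy sketch (illustrative, not part of any assignment) that checks this against a finite-difference gradient:

```python
# Check that d/dx (x^T B x) = (B + B^T) x, and that 2 B x matches only when B is symmetric.
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))     # deliberately NOT symmetric
x = rng.standard_normal(4)

analytic = (B + B.T) @ x

eps = 1e-6
numeric = np.zeros_like(x)
for i in range(len(x)):
    e = np.zeros_like(x)
    e[i] = eps
    numeric[i] = ((x + e) @ B @ (x + e) - (x - e) @ B @ (x - e)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-4))   # True: (B + B^T) x is the gradient
print(np.allclose(analytic, 2 * B @ x))            # almost surely False, since B is not symmetric
```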
- [Feb 1] Regularization and gradient descent
- Regression: from 1 dimension to infinite dimensions
- Lecture notes: [slides]
- Reading:
- Week 5: [Feb 4] Snow day.
- [Feb 6] Probabilistic estimation and the maximum likelihood
estimation principle
- Also: Bayes optimal prediction; binary classification
- Lecture notes: [slides]
- Reading:
- [Feb 8] Binary Classification and Gradient
Descent
- Lecture notes: [slides]
- Reading:
- Week 6: [Feb 11] Snow day
- [Feb 13] Midterm
- [Feb 15] Optimization: Gradient Descent and Stochastic Gradient Descent
- Week 7: [Feb 20] Variations: Mini-Batching; Multi-Class Classification
- [Feb 22] Neural Nets & Backpropagation
- Lecture Notes: [notes]
- Reading:
- Bishop: [Bishop] 5.1, 5.3, 5.5 (one of the best treatments of backprop, even now)
- Extra Readings:
- Week 8: [Feb 25] The backpropagation algorithm
- Lecture Notes: [notes]
- Reading:
- Bishop: [Bishop] 5.1, 5.3, 5.5 (one of the best treatments of backprop, even now)
- Extra Readings:
- [Feb 27] non-convex optimization tips: initialization;
stationary and saddle points; weight symmetries.
- Lecture notes: [slides] [annotated slides]
- Reading:
- A more modern (and also very good) backprop presentation [here]. This also discusses "saturation". (Note that "z" and "a" are swapped compared to Bishop and our notes.)
- Extra Readings:
- [Mar 1] Structured neural nets: Convolutions and
Convolutional Neural Nets (and maybe RNNs)
- Guest lecture: Kevin Jamieson
- Lecture Notes: [slides]
- Reading:
- [Conv
Nets]
- Also, if you are not familiar with convolutions, see [wiki].
- Extra Readings:
- Some representational issues [here]. Should
be taken with a grain of salt as what we can represent is not
necessarily what we can easily find with gradient descent.
- Week 9: [Mar 4] Auto-Differentiation
- The "cheap gradient" principle (i.e. the Baur-Strassen Theorem) is why modern ML is really possible!
- Lecture Notes: [notes]
- Reading:
- A more modern backprop presentation [here]. This
also discusses "saturation".
- Extra Readings:
- Notes on (dynamic+static) computational graphs: [notes]
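To make the "cheap gradient" remark concrete, here is a minimal illustrative PyTorch autograd sketch (the toy data is made up): a single backward() call over the recorded computation graph produces the gradient with respect to every parameter at roughly the cost of one forward evaluation.

```python
# Minimal reverse-mode AD illustration with PyTorch autograd (toy data, illustrative only).
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])
w = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

loss = ((w * x + b - y) ** 2).mean()   # forward pass records the computation graph
loss.backward()                        # one reverse sweep yields all the gradients

print(w.grad, b.grad)                  # d(loss)/dw and d(loss)/db
```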
- [Mar 6] AD, Computation Graphs, and Runtime. (+PyTorch/TensorFlow)
- Lecture Notes: [notes]
- Reading:
- Extra Readings:
- Notes on (dynamic+static) computational graphs: [notes]
- [Mar 8] Probabilistic graphical models (and
structured models)
- topics: the EM algorithm, Gaussian mixture models, topic mixture models, and hidden Markov models
- Lecture Notes: [notes]
- Reading:
- CIML: Ch. 16
- Bishop: Ch. 9
- Week 10: [Mar 11] The EM algorithm; The "topic" modeling problem
- Lecture Notes: [notes]
- Reading:
- CIML: Ch. 16
- Bishop: Ch. 9
- Extra Readings:
- [Mar 13] The EM algorithm (more generally), Topic Models, and broader issues.
- Generative models: building/fitting richer generative models
- Variational auto-encoders: why we use these.
- Lecture Notes: [notes]
- Reading:
- Extra Readings:
- [Mar 15] Generative Adversarial Networks