CSE 446, Winter 2018
Machine Learning
TAs:
Kousuke Ariga,
Benjamin Evans,
Xingfan Huang,
Sean Jaffe,
Vardhman Mehta,
Patrick Spieker,
Jeannette Yu,
Kaiyu Zheng.
Contact: cse446-staff@cs.washington.edu
PLEASE COMMUNICATE WITH THE INSTRUCTOR AND TAs ONLY THROUGH THIS
EMAIL (unless there is a reason for privacy in your email).
Class lectures: MWF 9:30-10:20am, Room: SIG 134
Office Hours:
***Please double check the website before
you arrive for location changes/cancellations.***
Kousuke Ariga: Wednesday 1:30-2:30pm, 2nd floor breakout
Benjamin Evans: Tuesday 9:30-10:30am, CSE 021 (last OH: 9:30am-12pm and 3-4pm, CSE 614)
Xingfan Huang: Tuesday 11:00am-12:00pm, CSE 021
Sean Jaffe: Thursday 2:00-3:00pm, CSE 007
Sham Kakade: Monday 2:45-4:15, CSE 436
Vardhman Mehta: Friday 2:30-3:30pm, CSE 007
Patrick Spieker: Thursday 12:30-1:20pm, CSE 021
Jeannette Yu: Wednesday 11:30am-12:30pm, CSE 021
Kaiyu Zheng: Monday 11:00am-12:00pm, CSE 021 (last OH: Monday 9:30am-12pm and Tuesday 11am-12pm, CSE 614)
About the Course and Prerequisites
Machine learning explores the study and construction of algorithms
that can learn from data. This study combines ideas from both
computer science and statistics. The study of learning from data is
playing an increasingly important role in numerous areas of science
and technology.
This course is designed to provide a thorough grounding in the
fundamental methodologies and algorithms of
machine learning. The topics of the course draw from classical
statistics, machine learning, data mining, Bayesian statistics, and
optimization.
Prerequisites: Students entering the class should be comfortable with
programming (e.g., Python) and should have a working knowledge of
probability, statistics, algorithms, and linear algebra.
Discussion Forum and Email Communication
IMPORTANT: All class announcements will be broadcast via
Canvas. Please post questions about
homeworks, projects, and lectures to the Canvas discussion board. If you have a question about personal
matters, please email the instructors list:
cse446-staff@cs.washington.edu.
Material and textbooks
The primary reading assignments will be from the following two books:
A Course in Machine Learning, Hal
Daume.
Machine Learning: A Probabilistic Perspective, Kevin Murphy.
Other helpful textbooks are:
From a more theoretical perspective: Understanding Machine Learning: From Theory to Algorithms, Shai Shalev-Shwartz and Shai Ben-David.
More statistical: The Elements of Statistical
Learning: Data Mining, Inference, and Prediction, Trevor Hastie, Robert
Tibshirani, and Jerome Friedman.
A little more Bayesian: Pattern Recognition and Machine Learning, Chris Bishop.
From an AI angle: Machine Learning, Tom Mitchell.
Policies
Grades will be based on four assignments (40%), a midterm (20%), and a
final (40%). NEW: we will also consider another weighting scheme of
assignments (60%), a midterm (15%), and a final (25%), and we will
take the max of these two schemes. Extra credit will be added after
the max is taken, weighted the same regardless of which weighting
scheme is used (see the sketch below). This is to encourage students
to actively work on the HWs (including the Extra Credit). In a small
number of cases, grades may be adjusted after this breakdown: e.g.
grades will (significantly) drop for failure to submit all the HWs;
grades may go up for particularly remarkable exam scores or for
consistently remarkable homeworks.
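To make the max-of-two-schemes rule concrete, here is a minimal sketch in
Python (the function and variable names are hypothetical, and scores are
assumed to be fractions in [0, 1]); it only illustrates the rule above and
is not an official grading script.

```python
# Minimal sketch of the grading rule above (illustrative only).
# Assumes all scores are fractions in [0, 1]; names are hypothetical.

def course_grade(hw, midterm, final, extra_credit=0.0):
    """Course grade under the better of the two weighting schemes."""
    scheme_a = 0.40 * hw + 0.20 * midterm + 0.40 * final
    scheme_b = 0.60 * hw + 0.15 * midterm + 0.25 * final
    # Extra credit is added after the max is taken, with the same weight
    # regardless of which scheme was better.
    return max(scheme_a, scheme_b) + extra_credit

# Example: strong homework scores make the second scheme the better one.
print(course_grade(hw=0.95, midterm=0.70, final=0.80))  # 0.875
```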
Exams:
If you are not able to make the exam dates (and do not have an exception
based on UW policies), then do not enroll in the course. Exams
will not be given on alternative dates.
Homeworks:
Homework must be done individually: each
student must hand in their own answers. In addition, each student must
submit their own code in the programming part of the
assignment (we may run your code). It is
acceptable for students to discuss problems with each other; it is not
acceptable for students to look at another student's written answers.
It is acceptable for students to discuss coding questions with others;
it is not acceptable for students to look at another student's code.
You must also indicate on each homework with whom you collaborated.
We expect students not to copy, refer to, or seek out solutions in
published material on the web, in other textbooks, or in solutions
from previous years or other courses when preparing their answers.
Students are certainly encouraged to read extra material for a deeper
understanding. If you do happen to find an assignment's answer, it
must be acknowledged clearly with an appropriate citation on the
submitted solution.
HW LATE POLICY: Homeworks must be submitted by the posted due date.
You are allowed up to 2 LATE DAYS for the homeworks throughout the
entire quarter, which will automatically be deducted if your
assignment is late: each day (or part thereof) by which an assignment
is late uses one late day, until both are spent. After the two late
days are used up, any assignment turned in late will incur a reduction
of 33% in the final score for each day (or part thereof) that it is
late. For example, an assignment up to 24 hours late incurs a penalty
of 33%, one up to 48 hours late incurs a penalty of 66%, and anything
later receives no credit. A sketch of this computation is given below.
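The same policy as a minimal sketch in Python (the function name and the
convention of counting any partial day as a full late day are illustrative
assumptions, not an official calculator):

```python
# Minimal sketch of the late policy above (illustrative only).
import math

def late_penalty_multiplier(hours_late, free_late_days_remaining=2):
    """Fraction of the score kept, per the late policy above."""
    if hours_late <= 0:
        return 1.0
    days_late = math.ceil(hours_late / 24)             # any part of a day counts
    penalized_days = max(0, days_late - free_late_days_remaining)
    if penalized_days == 0:
        return 1.0                                     # covered by free late days
    if penalized_days == 1:
        return 1.0 - 0.33                              # up to 24 hours past late days
    if penalized_days == 2:
        return 1.0 - 0.66                              # up to 48 hours past late days
    return 0.0                                         # any later: no credit

# Example: 30 hours late with both late days already used.
print(late_penalty_multiplier(30, free_late_days_remaining=0))  # ~0.34
```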
Academic and Personal Integrity
The instructor expects (and believes) that each student will conduct
himself or herself with integrity. While the TAs will follow the
course and university policies with regard to grading and
proctoring, it is ultimately up to you to conduct yourself with academic
and personal integrity for a number of important reasons.
Diversity and Gender in STEM
While many academic disciplines have historically been dominated by
one cross-section of society, the study of and participation in STEM
disciplines is a joy that the instructor hopes everyone can pursue. It
is not obvious to the instructor what the best solution is. At the
least, the instructor encourages students to be mindful of these
issues and, in good faith, to try to take steps to fix them. You are
the next generation here.
Readings
The required readings are for your benefit and they encompass material
that you are required to understand. The extra reading is provided to
give you additional background. Please do the required readings before
each class.
Section Materials
- Week 1 - Section 1: Python review
- Basics and packages (numpy, pandas, matplotlib): [slides]
- Virtual environment: [slides][handout]
- Week 2 - Section 2: Linear algebra review I, expected value, notations
- Linear algebra basics in Jupyter Notebook: [HTML]
- Expected value, notations: [slides]
- Week 3 - Section 3: Linear algebra review II, probability, Bayesian optimal classification
- Notes on inner/outer product, projection, probability, etc.: [pdf]
- Week 4 - Section 4: margin of separability, principal component analysis overview
- PCA Jupyter Notebook [HTML]
- Week 5 - Section 5: Midterm review
- Week 6 - No section
- Week 7 - Section 6: GD and SGD clarifications
- Week 8 - Section 7: PyTorch Quick Overview
- PyTorch Introduction with Comparison to Tensorflow [slides]
- PyTorch Jupyter Notebook [HTML]
- Week 9 - Section 8: Neural Nets and PyTorch review
- PyTorch Neural Net (XOR) Jupyter Notebook [HTML]
Lecture Notes and Readings
- Week 1: [Jan 3] Introduction.
- [Jan 5] Decision Trees and Supervised Learning
- Week 2: [Jan 8] The Supervised Learning Problem Setting
- Lectures: [slides] [annotated slides]
- Reading:
- (same as last time, CIML: Ch. 1, **Make sure you read/understand
the "Math Review: Expected Values" Box on page 15**)
- [Overfitting]
- [The Central Limit Theorem]: understand the statement and how it
relates to (and quantifies the rate of convergence in) the law of
large numbers; a compact statement is sketched after this lecture's
readings.
- Extra Readings:
- [Train, Test, and Dev sets] (the terminology of dev and validation
sets is not standard).
- [Generalization error] Think of f_n on the wikipage as what the
algorithm returns with n samples.
- Other slides: the generalization and overfitting presentation is good.
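As a quick reminder, here is a compact statement of the theorem in the
standard i.i.d., finite-variance setting (a sketch only, not a substitute
for the reading):

```latex
% For i.i.d. X_1, \dots, X_n with mean \mu and variance \sigma^2 < \infty,
% the sample mean \bar{X}_n = \tfrac{1}{n}\sum_{i=1}^n X_i satisfies
\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{\;d\;} \mathcal{N}(0,1)
\quad \text{as } n \to \infty .
% So \bar{X}_n - \mu is typically of size O(\sigma/\sqrt{n}), which quantifies
% the rate at which the law of large numbers (\bar{X}_n \to \mu) takes effect.
```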
- [Jan 10] Limits of Learning and Inductive Bias
- [Jan 12] Geometry: Nearest Neighbors and K-means
- Lectures: [slides] [annotated slides]
- Reading:
- CIML: Ch. 3
- Murphy: k-means 11.4.2.5
- Extra Readings:
- Murphy: more k-means 11.4.2.6, 11.4.2.7
- Week 3: [Jan 17] The Perceptron Algorithm
- [Jan 19] The perceptron algorithm convergence proof; voting
- Week 4: [Jan 22] Unsupervised Learning
- [Jan 24] Unsupervised learning: principal components analysis
- [Jan 26] PCA (continued)
- Week 5: [Jan 29] Learning as Loss Minimization; Least Squares
- [Jan 31] Regularization and Optimization; Gradient Descent
- [Feb 2] Probabilistic Models; the Log Loss
- Week 6: [Feb 5] Optimization: Gradient Descent & Stochastic Gradient Descent
- [Feb 7] MIDTERM
- [Feb 9] Midterm review; GD/SGD + Practical Issues
- Week 7: [Feb 12] Guest lecture: John Thickstun; GD/SGD + Practical Issues
- [Feb 14] Probabilistic estimation: MLE and MAP
- [Feb 16] Multi-Class Classification
- Week 8: [Feb 21] Non-convexity: Feature mappings (kernels)
and neural networks
- Lectures Notes: [pdf]
- Reading:
- Extra Readings:
- [Feb 23] Neural Nets & Backpropagation
- Lectures Notes: [pdf]
- Reading:
- Bishop: [Bishop]
5.1, 5.3, 5.5
- Extra Readings:
- Week 9: [Feb 26] Auto-Differentiation, Computation Graphs,
and the Baur-Strassen Theorem
- Lectures Notes: [pdf]
- Reading:
- Bishop: [Bishop]
5.1, 5.3, 5.5
- Extra Readings:
- [Feb 28] Initialization/Weight symmetries, saddle points,
and non-convex optimization
- Lectures Notes: [pdf]
- Reading:
- A more modern backprop presentation [here]. This
also discusses "saturation".
- Extra Readings:
- There is a lot of material out there on how to initialize networks;
the standard schemes basically make sense from scaling considerations
(keep the variance of the activations roughly constant across layers).
See [Xavier-Initialization] and [here]; a small sketch follows this
list.
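To make "scaling considerations" concrete, here is a small NumPy sketch of
the Xavier/Glorot scaling (toy layer sizes and inputs; this is an
illustration, not the initializer used in lecture or in any particular
library):

```python
# Sketch of the scaling idea behind Xavier/Glorot initialization.
import numpy as np

def xavier_init(fan_in, fan_out, rng=np.random.default_rng(0)):
    """Draw weights with variance 2 / (fan_in + fan_out)."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

# With this scaling, the variance of a layer's pre-activations stays close
# to the variance of its inputs, so signals neither blow up nor vanish as
# depth grows.
x = np.random.default_rng(1).normal(size=(1000, 256))   # fake inputs
W = xavier_init(256, 256)
print(x.var(), (x @ W).var())                            # similar magnitudes
```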
- [Mar 2] Structured neural nets: Convolutions and
Convolutional Neural Nets (and maybe RNNs)
- Lectures Notes: [pdf]
- Reading:
- [Conv Nets]
- Also, if you are not familiar with convolutions, see [wiki]; a small
numerical sketch follows this lecture's readings.
- Extra Readings:
- Some representational issues [here]. Should
be taken with a grain of salt as what we can represent is not
necessarily what we can easily find with gradient descent.
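For a concrete picture of the convolution operation itself, here is a tiny
NumPy sketch (cross-correlation, as is conventional in conv nets; the toy
image and kernel are made up, and real layers also handle channels,
strides, and padding):

```python
# Toy 2-D "convolution" (cross-correlation) in plain NumPy.
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of a 2-D image with a 2-D kernel."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output entry is the sum of an elementwise product of the
            # kernel with the image patch under it.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1.0, -1.0]])   # responds to horizontal changes
print(conv2d(image, edge_kernel))        # all -1: image rises by 1 per column
```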
- Week 10: [Mar 5] Probabilistic graphical models (and
structured models)
- topics: inference, Gaussian mixture models, topic mixture models, and hidden Markov models
- Lectures Notes: [pdf]
- Reading:
- Murphy: Mixture Models and Mixture of Gaussians 11.1,
11.2, 11.2.1.
- Murphy: HMMs 17.3
- Extra Readings:
- [Mar 7] The EM algorithm by example: The "topic" modeling problem
- Lectures Notes: [pdf]
- Reading:
- Murphy: EM (or Bishop Ch 9)
- Extra Readings:
- [Mar 9] Deep Blue, AlphaGo, and AI...