CSE 546, Autumn 2015
Machine Learning
TAs: Naozumi Hiranuma, Angli Liu, John Thickstun
Contact: cse546-instructors@cs.washington.edu
Class lectures: TTh 10:30-11:50am, MOR 230
Recitations (only on some weeks): Weds, 5:30-6:50 pm, MOR 230
Recitation for Final: Weds Dec 9th, 5:30-6:50 pm, MOR 230
Office Hours (Kakade): Fri, 10:00-10:50 am, CSE 436
Office Hours (Nao): Fri, 3:00-4:00 pm, CSE 220
Office Hours (Angli): Weds, 3:00-4:00 pm, CSE 220
Office Hours (John): Mon, 5:30-6:30 pm, CSE 220
Syllabus
Machine learning explores the study and construction of algorithms
that can learn from data. This study combines ideas from both
computer science and statistics. The study of learning from data is
playing an increasingly important role in numerous areas of science
and technology.
This course is designed to provide a thorough grounding in the
fundamental methodologies, statistics, mathematics, and algorithms of
machine learning. The topics of the course draw from classical
statistics, machine learning, data mining, Bayesian statistics, and
statistical algorithms.
Prerequisites: Students entering the class should have a working
knowledge of probability, statistics, and algorithms, though the class
has been designed to allow students with a mathematical background
to catch up and fully participate.
Discussion Forum
IMPORTANT: All class announcements will be broadcast using the
Catalyst discussion board. The same applies to questions about
homeworks, projects, and lectures. If you have a question about a
personal matter, please email the instructors list:
cse546-instructors@cs.washington.edu.
Otherwise, please send all questions to this board, since other
students may have the same questions, and we need to be fair in terms
of how we interact with everyone. Also, please feel free to
participate, answer each other's questions, etc.
Material and (optional) textbooks
The course material will be primarily drawn from posted notes.
Material in the following optional textbooks may be helpful:
Optional Textbook: Machine Learning: A Probabilistic Perspective, Kevin Murphy.
Optional Textbook: Pattern Recognition and Machine Learning, Chris Bishop.
Optional Textbook: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Trevor Hastie, Robert Tibshirani, Jerome Friedman.
Optional Textbook: Machine Learning, Tom Mitchell.
Grading
You MUST be present at both the midterm and the final. The only
exceptions will be for conference/workshop travel. No
other exceptions will be made. If you are not able to make
these dates, then do not take the class. The midterm is in class on
Nov 5th. The final is at the scheduled university time: 10:30-12:20 Monday, Dec. 14, 2015.
In addition to the breakdown below, there is up to 10% of subjective room
for increasing grades, based on extra participation (e.g., in the discussion
boards or in class) and on particularly impressive projects/homeworks.
Midterm (15%)
Homeworks (4 assignments, 35%)
Final project (30%)
Final exam (20%)
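To make the weighting concrete, here is a minimal sketch of how these components combine into a course grade (the scores are hypothetical, and this is not an official grade calculator):

    # Minimal sketch of the grade weighting above; all scores are hypothetical.
    weights = {"midterm": 0.15, "homeworks": 0.35, "project": 0.30, "final": 0.20}
    scores  = {"midterm": 0.82, "homeworks": 0.90, "project": 0.88, "final": 0.85}  # fractions of full credit

    base = sum(weights[k] * scores[k] for k in weights)   # weighted average, out of 1.0
    bonus = 0.03                                          # subjective participation bonus, up to 0.10 per the policy
    print(f"base = {base:.3f}, with bonus = {min(base + bonus, 1.0):.3f}")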
Homework policy
Important Note: As we sometimes reuse problem set questions from
previous years, whose solutions may be covered in papers and on webpages,
we expect students not to copy, refer to, or look at solutions when
preparing their answers (referring to unauthorized material is considered a
violation of the honor code). Similarly, we expect you not to Google
directly for answers. The homework is meant to help you
think about the material, and we expect you to make an honest effort
to solve the problems. If you do
happen to use other material, it must be acknowledged clearly with a
citation on the submitted solution.
Collaboration policy
Homeworks will be done individually: each student must hand in their
own answers. In addition, each student must write their own code in
the programming part of the assignment. It is acceptable, however, for
students to collaborate in figuring out answers and helping each other
solve the problems. You also must indicate on each homework with whom
you collaborated.
Late homework policy
Homeworks are due at the beginning of class, through Catalyst, unless otherwise specified.
Any assignment turned in late will incur a reduction of 33% in
the final score for each day (or part thereof) it is late. For
example, an assignment up to 24 hours late incurs a penalty
of 33%; one up to 48 hours late incurs a penalty of
66%; anything later receives no credit.
You are allowed to use 3 LATE DAYS throughout the entire quarter, only for the homeworks. Please use these wisely, and plan ahead for conferences, travel, deadlines, etc.
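As a concrete illustration of the penalty rule above, here is a minimal sketch (the deadline and submission times are hypothetical, and the 3 allowed late days are not modeled):

    import math
    from datetime import datetime, timedelta

    def late_penalty(deadline, submitted):
        # Fraction of the score lost under the 33%-per-day rule described above.
        if submitted <= deadline:
            return 0.0
        hours_late = (submitted - deadline).total_seconds() / 3600.0
        days_late = math.ceil(hours_late / 24.0)  # any part of a day counts as a full day
        if days_late >= 3:
            return 1.0  # more than 48 hours late: no credit
        return 0.33 * days_late

    # Hypothetical example: due at the start of class, submitted 30 hours later.
    deadline = datetime(2015, 10, 20, 10, 30)
    print(late_penalty(deadline, deadline + timedelta(hours=30)))  # 0.66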
You must turn in all 4 homeworks, even if for zero credit, in order to
pass the course. (Empty homeworks do not count.)
Homework regrades policy
If you feel that we have made an error in grading your homework,
please turn in your homework with a written explanation, and we will
consider your request. Please note that regrading of a homework may
cause your grade to go up or down.
Project Page Link
You are expected to complete a final project for the class. This will provide you with an opportunity to apply the machine learning concepts you have learned. We will update the project requirements and due dates during the quarter.
Recitations
Recitations will only occur on some Wednesdays, depending on interest. The
schedule is here:
Homework
- Homework 1
- Due Oct 20th.
- homework pdf
- data pdf
- Homework 2
- Due Nov 3rd.
- homework pdf
- data pdf
- Homework 3
- Due Nov 24th.
- homework pdf
- data pdf
- Homework 4
- Due Dec 11th.
- homework pdf
- data pdf
Schedule and notes
- Lecture 1: Introduction
- Maximum likelihood estimation
- The central limit theorem and large deviations
- lecture notes pdf
- Optional reading: central limit theorem proof pdf
- Lecture 2: Linear Algebra Review and Regression
- Least squares
- SVD and the pseudo-inverse
- lecture notes pdf
- Lecture 3: Bias-Variance Tradeoff (& Optimization Implications)
- Sample size issues
- Bias-Variance Tradeoff
- Ridge Regression
- Coordinate Ascent
- lecture notes pdf
- Optional reading: ridge regression analysis pdf
- Lecture 4: Decision Trees
- guest lecture: Sameer Singh
- slides pdf
- annotated slides pdf
- Lecture 5: Feature selection 1
- Subset selection
- Algorithms: Lasso, Orthogonal matching pursuit, Boosting
- lecture notes pdf
- Optional reading: feature selection analysis pdf
- Optional reading: large deviations pdf
- Lecture 6: Feature selection 2
- Theory (the orthogonal case)
- Theory (RIP and the near-orthogonal case)
- lecture notes pdf
- Lecture 7: Feature construction
- boosting
- more tricks: kernels
- even more tricks: random features
- lecture notes pdf
- Lecture 8: Loss functions
- Binary classification
- Convexity
- Gradient descent
- lecture notes pdf
- Lecture 9: Gradient descent and Stochastic Gradient Descent
- optimization issues
- stochastic gradient descent
- lecture notes pdf
- Lecture 10: Stochastic Gradient Descent
- Stochastic Gradient Descent
- non-smooth optimization
- lecture notes pdf
- Midterm (in class)
- Lecture 11: SGD and generalization
- Stochastic Gradient Descent
- generalization
- lecture notes pdf
- Lecture 12: binary classification
- the perceptron algorithm
- SVMs
- lecture notes pdf
- Lecture 13: Dimensionality reduction
- PCA
- related slides pdf
- description of PCA (see Section 3.2) pdf
- Random Projections
- statement of the theorem pdf
- Optional reading: PCA vs. Ridge Regression pdf
- Lecture 14: PCA and Clustering
- k-means
- PCA and learning and clustering
- learning with missing data
- Extra reading: Bishop Ch 9
- Lecture 15: Expectation maximization
- Gaussian mixture models
- learning with missing data
- Extra reading: Bishop Ch 9
- wiki page: EM algorithm
- Thanksgiving (no class)
- Lecture 16: Sequence modeling
- extra reading: Bishop Ch 13
- Hidden Markov models
- Conditional random fields
- Structured prediction
- Lecture 17: Deep Learning 1
- Lecture 18: Learning theory
- Concentration and the union bound
- lecture notes pdf
- Lecture 19: RNNs and LSTMs
- guest lecture: Antoine Bosselut