CSE 546, Autumn 2017 Machine Learning

Lecture: Tuesday, Thursday 11:00-12:20 Room: MUE 153

Contact: cse546-instructors@cs.washington.edu

Discussion: Canvas discussion board, Slack (invite link on Canvas Discussion, or request via email directly above if not registered)

TAs: Dae Hyun Lee, Yao Lu, Aravind Rajeswaran, Nancy Wang

Office Hours (check discussion board for exceptions):

Nancy Wang: Monday 4:00-5:00 PM, CSE 220
Yao Lu: Tuesday 2:30-3:30 PM, CSE 220
Aravind Rajeswaran: Wednesday 3:00-4:00 PM, CSE 220
Dae Hyun Lee: Thursday 1:30-2:30 PM, CSE 007
Instructor: Friday 1:00-2:00 PM, CSE 666

About the Course and Prerequisites

Machine learning explores the study and construction of algorithms that can learn from data. This study combines ideas from both computer science and statistics. The study of learning from data is playing an increasingly important role in numerous areas of science and technology.

This course is designed to provide a thorough grounding in the fundamental methodologies and algorithms of machine learning. The topics of the course draw from classical statistics, from machine learning, from data mining, from Bayesian statistics, and from optimization.

Prerequisites: Students entering the class should be comfortable with programming and should have a pre-existing working knowledge of linear algebra, probability, statistics and algorithms. For a brief refresher you may consult:

Linear algebra review by Zico Kolter and Chuong Do
Murphy Chapter 2: Probability 2.1-2.6, 2.8 in the required textbook

Textbook

The required textbook (which should be available at the U Bookstore by the start of class) is:

Machine Learning: A Probabilistic Perspective, Kevin Murphy.

Material in the optional textbooks may also be helpful. All of the following are either free on the authors' webpages or are available to UW students on the campus network.

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Trevor Hastie, Robert Tibshirani, Jerome Friedman. Aesthetically beautiful with plenty of algorithms, examples, theory, and intuition.

Computer Age Statistical Inference: Algorithms, Evidence and Data Science, Bradley Efron, Trevor Hastie. Includes material like the Bootstrap and false-discovery-rate control that other books overlook.

Machine learning is a marriage of statistics and algorithms. Algorithms more often than not involve an optimization program, so the following resources may be useful:

Numerical Optimization, Nocedal, Wright (must be on UW network to access Springerlink). Practical algorithms and advice for general optimization problems.

Convex Optimization: Algorithms and Complexity, Sébastien Bubeck. Elegant proofs for the most popular optimization procedures used in machine learning.

Discussion Forum and Email Communication

IMPORTANT: All class announcements will be broadcast using the Canvas discussion board. The same applies to questions about homeworks, projects and lectures. If you have a question about a personal matter, please email the instructors list: cse546-instructors@cs.washington.edu.

We are experimenting with Slack this quarter. An invite link is available on the Canvas Discussion board. If you are not registered, please request an invite link by sending an email to cse546-instructors@cs.washington.edu. Slack lowers the barrier to asking for help and encourages more interaction. It is also a place where students who are not registered can interact with the rest of the class (unlike Canvas).

Please send all questions to Slack or the discussion board, since other students may have the same questions, and we need to be fair in terms of how we interact with everyone. Also, please feel free to participate, answer each other's questions, etc.

Grading and Evaluation

Your grade will be based on 5 homework assignments (65%) and a final project (35%).

Homework

Your homework score will be the smaller of 100 points and the cumulative number of points you receive on the assignments. The first homework is worth 10 points, and the final four are worth 25 each. This means if you receive grades $(x_0,x_1,x_2,x_3,x_4)$ you will receive a score of $\min(100, x_0+x_1+x_2+x_3+x_4)$. In particular, if you receive grades

$(10,25,25,25,0)$ you will get a total homework score of $85$.
$(10,25,25,25,15)$ you will get a total homework score of $100$.
$(10,25,25,25,25)$ you will get a total homework score of $100$.
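
The capped scoring rule above can be sketched in a few lines of Python. This is only an illustration of the formula $\min(100, x_0+x_1+x_2+x_3+x_4)$; the function name `homework_score` is hypothetical and not part of any course-provided code.

```python
def homework_score(points):
    """Return the capped homework total: min(100, sum of assignment points)."""
    return min(100, sum(points))


# The three grade vectors used as examples in the text above.
examples = [
    (10, 25, 25, 25, 0),
    (10, 25, 25, 25, 15),
    (10, 25, 25, 25, 25),
]

for grades in examples:
    print(grades, "->", homework_score(grades))
```

Because 110 points are available in total, a student can lose up to 10 points across the assignments and still reach the 100-point cap.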

Homeworks must be submitted by the posted due date at 11:59 PM Seattle time.

Late work will receive a score of 0.
All assignments must be submitted (even if late for a score of 0). If not, you will not pass.
All assignments are to be submitted electronically on Canvas.

Each homework assignment contains both theoretical questions and programming components.

You are required to use Python for the programming portions. There are a number of excellent tutorials for getting started with Python. You may use any numerical linear algebra package, but you may not use machine learning libraries (e.g. sklearn, pytorch, tensorflow) unless otherwise specified (later in the course). Your code should be submitted as an executable script (e.g., *.py) and not pasted into a typeset document.
You must submit your HW as a typed PDF document typeset in LaTeX (not handwritten). Learn LaTeX in 30 minutes. Use an online editor or install and use LaTeX on your local machine (recommended). Also note that LaTeX is installed on department-run machines.

The first homework (10 points) is designed to be very easy, and its purpose is to get you comfortable with Python and LaTeX. There will be generous office hours for assistance.

COLLABORATION POLICY: Homework must be done individually: each student must hand in their own answers. In addition, each student must write and submit their own code in the programming part of the assignment (we may run your code). It is acceptable, however, for students to collaborate in figuring out answers and helping each other solve the problems. You must also indicate on each homework with whom you collaborated.

RE-GRADING POLICY: All grading-related requests must be submitted to the TA via email only. Office hours and in-person discussions are limited solely to knowledge-related questions, not grade-related questions. If you feel that we have made an error in grading your homework, please let us know with a written explanation, and we will consider the request. Please note that regrading of a homework may cause your grade to go up or down on the entire homework set.

LATE POLICY: Homeworks must be submitted by the posted due date. There is no credit for late work. The homework scoring system described above is an attempt to minimize the harshness of this policy.

NO EXCEPTIONS WILL BE GIVEN TO THE GRADING POLICIES (unless based on university policies, e.g. medical reasons). IF YOU ARE NOT ABLE TO COMPLY WITH THE LATE HOMEWORK POLICY, DUE TO TRAVEL, CONFERENCES, OTHER DEADLINES, OR ANY OTHER REASON, DO NOT ENROLL IN THE COURSE.

HONOR CODE: As we sometimes reuse problem set questions from previous years, covered by papers and webpages, we expect the students not to copy, refer to, or look at the solutions in preparing their answers (referring to unauthorized material is considered a violation of the honor code). Similarly, we expect students not to google directly for answers. The homework is to help you think about the material, and we expect you to make an honest effort to solve the problems. If you do happen to use other material, it must be acknowledged clearly with a citation on the submitted solution. For more information, please see the CSE Academic Misconduct policy that this course adheres to.

Project

You will work independently or with a partner on a machine learning project spanning most of the quarter, ending with a poster presentation and written report. You may use techniques developed in this course but are also encouraged to learn and apply new methods. The project should address a novel question with a non-obvious answer and must have a real-data component. We will provide some seed project ideas. You can pick one of these ideas, and explore the data and algorithms within and beyond what we suggest. You can also use your own data/ideas, but, in this case, you have to make sure you have the data available at the time of the proposal and a nice roadmap, since a quarter is too short to explore a brand new concept. The components of the project are:

Project Proposal (5 points): A one page maximum description of your project with: 1) project title, 2) dataset(s), 3) Project idea (two paragraphs), 4) Software you will write and/or use, 5) papers to read (include 1-3 relevant papers), 6) will you have a teammate?, and 7) what will you complete by the milestone (experimental results are expected)?
Project Milestone (10 points): Your write up should be 3 pages maximum in NIPS format, not including references. You should describe the results of your first experiments here and what you wish to accomplish before the final presentation and paper submission. Note that, as with any conference, the page limits are strict! Papers over the limit will not be considered.
Poster presentation (20 points): We will hold a poster session in the Atrium of the Paul Allen Center. Each team will be given a stand to present a poster summarizing the project motivation, methodology, and results. The poster session will give you a chance to show off the hard work you put into your project, and to learn about the projects of your peers. We will provide poster boards that are 32x40 inches. Either one large poster or several pinned pages is OK (fonts should be easily readable from 5 feet away).
Project Report (65 points): Your write up should be 8 pages maximum in NIPS format. You should describe the task you solved, your approach, the algorithms, the results, and the conclusions of your analysis. Note that, as with any conference, the page limits are strict! Papers over the limit will not be considered.
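
As a rough sketch of how the pieces above might combine, the following Python snippet adds up the four project components (5 + 10 + 20 + 65 = 100 points) and weights the result against the homework score using the 65%/35% split stated under Grading and Evaluation. It assumes that both scores are on a 0-100 scale and are combined linearly; the syllabus does not spell out the exact formula or any mapping to letter grades, and the names `PROJECT_POINTS`, `project_score`, and `course_percentage` are hypothetical.

```python
HOMEWORK_WEIGHT = 0.65  # homework share of the course grade (from the syllabus)
PROJECT_WEIGHT = 0.35   # project share of the course grade (from the syllabus)

# Maximum points for each project component, as listed above.
PROJECT_POINTS = {"proposal": 5, "milestone": 10, "poster": 20, "report": 65}


def project_score(earned):
    """Sum the points earned per component, checking each against its maximum."""
    total = 0
    for name, maximum in PROJECT_POINTS.items():
        points = earned.get(name, 0)
        if not 0 <= points <= maximum:
            raise ValueError(f"{name}: expected 0..{maximum}, got {points}")
        total += points
    return total


def course_percentage(homework_total, project_total):
    """Assumed linear combination of the capped homework score and project score."""
    return HOMEWORK_WEIGHT * min(100, homework_total) + PROJECT_WEIGHT * project_total


earned = {"proposal": 5, "milestone": 8, "poster": 17, "report": 50}
print(course_percentage(85, project_score(earned)))
```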

Example project ideas can be found here.

Homework

Homework 0: Warm up (10 points)

Due: 11:59 PM Thursday October 5
Homework: PDF

Homework 1: Linear Regression, Cross-validation (25 points)

Due: 11:59 PM Tuesday October 17
Homework: PDF

Homework 2: Classification, Lasso, Optimization (25 points)

Due: 11:59 PM Thursday November 2
Homework: PDF, data for problem 4

Homework 3: Kernels, Unsupervised Learning, Matrix completion (25 points)

Due: 11:59 PM Tuesday November 21
Homework: PDF, data for problem 4 (removed)

Homework 4: Convex Optimization, Deep Learning (25 points)

Due: 11:59 PM Tuesday December 5
Homework: PDF

Important Dates

Date	Deliverable Due
10/5	Homework 0
10/17	Homework 1
10/24	Project proposal
11/2	Homework 2
11/14	Project milestone
11/21	Homework 3
12/5	Homework 4
12/7	Poster presentation
12/7	Project report
12/14	Project reviews

Schedule

Lecture 1: Introduction and MLE

Topics: Welcome/overview, MLE for Bernoulli and Gaussians
Required reading: Murphy 1, 2.1-2.6
Slides: lecture slides, annotated lecture slides

Lecture 2: Bayesian Inference, MAP, Regression

Topics: MLE, MAP, Linear Least squares
Required reading: Murphy 3.1-3.3, 4.6, 5.1-5.2, 7.1-7.3
Slides: lecture slides, annotated lecture slides

Lecture 3: Regression, Overfitting

Topics: Linear Least squares, Overfitting
Required reading: Murphy 7.1-7.3, 6.1-6.5, 7.5.1, 7.6
Slides: lecture slides, annotated lecture slides

Lecture 4: Ridge Regression, Model Selection and Assessment

Topics: Ridge regression, k-fold cross validation, Bootstrap
Required reading: Murphy 7.5-7.6, 6.2
Optional, further reading on Bootstrap: Efron and Hastie 10-11
Slides: lecture slides, annotated lecture slides

Lecture 5: Lasso Regression, Convexity, Logistic

Topics: Lasso regression, Convexity, Logistic regression
Required reading: Murphy 13.1-13.4, 8.1-8.3
Slides: lecture slides, annotated lecture slides

Lecture 6: Logistic Regression, Optimization

Topics: Logistic regression, Convexity, Optimization
Required reading: Murphy 8.1-8.3
Optional, further reading on optimization: Nocedal and Wright 2-3
Slides: lecture slides, annotated lecture slides

Lecture 7: Optimization, Online Learning

Topics: RLS, LMS, SGD
Required reading: Murphy 8.5
Slides: lecture slides, annotated lecture slides
Sham Kakade's notes on SGD: notes

Lecture 8: Online Learning, SVMs

Topics: SGD, Perceptron, SVMs
Required reading: Murphy 8.5.4, 14.1-14.5
Slides: lecture slides, annotated lecture slides
Generalization of learning with SGD: Hardt, Recht, Singer, 2016

Lecture 9: Local methods, Kernels

Topics: Nearest neighbors, local least squares, and kernels for classification and regression
Required reading: Murphy 14.1-14.5, 1.4
Slides: lecture slides, annotated lecture slides

Lecture 10: Kernels, Trees

Topics: Kernels, decision trees, bagging, random forests
Required reading: Murphy 14.1-14.5, 16.1-16.4
Slides: lecture slides, annotated lecture slides
Random Features and RBF: Rahimi, Recht, 2007
Random forests and Xbox Kinect: link

Lecture 11: Bagging, Boosting

Topics: Bagging, Boosting, Additive models
Required reading: Murphy 16.1-16.5
Slides: lecture slides, Boosting slides

Lecture 12: Neural Networks

Topics: Basic intro to neural networks
Required reading: Murphy 16.1-16.5
Slides: Neural Network slides

Lecture 13: Supervised Learning Recap, SVD and PCA

Topics: overview of learning methods discussed so far, intro to the SVD and PCA
Required reading: Murphy 12.2-12.3
Slides: lecture slides, annotated lecture slides

Lecture 14: SVD, PCA

Topics: SVD, PCA, power method, matrix completion
Required reading: Murphy 12.2-12.3
Slides: lecture slides, annotated lecture slides

Lecture 15: Data representation, Clustering

Topics: k-means, spectral clustering, hierarchical clustering
Required reading: Murphy 11.4.2.5-11.4.2.7, 25.1, 25.4-25.5
Slides: lecture slides, annotated lecture slides

Lecture 16: Density Estimation, GMM

Topics: Kernel density estimation, Gaussian mixture models, EM
Required reading: Murphy 11.1-11.4.3, 11.6
Slides: lecture slides, annotated lecture slides

Lecture 17: Tips and Tricks, Data Pre-processing, Feature extraction

Topics: Hyperparameter tuning, feature extraction for images, deep learning
Slides: lecture slides
Reading

Lecture 18: Feature extraction

Topics: recurrent neural networks, feature extraction for text, classification
Slides: lecture slides, annotated lecture slides
Reading

Lecture 19: Interactive methods, Reinforcement Learning

Topics: A/B testing, multi-armed bandits, reinforcement learning
Slides: lecture slides
Reading
- Lectures on multi-armed bandits
- Interactive learning examples: Amazon, New Yorker, Facebook/Google/profitable-web-company
- Ali Rahimi's Test of Time talk at NIPS 2017