CSE 547/Stat 548, Spring 2018
Machine Learning for Big Data
TAs:
Adam Gustafson, Rahul Kidambi, Alon Milchgrub, Krishna Pillutla
Contact: cse547instructors@cs.washington.edu
PLEASE COMMUNICATE TO THE INSTUCTOR AND TAs ONLY THROUGH THIS
EMAIL (unless there is a reason for privacy in your email).
Class lectures: TTh 10:0011:20am, Room: MOR 220
Office Hours:
***Please double check the website before
you arrive for location changes/cancellations.***
Rahul Kidambi: Mondays  4:305:30 pm at CSE 678.
Alon Milchgrub: Tuesdays at 12:301:30 pm at CSE 007.
Adam Gustafson: Wednesdays  1:30  2:30 pm at CMU B023 (Stat Tutoring Center)
Krishna Pillutla:
Thursdays, 4.30  5.30 pm. Location: CSE 678.
Prerequisites and About the Course
Prerequisites: To be successful in this course,
students should have prior exposure
to a graduate machine learning course, at the level of CSE 546. We also expect
students to have a command of linear algebra, algorithms, statistics, probability,
and programming (in python).
Data analysis methods in machine learning and statistics play a
central role in industry and science. The growth of the Web and
improvements in data collection technology in science have lead to a
rapid increase in the magnitude and complexity of these analysis
tasks. This growth is driving the need for scalable, parallel and
online algorithms and models that can handle this "Big Data". This
course will provide a broad foundation.
In particular, we focus on the challenges associated with
datasets of large size and dimensionality, including settings where
the dimensionality of the data is growing faster than the number of
data points. We will also explore the computational foundations associated with performing
these analyses in the context of parallel and cloud architectures.
Discussion Forum and Email Communication
IMPORTANT: All class announcements will be broadcasted using
Canvas. Please send questions about
homeworks, projects and, lectures to the Canvas discussion board . If
you a question of personal
matters, please email the instructors list:
cse547instructors@cs.washington.edu
Optional textbooks
Material in the following optional textbooks may be helpful:
Optional Textbook: Understanding Machine Learning: From Theory to Algorithms, Shai ShalevShwartz and Shai BenDavid.
Optional Textbook: Pattern Recognition and Machine Learning,
Chris Bishop.
Optional Textbook: Machine Learning: A Probabilistic Perspective, Kevin Murphy.
Optional Textbook: The Elements of Statistical Learning: Data Mining, Inference, and Prediction Trevor Hastie, Robert Tibshirani, Jerome Friedman.
Optional textbook: Machine Learning , Tom Mitchell.
AWS Credits (and using another remote cluster)
IMPORTANT: Obtain AWS credits ASAP using this [link]. If
you have technical problems, please post to the discussion board. If
you have already used these credits, please contact the instructors at
the course mailing list. YOU MUST TAKE CARE IN USING THESE AWS
CREDITS OVER THE ENTIRE TERM, WHICH WILL BE USED IN YOUR PROJECT. If
you go over, penalties will be applied.
If you do not plan to use AWS and have access to another remote cluster, you must
let the instructors know this (e.g. on HW1). Your cluster resources must be such
that you have to think about job scheduling and other parallelization issues.
Policies
ALL HOMEWORK MUST BE SUBMITTED, EVEN IF IT IS FOR 0 CREDIT, IN ORDER
TO PASS THE CLASS. (Empty homeworks do not count.) There will be a poster session on Thurs, May
31, from 9noon. YOU MUST BE PRESENT AT THE POSTER SESSION TO PASS THE
CLASS. THESE POLICIES APPLY EVEN IF YOU ARE USING THE PASS/NOPASS GRADING OPTION.
Do not take the class if you are not able to fulfill these
requirements.
Grades:
Grades will be based on four assignments (60%) and a course project (40%).
In a small number of cases, grades may be adjusted after this
breakdown, e.g. grades may increase for particularly meticulous HWs,
course participation, or activity on discussion boards.
Homeworks:
ALL HOMEWORK MUST BE SUBMITTED, EVEN IF IT IS FOR 0 CREDIT, IN ORDER
TO PASS THE CLASS. (Empty homeworks do not count.)
The entire HW must be submitted in one single typed pdf
document (not handwritten). This document must contain all
plots. Latex is preferable, though another
comparable typesetting method is also acceptable.
Homework must be done individually: each
student must hand in their own answers. In addition, each student must
submit their own code in the programming part of the
assignment (we may run your code). It is
acceptable for students to discuss problems with each other; it is not
acceptable for students to look at another students written answers.
It is acceptable for students to discuss coding questions with others; it is not
acceptable for students to look at another students code.
You must also indicate on each homework with whom you
collaborated with.
We expect the students not to copy, refer to, or seek out
solutions in published material on the web or from other textbooks (or
solutions from previous years or other courses) when preparing their
answers. Students are certainly encouraged to read extra material for
a deeper understanding. If you do happen to find an assignment's
answer, it must be acknowledged clearly with an appropriate citation
on the submitted solution.
HW LATE POLICY: Homeworks must be submitted by the posted due date.
You are allowed up to 2 LATE DAYs for the homeworks throughout the entire
quarter, which will automatically be deducted if your assignment is
late. In particular, for any day in which an assignment is late by up to 24 hours, then one
late day will be used (up to two late days). After two of the late days are used up, any
assignment turned in late will incur a reduction of 33% in the final
score, for each day (or part thereof) if it is late. For example, if
an assignment is up to 24 hours late, it incurs a penalty of 33%. Else
if it is up to 48 hours late, it incurs a penalty of 66%. Any
longer, it will receive NO CREDIT.
REGRADING POLICY: All regrading requests must be submitted within
one week after the grades are posted, even if the request is one where
the instructors made an error of omission in grading (or an error in recording the
score). The instructors will not accept regrade requests after this
time period. This is to ensure we can address all concerns in a
timely manner. All grading related
requests must be submitted to the instructor mailing list.
Project Page.
See the project page link for
details. Information about software can be found here.
You are expected to complete the final project for the class. This will
provide you with an opportunity to apply the machine learning concepts
you have learned. We will update the project requirements and due
dates during the quarter. There will be a poster session on Thurs, May
31, from 9noon. YOU MUST BE PRESENT AT THE POSTER SESSION TO PASS THE
CLASS.
Readings
The required readings are for your benefit and they encompass material
that you are required to understand. The extra reading is provided to
give you additional background. Please do the required readings before
each class.
Academic and Personal Integrity
The instructor expects (and believes) that each student will conduct
himself or herself with integrity. While the TAs will follow the
course and university policies with regards to grading and
proctoring, it is ultimately up to you to conduct yourself with academic
and personal integrity for a variety of good reasons (that go well
beyond the class itself).
Diversity and Gender in STEM
While many academic disciplines have historically been dominated
by one cross section of society, the study of and participation in STEM
disciplines is a joy that the instructor hopes that everyone can
pursue. The instructor encourages students to both be
mindful of these issues, and, in good faith, try to take steps to fix
them. You are the next generation here.
Recitation Materials
 Recitation 1: PyTorch and AWS
Lecture Notes and Readings
 Lecture 1: [Mar 27] Intro. Course overview. Review: Backprop
(and AutoDiff motivation).
 Lectures Notes: [pdf]
 Required reading:
 A more modern backprop presentation [here]. This
also discusses "saturation". The notation for a and z is
different than in the lecture notes.
 Extra readings:
 [Bishop]
5.1, 5.3, 5.5. I find Bishop's presentation to be one of the
better ones.
 [A
better understanding of backprop]. What you gain from coding it
up!(and a good read as well.)
 [Multilayer
Perceptrons] A reasonable discussion of MLPs definition; also
backprop pseudocode (the lack of a derivation of backprop makes the
pseudocode harder to follow)
 Lecture 2: [Mar 29] AutoDifferentiation and Computation
Graphs.
 Autodiff is a game changer in applied ML.
 Lectures Notes: [pdf]
 Required reading:
 [
A modern survey of ML
Libraries.] While this does not connect to the complexity
results on autodiff (or discuss when autodiff provides correct
results), it is a good survey.
 Extra Readings:
 [Proof of the BaurStrassen Thm.] An
elementary proof as to why AutoDiff is really possible (ignore all
the stuff about fields. We will cover the argument in class.)
There are earlier results as well, though this one gives a clear
model of computation).
 Lecture 3: [Apr 3] Recitation: AWS, PyTorch
 It is your responsibility to get PyTorch installed.
 Lecture 4: [Apr 5]
The Time Complexity of AD (and the BauerStrassen Theorem)
 Lectures Notes: [pdf]
 Required reading:
 Extra reading:
 Lecture 5: [Apr 10] Dynamic Computation Graphs; "Dimension Free" Optimization
 We have a black box for computing gradients, how do we use it?
 Lectures Notes: [pdf]
 Required reading: Gradient Descent
 Extra reading:
 Lecture 6: [Apr 12] "Dimension Free" Optimization, Stationary Points, and
GD/SGD
 Lectures Notes: [pdf]
 Required reading:
 Survey (and
the paper) of various
ML optimization tools in packages.

Blog post. Escaping (second order) saddle points.
 Extra reading:
 Lecture 7: [Apr 17] SGD, Momentum/Nesterov's acceleration,
and adaptive gradient methods
 Lectures Notes: [pdf]
 Required reading:
 Extra reading:
 Lecture 8: [Apr 19] Adaptive gradient methods and the NearestNeighbor problem
 Lectures Notes: [pdf]
 Required reading:
 Extra reading:
 Adagrad and Adam.
 Technically, momentum and acceleration only help due
to minibatching, as shown here, along
with a provable fix.
That said, minibatching is pretty helpful to do, and, with momentum,
minibatching is even more important!

Bubeck. Convex Optimization: Algorithms and Complexity.
If you are interested in why "momentum" methods are
"optimal" (in a precise sense). See Chapter 3 and 3.5. The whole
monograph is excellent and worth reading.
 Adaptive methods
do
not always help.
Though they do help in some circumstances, quite a bit.
 Lecture 9: Information theoretic metric learning [Apr 24]
 Guest lecture: Adam Gustafson
 Lectures Notes: [pdf]
 Required reading:
 A
nice algorithm for metric learning. It could be interesting to
try on the project.
 Lecture 10: Approximate neighbor finding and localitysensitive hashing (LSH). Background on random
projections.
 Lectures Notes: [pdf]
 Lectures notes:
 Required reading:
 Extra reading:
 The JL proof is
pretty clean (and, if you interested in theoretical aspects of
ML/probabilistic analysis, it is worthwhile to understand it). It
is really based on the central limit theorem.

Voronoi diagrams
 Lecture 11: LSH, continued
 Lectures Notes: [pdf]
 Required reading:
 Lecture 12: KDtrees, Ball trees, Cover trees
 Lectures Notes: [pdf]
 Required reading:

JohnsonLindenstrauss. Understanding random projections is helpful.
KDtrees tutorial
 Randomized Partition Trees or LSH?
Sinha, K.
This is an interesting idea, and you coud try out a variant for
your project. This is also worth a read.
 Extra reading:
 Lecture 13: Minimizing sums of functions; variance reduction; parallelization
Required reading:
 Extra reading:
 Lecture 14: Minimizing sums of functions.
 Lecture 15: Finite Sum optimizations; (tentative)
Tradeoffs in large scale learning. (tentative)
 Lecture 16: Averaging, Averaging, Averaging (and statistics)
 Lecture 17: Averaging; Bandits: NonAdaptive and Adaptive Sampling
 Lecture notes: [bandits]
 Required reading:
 Extra reading:
 Lecture 18: UCB; Bayesian methods: Gittins index and Thompson Sampling
 Lecture notes: [bandits]
 Required reading:
 Extra reading:
 Lecture 19: Bayesian methods: Thompson
Sampling; Linear Bandits and Contextual Bandits
 Lecture notes: [bandits]
 Required reading:
 Extra reading: