CSE 547/Stat 548, Spring 2018

Machine Learning for Big Data

Instructor: Sham Kakade

TAs: Adam Gustafson, Rahul Kidambi, Alon Milchgrub, Krishna Pillutla

Contact: cse547-instructors@cs.washington.edu

PLEASE COMMUNICATE TO THE INSTUCTOR AND TAs ONLY THROUGH THIS EMAIL (unless there is a reason for privacy in your email).

Discussion: Canvas discussion board

Class lectures: TTh 10:00-11:20am, Room: MOR 220

Office Hours:

***Please double check the website before you arrive for location changes/cancellations.***

Rahul Kidambi: Mondays - 4:30-5:30 pm at CSE 678.

Alon Milchgrub: Tuesdays at 12:30-1:30 pm at CSE 007.

Adam Gustafson: Wednesdays - 1:30 - 2:30 pm at CMU B023 (Stat Tutoring Center)

Krishna Pillutla: Thursdays, 4.30 - 5.30 pm. Location: CSE 678.

Prerequisites and About the Course

Prerequisites: To be successful in this course, students should have prior exposure to a graduate machine learning course, at the level of CSE 546. We also expect students to have a command of linear algebra, algorithms, statistics, probability, and programming (in python).

Data analysis methods in machine learning and statistics play a central role in industry and science. The growth of the Web and improvements in data collection technology in science have lead to a rapid increase in the magnitude and complexity of these analysis tasks. This growth is driving the need for scalable, parallel and online algorithms and models that can handle this "Big Data". This course will provide a broad foundation.

In particular, we focus on the challenges associated with datasets of large size and dimensionality, including settings where the dimensionality of the data is growing faster than the number of data points. We will also explore the computational foundations associated with performing these analyses in the context of parallel and cloud architectures.

Discussion Forum and Email Communication

IMPORTANT: All class announcements will be broadcasted using Canvas. Please send questions about homeworks, projects and, lectures to the Canvas discussion board . If you a question of personal matters, please email the instructors list: cse547-instructors@cs.washington.edu

Optional textbooks

Material in the following optional textbooks may be helpful:

  • Optional Textbook: Understanding Machine Learning: From Theory to Algorithms, Shai Shalev-Shwartz and Shai Ben-David.
  • Optional Textbook: Pattern Recognition and Machine Learning, Chris Bishop.
  • Optional Textbook: Machine Learning: A Probabilistic Perspective, Kevin Murphy.
  • Optional Textbook: The Elements of Statistical Learning: Data Mining, Inference, and Prediction Trevor Hastie, Robert Tibshirani, Jerome Friedman.
  • Optional textbook: Machine Learning , Tom Mitchell.

  • AWS Credits (and using another remote cluster)

    IMPORTANT: Obtain AWS credits ASAP using this [link]. If you have technical problems, please post to the discussion board. If you have already used these credits, please contact the instructors at the course mailing list. YOU MUST TAKE CARE IN USING THESE AWS CREDITS OVER THE ENTIRE TERM, WHICH WILL BE USED IN YOUR PROJECT. If you go over, penalties will be applied.

    If you do not plan to use AWS and have access to another remote cluster, you must let the instructors know this (e.g. on HW1). Your cluster resources must be such that you have to think about job scheduling and other parallelization issues.


    ALL HOMEWORK MUST BE SUBMITTED, EVEN IF IT IS FOR 0 CREDIT, IN ORDER TO PASS THE CLASS. (Empty homeworks do not count.) There will be a poster session on Thurs, May 31, from 9-noon. YOU MUST BE PRESENT AT THE POSTER SESSION TO PASS THE CLASS. THESE POLICIES APPLY EVEN IF YOU ARE USING THE PASS/NO-PASS GRADING OPTION. Do not take the class if you are not able to fulfill these requirements.


    Grades will be based on four assignments (60%) and a course project (40%). In a small number of cases, grades may be adjusted after this breakdown, e.g. grades may increase for particularly meticulous HWs, course participation, or activity on discussion boards.



    The entire HW must be submitted in one single typed pdf document (not handwritten). This document must contain all plots. Latex is preferable, though another comparable typesetting method is also acceptable.

    Homework must be done individually: each student must hand in their own answers. In addition, each student must submit their own code in the programming part of the assignment (we may run your code). It is acceptable for students to discuss problems with each other; it is not acceptable for students to look at another students written answers. It is acceptable for students to discuss coding questions with others; it is not acceptable for students to look at another students code. You must also indicate on each homework with whom you collaborated with.

    We expect the students not to copy, refer to, or seek out solutions in published material on the web or from other textbooks (or solutions from previous years or other courses) when preparing their answers. Students are certainly encouraged to read extra material for a deeper understanding. If you do happen to find an assignment's answer, it must be acknowledged clearly with an appropriate citation on the submitted solution.

    HW LATE POLICY: Homeworks must be submitted by the posted due date. You are allowed up to 2 LATE DAYs for the homeworks throughout the entire quarter, which will automatically be deducted if your assignment is late. In particular, for any day in which an assignment is late by up to 24 hours, then one late day will be used (up to two late days). After two of the late days are used up, any assignment turned in late will incur a reduction of 33% in the final score, for each day (or part thereof) if it is late. For example, if an assignment is up to 24 hours late, it incurs a penalty of 33%. Else if it is up to 48 hours late, it incurs a penalty of 66%. Any longer, it will receive NO CREDIT.

    RE-GRADING POLICY: All regrading requests must be submitted within one week after the grades are posted, even if the request is one where the instructors made an error of omission in grading (or an error in recording the score). The instructors will not accept regrade requests after this time period. This is to ensure we can address all concerns in a timely manner. All grading related requests must be submitted to the instructor mailing list.

    Project Page.

    See the project page link for details. Information about software can be found here.

    You are expected to complete the final project for the class. This will provide you with an opportunity to apply the machine learning concepts you have learned. We will update the project requirements and due dates during the quarter. There will be a poster session on Thurs, May 31, from 9-noon. YOU MUST BE PRESENT AT THE POSTER SESSION TO PASS THE CLASS.


    The required readings are for your benefit and they encompass material that you are required to understand. The extra reading is provided to give you additional background. Please do the required readings before each class.

    Academic and Personal Integrity

    The instructor expects (and believes) that each student will conduct himself or herself with integrity. While the TAs will follow the course and university policies with regards to grading and proctoring, it is ultimately up to you to conduct yourself with academic and personal integrity for a variety of good reasons (that go well beyond the class itself).

    Diversity and Gender in STEM

    While many academic disciplines have historically been dominated by one cross section of society, the study of and participation in STEM disciplines is a joy that the instructor hopes that everyone can pursue. The instructor encourages students to both be mindful of these issues, and, in good faith, try to take steps to fix them. You are the next generation here.

    Recitation Materials

    Lecture Notes and Readings