CSE 599, Winter 2020
Interactive Learning

Lecture: Tue, Thu 10:00-11:20 Room: CSE2 G10

TA office hours:

Jennifer Brennan: Tue 2:00-3:00, Allen CSE 220
Andrew Wagenmaker: Thu 11:30-12:30, Gates CSE2 150

Instructor office hours:

Kevin Jamieson: Wed 4:00-5:00, Gates CSE2 340

About the Course and Prerequisites

The standard approach to machine learning uses a training set of labeled examples to learn a prediction rule that will predict the labels of new examples. Collecting such training sets can be expensive and time-consuming. This course will explore methods that leverage already-collected data to guide future measurements, in a closed loop, to best serve the task at hand. We focus on two paradigms: i) in pure-exploration we desire algorithms that identify or learn a good model using as few measurements as possible (e.g., classification, drug discovery, science), and ii) in regret minimization we desire algorithms that balance taking measurements to learn a model with taking measurements to exploit the model to obtain high reward outcomes (e.g., content recommendation, medical treatment design, ad-serving).

The literature on adaptive methods for machine learning has exploded in the past few years and can be overwhelming. This course will classify different adaptive machine learning problems by characteristics such as the hypothesis space, the available actions, the measurement model, and the available side information. We will identify general adaptive strategies and cover common proof techniques.

List of topics:

Stochastic Multi-armed Bandits

Regret minimization
Pure Exploration

Stochastic Linear Bandits

Regret minimization
Pure exploraton
Experimental Design

Stochastic Nonparametric Bandits

Kernels, Gaussian Processes
Lipschitz, convex

Online learning

Mirror descent for regret minimization

Non-stochastic Multi-armed Bandits

Regret minimization via online learning
Pure Exploration

Contextual bandits

Stochastic and non-stochastic bandits

Binary classification

Disagreement-based methods
Greedy information gain methods

A version of this course was previously offered two years ago. Its course website may be a useful resource for related literature and course notes.

Prerequisites: The course will make frequent references to introductory concepts of machine learning (e.g., CSE 546) but it is not a prerequisite. However, fluency in basic concepts from linear algebra, statistics, and calculus will be assumed (see HW0). Some review materials:

Linear Algebra Review by Zico Kolter and Chuong Do.
Linear Algebra, David Cherney, Tom Denton, Rohit Thomas and Andrew Waldron. Introductory linear algebra text.
Probability Review by Arian Maleki and Tom Do. Also see Chapter 5 of [SzepesvariLattimore] below.

The course will be analysis heavy, with a focus on methods that work well in practice. You are strongly encouraged to complete the self-test of fundmamental prerequisites on your own (not to be turned in or graded). You should be able to complete most of these in your head or with minimal computation.

Class materials

There will not be a textbook for the course. Our discussion will be guided by papers, monographs, and lecture notes that are available online. An incomplete list that will grow:

[SzepesvariLattimore] Bandit Algorithms course notes Csaba Szepesvari and Tor Lattimore
[BubeckCesaBianchi] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems Sebastien Bubeck and Nicolo Cesa-Bianchi
[Jamieson] Informal lecture notes on bandits

Discussion Forum and Email Communication

There will be a Slack channel (first day of class). This is your first resource for questions. For private or confidential questions email the instructor. You may also get messages to the instructor through anonymous course feedback.

Grading and Evaluation

There will be 3 homeworks (each worth 20%) and one take-home cumulative final exam (worth 40%).

Submission guidelines

Each homework assignment will be submitted as a single PDF to gradescope. Any code for a programming problem should come at the end of the problem, after any requested figures for the problem. We expect all assignments to be typeset (i.e., no photos or scans of written work). This can be done in an editor like Microsoft Word or Latex (highly recommended). There exist convenient packages for listing Python code in Latex.

Regrades: If you feel that we have made an error in grading your homework, please submit a regrade request via gradescope,and we will consider your request. Please note that regrading of a homework may cause your grade to go up or down.
Here is gradescope help.
You will enroll yourself into gradescope with a code that will be on Slack in the General channel.

Latex resources:

Learn Latex in 30 minutes
Overleaf. An online Latex editor.
Standalone Latex editor on your local machine
Latex Math symbols
Detexify LaTeX handwritten symbol recognition

Collaboration Policy

Homeworks must be done individually: each student must hand in their own answers. In addition, each student must write their own code in a programming part of the assignment. It is acceptable, however, for students to collaborate in figuring out answers and helping each other solve the problems. You also must indicate on each homework with whom you collaborated.

The homework problems have been carefully chosen for their pedagogical value and hence might be similar or identical to those given out in past offerings of this course at UW, or similar courses at other schools. Using any pre-existing solutions from these sources, from the Web or other textbooks constitues a violation of the academic integrity expected of you and is strictly prohibited.

Late Policy

We will allow 5 total late days. If an assignment is submitted late and this exceeds your 5 late days, that assignment will receive 0 credit. Late days may be spread over any number of assignments, but the total number may not exceed 5. Late days are rounded up so that an assignment that is 28 hours late accumulated 2 late days. Late days account for all conference travel, technical difficulties in submitting homework, or any other unforseeable circumstances. Strategize with these at your own risk.

You will be given 24 hours for the take home exam that is expected to take no more than two hours to complete. It cannot be turned in late. Plan accordingly.

Regrading requests

All requests for regrading should be submitted to Gradescope directly. Office hours and in person discussions are limited solely to asking knowledge related questions, not grade related questions. If you feel that we have made an error in grading your homework, please let us know with a written explanation, and we will consider the request. Please note that regrading of a homework means the entire assignment may be regraded which may cause your grade on the entire homework set to go up or down. Regrade requests must be submtted within 7 days (24*7 hours) of the time in which grades are released.

Schedule

Homework 1: Due 11:59 PM on January 24, 2020. PDF
Homework 2: Due 11:59 PM on February 17, 2020. PDF
Homework 3: Due 11:59 PM on March 13, 2020. PDF
Final Exam: Take home, open note (but no group collaboration allowed) exam will be posted here at Monday 11:59 AM March 16, 2020 and due 24 hours later to gradescope. No late work accepted.

Schedule

Lecture 1: Jan. 7

Welcome, logistics, overview of course topics
Chernoff Bound, Sub-gaussian random variables, elimination algorithms for multi-armed bandits
Reading: [SzepesvariLattimore] Chapter 5
Review prerequisites and "self-test" of above on your own (not to be turned in)

Lecture 2: Jan. 9

Action elimination algorithms for multi-armed bandits
Reading: [SzepesvariLattimore] Chapter 4, 6, [Jamieson] Ch. 1, 2

Lecture 3: Jan. 14 Note: Special time/location of 10:30 AM in ECE 105

Intro to adaptive experimental design
Reading: [SzepesvariLattimore] Chapter 21, 22

Lecture 4: Jan. 16

Optimal Linear Experimental Design
Reading: [SzepesvariLattimore] Chapter 21, 22, Lecture Notes on experimental design, [Jamieson] Ch. 5

Lecture 5: Jan. 21

Regret minimization for Linear Bandits
Reading: [SzepesvariLattimore] Chapter 21, 22, 25, [Jamieson] Ch. 6

Lecture 6: Jan. 23

Peculiarities of adaptive sampling
Reading: [SzepesvariLattimore] Chapter 20, Shin et al Are sample means in multi-armed bandits positively or negatively biased?

Lecture 7: Jan. 28

Hypothesis testing, Martingales
Reading: [SzepesvariLattimore] Chapter 3, Howard et al Uniform, nonparametric, non-asymptotic confidence sequences

Lecture 8: Jan. 30

Sequential Testing, More Martingales, Anytime bounds
Reading: [SzepesvariLattimore] Chapter 3, 20

Lecture 9: Feb. 4

Finish linear bandits, start non-linear bandits
Reading: [SzepesvariLattimore] Chapter 3, 20, [BubeckCesaBianchi] Ch 6

Lecture 10: Feb. 6

Derivative free optimization
Reading: [BubeckCesaBianchi] Ch 6.1-2

Lecture 11: Feb. 10

Online learning, exponential weights
Reading: Ch.1, 2 of Bubeck Introduction to Online Optimization
Further reading: Orabona Modern Introduction to Online Learning

Lecture 12: Feb. 12

Mirror Descent
Reading: Ch. 5 of [BubeckCesaBianchi]
Reading: [SzepesvariLattimore] Ch. 26, 28.

Lecture 13: Feb. 18

Adversarial Multi-armed Bandits (EXP3)
Reading: [Jamieson] Ch. 9
Further reading: [SzepesvariLattimore] Ch. 11

Lecture 14: Feb. 20

Adversarial Linear Bandits
Reading: [SzepesvariLattimore] Ch. 27

Lecture 15: Feb. 25

Adversarial Linear Bandits
Reading: Ch. 5 of Bubeck Introduction to Online Optimization

Lecture 16: Feb. 27

Combinatorial Bandits
Reading: [SzepesvariLattimore] Ch. 30

Lecture 17: Mar. 3

Contextual Bandits, EXP4
Reading: [SzepesvariLattimore] Ch. 18, [Jamieson] Ch. 10

Lecture 18: Mar. 5

Stochastic Contextual Bandits, LinUCB, Off-policy evaluation
Reading: [SzepesvariLattimore] Ch. 19, Agarwal's notes on off-policy evaluation

Lecture 19: Mar. 10

Stochastic Contextual Bandits
Reading: [Jamieson] Ch. 10

Lecture 20: Mar. 12

The big picture
Reading:

CSE 599, Winter 2020 Interactive Learning

About the Course and Prerequisites

Class materials

Discussion Forum and Email Communication

Grading and Evaluation

Submission guidelines

Collaboration Policy

Late Policy

Regrading requests

Schedule

Schedule

CSE 599, Winter 2020
Interactive Learning