CSE 541, Winter 2022 Interactive Learning

Lecture: Wed, Fri 10:00-11:20 AM Remote until further notice (Zoom link on Canvas/Ed)

TA office hours:

Jennifer Brennan: Fri 2:30-3:30 PM Remote until further notice (Zoom link on Canvas/Ed)
Lang Liu: Tue 5:00-6:00 PM Remote until further notice (Zoom link on Canvas/Ed)

Instructor office hours:

Kevin Jamieson: Wed 5:00-6:00 PM Remote until further notice (Zoom link on Canvas/Ed)

About the Course and Prerequisites

The traditional approach to machine learning uses a training set of labeled examples to learn a prediction rule that will predict the labels of future examples. Collecting such training sets can be expensive and time-consuming. This course will explore methods that leverage already-collected data to guide future measurements, in a closed loop, to best serve the task at hand. We focus on two paradigms: i) in pure-exploration we desire algorithms that identify or learn a good-enough model using as few measurements as possible (e.g., classification, drug discovery, science), and ii) in regret minimization we desire algorithms that balance exploration and taking measurements to learn a model with taking measurements to exploit the model to obtain high reward outcomes (e.g., content recommendation, medical treatment design, ad-serving).

The literature on interactive (machine) learning has exploded in the past several years and can be overwhelming. This course will classify different interactive learning problems by characteristics such as the hypothesis space, the available actions, the measurement model, and the available side information. We will focus on general algorithmic strategies and common proof techniques. By the end of this course, you will be in a position to begin research in this field, as well as lead an interactive learning software implementation in industry.

List of topics:

Stochastic Multi-armed Bandits
Stochastic Linear Bandits and experimental design
Stochastic Contextual bandits (model-free and model-based)
Stochastic Markov Decision Processes (tabular and linear settings)

Prerequisites: The course will make frequent references to introductory concepts of machine learning (e.g., CSE 546) but it is not a prerequisite. However, fluency in basic concepts from linear algebra, statistics, and calculus will be assumed (see HW0). Some review materials:

Linear Algebra Review by Zico Kolter and Chuong Do.
Linear Algebra, David Cherney, Tom Denton, Rohit Thomas and Andrew Waldron. Introductory linear algebra text.
Probability Review by Arian Maleki and Tom Do. Also see Chapter 5 of [SzepesvariLattimore] below.

The course will be analysis heavy, with a focus on methods that work well in practice. You are strongly encouraged to complete the self-test of fundmamental prerequisites on your own (not to be turned in or graded). You should be able to complete most of these in your head or with minimal computation.

COVID Considerations

If you think there is a possibility that you are sick, please do not come to class or office hours. All lectures all quarter long will be streamed live and recorded. The course will follow the guidelines put forth by the University of Washington, please review them if you are not familiar. In particular, the first week of class will be online only.

Class materials

The course will pull from textbooks and course notes.

[SzepesvariLattimore] Bandit Algorithms course notes Csaba Szepesvari and Tor Lattimore
[AgarwalJiangKakadeSun] Reinforcement Learning: Theory and Algorithms Alekh Agarwal, Nan Jiang, Sham M. Kakade, Wen Sun
[Jamieson] Informal lecture notes on bandits PDF

Discussion Forum and Email Communication

We will use Ed as a discussion board (you should have received an invite if registered for the course, otherwise email the instructor). We will not be using Canvas discussion board. Ed is your first resource for questions. For private or confidential questions email the instructor directly. You may also get messages to the instructor through anonymous course feedback (though, I cannot respond to you personally so this is imperfect).

Grading and Evaluation

There will be 3 homeworks (each worth 20%) and one take-home cumulative final exam (worth 40%).

Submission guidelines

Each homework assignment will be submitted as a single PDF to gradescope. Any code for a programming problem should come at the end of the problem, after any requested figures for the problem. You will receive an email invite once you join the course -- if not please let me know! We expect all assignments to be typeset (i.e., no photos or scans of written work). This can be done in an editor like Microsoft Word or Latex (highly recommended). There exist convenient packages for listing Python code in Latex.

Regrades: If you feel that we have made an error in grading your homework, please submit a regrade request via gradescope, and we will consider your request. Please note that regrading of a homework may cause your grade to go up or down.
Here is gradescope help.
You will automatically be enrolled in gradescope.

Latex resources:

Learn Latex in 30 minutes
Overleaf. An online Latex editor.
Standalone Latex editor on your local machine
Latex Math symbols
Detexify LaTeX handwritten symbol recognition

Collaboration Policy

Homeworks must be done individually: each student must hand in their own answers. In addition, each student must write their own code in a programming part of the assignment. It is acceptable, however, for students to collaborate in figuring out answers and helping each other solve the problems. You also must indicate on each homework with whom you collaborated.

The homework problems have been carefully chosen for their pedagogical value and hence might be similar or identical to those given out in past offerings of this course at UW, or similar courses at other schools. Using any pre-existing solutions from these sources, from the Web or other textbooks constitues a violation of the academic integrity expected of you and is strictly prohibited.

Late Policy

If you need an extra 24 hours on any homework assignment, that's no big deal and you do not require prior permission. If you need multiple days due to personal reasons, please email the instructor. I will try to be accomdating but please don't abuse it, I will evaluate requests as I get them.

You will be given 24 hours for the take home exam that is expected to take no more than two hours to complete. It cannot be turned in late. Plan accordingly.

Regrading requests

All requests for regrading should be submitted to Gradescope directly. Office hours and in person discussions are limited solely to asking knowledge related questions, not grade related questions. If you feel that we have made an error in grading your homework, please let us know with a written explanation, and we will consider the request. Please note that regrading of a homework means the entire assignment may be regraded which may cause your grade on the entire homework set to go up or down. Regrade requests must be submtted within 7 days (24*7 hours) of the time in which grades are released.

Schedule

Homework 0: (Self-examination, Not due but recommend you complete within the first week) PDF
Homework 1: Due 11:59 PM on January 22, 2021. PDF
Homework 2: Due 11:59 PM on February 19, 2021. PDF
Homework 3: Due 11:59 PM on March 13, 2021 (no latework accepted!). PDF

Schedule

Lecture 1: Jan. 5

Welcome, logistics, overview of course topics
Review prerequisites and "self-test" of above on your own (not to be turned in)
Introductions
Concentration of measure
Lecture notes PDF

Lecture 2: Jan. 7

Chernoff Bound, Sub-gaussian random variables, elimination algorithms for multi-armed bandits
Reading: [SzepesvariLattimore] Chapter 5, 6
Lecture notes PDF

Lecture 3: Jan. 12

Elimination algorithms for multi-armed bandits continued
Reading: [SzepesvariLattimore] Chapter 6
Lecture notes PDF

Lecture 4: Jan. 14

Finish elimination algorithms, Optimism and UCB
Reading: [SzepesvariLattimore] 7
Lecture notes PDF

Lecture 5: Jan. 19

Thompson Sampling, Lower bounds
Reading: [SzepesvariLattimore] 13-16
Lecture notes PDF

Lecture 6: Jan. 21

Finish lower bounds, Linear experimental design
Reading: [SzepesvariLattimore] 21
Lecture notes PDF

Lecture 7: Jan. 26

Linear bandits with finite arms. Regret bound for elimination algorithm
Reading: [SzepesvariLattimore] 21, 22
Lecture notes PDF

Lecture 8: Jan. 28

Finish regret bound for elimination algorithm. Introduce UCB and Thompson Sampling
Reading: [SzepesvariLattimore] 19
Lecture notes PDF

Lecture 9: Feb. 2

Self-normalized bounds, Regret bound for UCB
Reading: [SzepesvariLattimore] 20
Lecture notes (N/A, whiteboard)

Lecture 10: Feb. 4

Martingales, method of mixtures
Reading: [SzepesvariLattimore] 3.3, 20
Lecture notes (see future lecture)

Lecture 11: Feb. 9

Diversion on A/B testing
Reading:
Lecture notes (see future lecture)

Lecture 12: Feb. 11

Optimal sequential testing
Reading: See course notes, [SzepesvariLattimore] 3.3, 20
Lecture notes PDF

Lecture 13: Feb. 16

Contextual bandits
Reading: [SzepesvariLattimore] 18
Lecture notes NA

Lecture 14: Feb. 18

Contextual bandits, linear value function approximation
Reading: [SzepesvariLattimore] 19
Lecture notes NA

Lecture 15: Feb. 23

Contextual bandits for arbitrary policy class
Reading: See course notes
Lecture notes PDF

Lecture 16: Feb. 25

Contextual bandits for arbitrary policy class
Reading: See course notes
Lecture notes PDF

Lecture 17: Mar. 2

Finite horizon Markov Decision Processes
Reading: [AgarwalJiangKakadeSun] 1.2, Alternatively, [Jamieson] 7.4
Lecture notes PDF

Lecture 18: Mar. 4

Learning and Finite horizon Markov Decision Processes
Reading: [AgarwalJiangKakadeSun] 7-7.2, Alternatively, [Jamieson] 7.4
Lecture notes N/A

Lecture 19: Mar. 9

Learning and Finite horizon Markov Decision Processes
Reading: [AgarwalJiangKakadeSun] 7-7.2, Alternatively, [Jamieson] 7.4
Lecture notes PDF

Lecture 20: Mar. 11

Learning and Finite horizon Markov Decision Processes with function approximation
Reading: [AgarwalJiangKakadeSun] 8, 16.1-16.2
Lecture notes PDF