CSE 599M, Spring 2021
Interactive Machine Learning in Non-stochastic Environments

Lecture: Tue, Thu 10:00-11:20 Remote (Zoom link on Canvas or Ed)

Instructor: Professor Kevin Jamieson, Allen School of Computer Science, UW

Co-instructor: Professor Lalit Jain, Foster School of Business, UW

TA office hours:

Yuhao Wan: TBD

Instructor office hours:

Kevin Jamieson: Tuesday 5:00 PM - 6:00 PM Remote (Zoom link on Canvas)

About the Course and Prerequisites

This course is a complement of my regularly offered CSE599I: (Stochastic) Interactive Learning course. While no prior experience is required, I will often cite major results from stochastic interactive learning from that course without proof.

Interactive machine learning is a paradigm where the learner actively engages with the environment to collect data or take actions in order to accomplish some goal. For example, perhaps a learner recommends a movie, the user responds with their preferences of the movie in terms of a score, and the process repeats. The goal of the learner is to maximize the user's preferences of movies (total summed scores) by balancing exploration and exploitation over the space of movies. Traditionally, interactive learning is studied in either the IID stochastic setting where, for instance, the user's preferences follow a parametric model and remain fixed over time, or in the adversarial setting where the user's preferences can change arbitrarily from time to time. Unsurprisingly, the guarantees of the stochastic setting are much stronger than the adversarial setting thanks to stronger assumptions. But such assumptions can greatly limit the scenarios in which the stochastic algorithms can be applied, and algorithms can catastrophically fail if assumptions do not hold.

This course aims to study algorithms for settings that lie between the fully stochastic and fully adversarial setting. The first half of this course will review classical algorithms for the adversarial setting (e.g., EXP3/4, mirror descent, etc.) and the stochastic setting (UCB, action elimination, experimental design etc.) to get everyone on the same page and to develop a toolkit. In the second half we will focus on settings that lie between these two extremes with the aim of practical, sample efficient, and robust algorithms. In particular, we will highlight algorithms that obtain the "best of both worlds" that adapt to the unknown setting. We will consider settings where the parametric model is correct, but the user's preferences are allowed to drift or switch over time. We will also consider settings when the model is merely approximately correct, as well as settings where an adversary can arbitrarily corrupt some small number of observations. Finally, we will consider censored and delayed observations.

Prerequisites: The course will make frequent references to introductory concepts of machine learning (e.g., CSE 546) but it is not a prerequisite. However, fluency in basic concepts from linear algebra, statistics, convex analsyis, and calculus will be assumed. Some review materials:

Linear Algebra Review by Zico Kolter and Chuong Do.
Linear Algebra, David Cherney, Tom Denton, Rohit Thomas and Andrew Waldron. Introductory linear algebra text.
Probability Review by Arian Maleki and Tom Do. Also see Chapter 5 of [SzepesvariLattimore] below.

The course will be analysis heavy, with a focus on methods that work well in practice. You are strongly encouraged to complete the self-test of fundmamental prerequisites on your own (not to be turned in or graded). You should be able to complete most of these in your head or with minimal computation.

Class materials

The course will pull from textbooks, papers, and course notes that will evolve with the course. This list will be updated.

[SzepesvariLattimore] Bandit Algorithms course notes Csaba Szepesvari and Tor Lattimore
[Jamieson] Informal lecture notes on bandits PDF

Discussion Forum and Email Communication

We will use Ed as a discussion board (you should have received an invite if registered for the course, otherwise email the instructor). We will not be using Canvas discussion board. This is your first resource for questions. For private or confidential questions email the instructor directly. You may also get messages to the instructor through anonymous course feedback (though, I cannot respond to you personally so this is imperfect).

Grading an evaluation

You will be evaluated on three things

(35%) One homework problem set due around week 6
(45%) Participation in presenting a topic (within a group of 3-4) for up to two lectures. You will be assigned a paper (an attempt will be made to match preferences) and a random group. You will work together to present background material for the topic and the main results of the paper. This is more than just regurgitating what is in the paper, it is an opportunity to teach. You can split the work up amongst the group however you'd like. Details to come.
(20%) In the last 5 weeks of course, when we transition to topics/papers, you are asked to submit questions about the paper being presented that week. This is encouragement for you to read the paper each week, but this could be a light read. You are asked to enter questions into the discussion docs of at least three separate weeks. You can ask your question before day 1, during class on day 1, between days, or during class on day 2. Feel free to ask your question in class, even if it already written in the doc. Presenters may address your question in class or answer the question in writing in the doc. Links to the docs are on Ed.

Schedule

Homework: Part I (analysis questions) is due by end of Week 6. Part II (experiments) is due end of Week 10. Please turn in the two parts separately to Gradescope. PDF

Schedule

The plan for the quarter is to spend the first 5 weeks reviewing classical results from stochastic and adversarial bandits, lectures by instructors. The last 5 weeks will review the staggering progress made in just the last couple years that lies at different points in the middle of these two extremes of stochastic versus adversarial bandits, lectures by students. A single homework assignment will be assigned evaluating your knowledge on the first 5 weeks of the course, since the latter 5 weeks build on these results. If you have any questions or concerns about the schedule, please ask in class, on Ed, or email the instructors.

Part I: Review of classical results and methods

Lecture 1: Mar. 30

Welcome, logistics, overview of course topics
Introductions
Introduction to multi-armed bandit problem
Lecture notes PDF

Lecture 2: Apr. 1

Stochastic multi-armed bandits, elimination-style algorithm
Reading: [Jamieson] 1.1, 1.3
Lecture notes PDF

Lecture 3: Apr. 6

Adversarial multi-armed bandits, oblivious adversaries and EXP3
Reading: [SzepesvariLattimore] 11
Typeset notes: PDF
Lecture notes PDF

Lecture 4: Apr. 8

Adversarial multi-armed bandits, adaptive adversaries and EXP3-IX
Reading: [SzepesvariLattimore] 12
Typeset notes: PDF
Lecture notes TBD

Lecture 5: Apr. 13

Stochastic Linear bandits
Reading: [SzepesvariLattimore] 21-22, [Jamieson] 2-2.4
Lecture notes PDF

Lecture 6: Apr. 15

Stochastic Linear bandits continued, asymptotically optimal solution
Reading: [SzepesvariLattimore] 25, [Jamieson] 2.4-2.6
Lecture notes PDF

Lecture 7: Apr. 20

Adversarial linear bandits
Reading: [SzepesvariLattimore] 27.1-27.2
Lecture notes PDF

Lecture 8: Apr. 22

Introduction to Mirror Descent
Reading: [SzepesvariLattimore] 26,28
Lecture notes TBD

Lecture 9: Apr. 27

Mirror Descent continued
Reading: [SzepesvariLattimore] 26,28
Lecture notes TBD

Lecture 10: Apr. 29

Finish Mirror Descent and corollaries
Reading: [SzepesvariLattimore] 26,28
Lecture notes PDF

Part II: Recent results and methods

Week 6: May 4, 6

Best of both worlds for MAB
Reading: Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
Related optional reading: Tsallis-INF Algorithm for Adversarial Corruptions in Stochastic Multiarmed Bandits
Lecture notes PDF

Week 7: May 11, 13

Best of both worlds for linear bandits, corruptions, misspecification
Reading: Achieving Near Instance-Optimality and Minimax-Optimality
Related optional reading: Model misspecification for linear bandits
Lecture notes PDF

Week 8: May 18, 20

Learning with hints and easy data
Reading: Improved Path-length Regret Bounds for Bandits
Related optional reading: How to leverage loss predictors in contextual bandits?
Lecture notes PDF

Week 9: May 25, 27

Non-stationary means, switching bandits
Reading: Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes
Related optional reading: An Optimal Black-box Approach (Meta-algorithm)
Lecture notes PDF

Week 10: June 1, 3

Non-stochastic best-arm identification
Reading: Best of both worlds: Stochastic & adversarial best-arm identification
Related optional reading: Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization
Lecture notes PDF

CSE 599M, Spring 2021 Interactive Machine Learning in Non-stochastic Environments

About the Course and Prerequisites

Class materials

Discussion Forum and Email Communication

Grading an evaluation

Schedule

Schedule

Part I: Review of classical results and methods

Part II: Recent results and methods

CSE 599M, Spring 2021
Interactive Machine Learning in Non-stochastic Environments