This course is a complement of my regularly offered CSE599I: (Stochastic) Interactive Learning course. While no prior experience is required, I will often cite major results from stochastic interactive learning from that course without proof.

Interactive machine learning is a paradigm where the learner actively engages with the environment to collect data or take actions in order to accomplish some goal. For example, perhaps a learner recommends a movie, the user responds with their preferences of the movie in terms of a score, and the process repeats. The goal of the learner is to maximize the user's preferences of movies (total summed scores) by balancing exploration and exploitation over the space of movies. Traditionally, interactive learning is studied in either the IID stochastic setting where, for instance, the user's preferences follow a parametric model and remain fixed over time, or in the adversarial setting where the user's preferences can change arbitrarily from time to time. Unsurprisingly, the guarantees of the stochastic setting are much stronger than the adversarial setting thanks to stronger assumptions. But such assumptions can greatly limit the scenarios in which the stochastic algorithms can be applied, and algorithms can catastrophically fail if assumptions do not hold.

This course aims to study algorithms for settings that lie between the fully stochastic and fully adversarial setting. The first half of this course will review classical algorithms for the adversarial setting (e.g., EXP3/4, mirror descent, etc.) and the stochastic setting (UCB, action elimination, experimental design etc.) to get everyone on the same page and to develop a toolkit. In the second half we will focus on settings that lie between these two extremes with the aim of practical, sample efficient, and robust algorithms. In particular, we will highlight algorithms that obtain the "best of both worlds" that adapt to the unknown setting. We will consider settings where the parametric model is correct, but the user's preferences are allowed to drift or switch over time. We will also consider settings when the model is merely approximately correct, as well as settings where an adversary can arbitrarily corrupt some small number of observations. Finally, we will consider censored and delayed observations.

**Prerequisites**: The course will make frequent references to introductory concepts of machine learning (e.g., CSE 546) but it is not a prerequisite.
However, fluency in basic concepts from linear algebra, statistics, convex analsyis, and calculus will be assumed. Some review materials:

- Linear Algebra Review by Zico Kolter and Chuong Do.
- Linear Algebra, David Cherney, Tom Denton, Rohit Thomas and Andrew Waldron. Introductory linear algebra text.
- Probability Review by Arian Maleki and Tom Do. Also see Chapter 5 of [SzepesvariLattimore] below.

The course will pull from textbooks, papers, and course notes that will evolve with the course. This list will be updated.

- [SzepesvariLattimore] Bandit Algorithms course notes Csaba Szepesvari and Tor Lattimore
- [Jamieson] Informal lecture notes on bandits PDF

We will use Ed as a discussion board (you should have received an invite if registered for the course, otherwise email the instructor). We will not be using Canvas discussion board. This is your first resource for questions. For private or confidential questions email the instructor directly. You may also get messages to the instructor through anonymous course feedback (though, I cannot respond to you personally so this is imperfect).

You will be evaluated on three things

- (35%) One homework problem set due around week 6
- (45%) Participation in presenting a topic (within a group of 3-4) for up to two lectures. You will be assigned a paper (an attempt will be made to match preferences) and a random group. You will work together to present background material for the topic and the main results of the paper. This is more than just regurgitating what is in the paper, it is an opportunity to teach. You can split the work up amongst the group however you'd like. Details to come.
- (20%) In the last 5 weeks of course, when we transition to topics/papers, you are asked to submit questions about the paper being presented that week. This is encouragement for you to read the paper each week, but this could be a light read. You are asked to enter questions into the discussion docs of at least
three separate weeks . You can ask your question before day 1, during class on day 1, between days, or during class on day 2. Feel free to ask your question in class, even if it already written in the doc. Presenters may address your question in class or answer the question in writing in the doc. Links to the docs are on Ed.

Homework: Part I (analysis questions) is due by end of Week 6. Part II (experiments) is due end of Week 10. Please turn in the two parts separately to Gradescope. PDF

The plan for the quarter is to spend the first 5 weeks reviewing classical results from stochastic and adversarial bandits, lectures by instructors. The last 5 weeks will review the staggering progress made in just the last couple years that lies at different points in the middle of these two extremes of stochastic versus adversarial bandits, lectures by students. A single homework assignment will be assigned evaluating your knowledge on the first 5 weeks of the course, since the latter 5 weeks build on these results. If you have any questions or concerns about the schedule, please ask in class, on Ed, or email the instructors.

- Lecture 1: Mar. 30
- Welcome, logistics, overview of course topics
- Introductions
- Introduction to multi-armed bandit problem
- Lecture notes PDF
- Lecture 2: Apr. 1
- Stochastic multi-armed bandits, elimination-style algorithm
- Reading: [Jamieson] 1.1, 1.3
- Lecture notes PDF
- Lecture 3: Apr. 6
- Adversarial multi-armed bandits, oblivious adversaries and EXP3
- Reading: [SzepesvariLattimore] 11
- Typeset notes: PDF
- Lecture notes PDF
- Lecture 4: Apr. 8
- Adversarial multi-armed bandits, adaptive adversaries and EXP3-IX
- Reading: [SzepesvariLattimore] 12
- Typeset notes: PDF
- Lecture notes TBD
- Lecture 5: Apr. 13
- Stochastic Linear bandits
- Reading: [SzepesvariLattimore] 21-22, [Jamieson] 2-2.4
- Lecture notes PDF
- Lecture 6: Apr. 15
- Stochastic Linear bandits continued, asymptotically optimal solution
- Reading: [SzepesvariLattimore] 25, [Jamieson] 2.4-2.6
- Lecture notes PDF
- Lecture 7: Apr. 20
- Adversarial linear bandits
- Reading: [SzepesvariLattimore] 27.1-27.2
- Lecture notes PDF
- Lecture 8: Apr. 22
- Introduction to Mirror Descent
- Reading: [SzepesvariLattimore] 26,28
- Lecture notes TBD
- Lecture 9: Apr. 27
- Mirror Descent continued
- Reading: [SzepesvariLattimore] 26,28
- Lecture notes TBD
- Lecture 10: Apr. 29
- Finish Mirror Descent and corollaries
- Reading: [SzepesvariLattimore] 26,28
- Lecture notes PDF

- Week 6: May 4, 6
- Best of both worlds for MAB
- Reading: Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
- Related optional reading: Tsallis-INF Algorithm for Adversarial Corruptions in Stochastic Multiarmed Bandits
- Lecture notes PDF
- Week 7: May 11, 13
- Best of both worlds for linear bandits, corruptions, misspecification
- Reading: Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously
- Related optional reading: Model misspecification for linear bandits
- Lecture notes PDF
- Week 8: May 18, 20
- Learning with hints and easy data
- Reading: Improved Path-length Regret Bounds for Bandits
- Related optional reading: How to leverage loss predictors in contextual bandits?
- Lecture notes PDF
- Week 9: May 25, 27
- Non-stationary means, switching bandits
- Reading: Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes
- Related optional reading: An Optimal Black-box Approach (Meta-algorithm)
- Lecture notes PDF
- Week 10: June 1, 3
- Non-stochastic best-arm identification
- Reading: Best of both worlds: Stochastic & adversarial best-arm identification
- Related optional reading: Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization
- Lecture notes PDF