This course is a complement to my regularly offered CSE599I: (Stochastic) Interactive Learning course. While no prior experience is required, I will often cite major results on stochastic interactive learning from that course without proof.
Interactive machine learning is a paradigm in which the learner actively engages with the environment to collect data or take actions in order to accomplish some goal. For example, a learner recommends a movie, the user responds with a score reflecting their preference for that movie, and the process repeats. The goal of the learner is to maximize the user's total satisfaction (the sum of scores over time) by balancing exploration and exploitation over the space of movies. Traditionally, interactive learning is studied either in the IID stochastic setting, where, for instance, the user's preferences follow a parametric model and remain fixed over time, or in the adversarial setting, where the user's preferences can change arbitrarily from round to round. Unsurprisingly, the guarantees in the stochastic setting are much stronger than those in the adversarial setting thanks to the stronger assumptions. But such assumptions can greatly limit the scenarios in which stochastic algorithms can be applied, and the algorithms can fail catastrophically when the assumptions do not hold.
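To make the interaction protocol concrete, here is a minimal sketch of the stochastic movie-recommendation loop with a UCB-style learner. The Gaussian reward model, the arm means, the horizon, and all variable names are illustrative assumptions for this sketch only, not part of the course materials.

```python
# A minimal sketch of the stochastic bandit interaction loop described above,
# using a UCB-style learner. The reward model and constants are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.7, 0.4])   # unknown mean score of each "movie"
K, T = len(true_means), 2000

counts = np.zeros(K)   # number of times each movie has been recommended
means = np.zeros(K)    # empirical mean score of each movie

total_score = 0.0
for t in range(1, T + 1):
    if t <= K:
        arm = t - 1                                    # recommend each movie once to initialize
    else:
        bonus = np.sqrt(2 * np.log(t) / counts)        # optimism bonus drives exploration
        arm = int(np.argmax(means + bonus))            # exploit the best estimate plus its bonus
    score = true_means[arm] + 0.1 * rng.standard_normal()   # user's noisy score feedback
    counts[arm] += 1
    means[arm] += (score - means[arm]) / counts[arm]   # incremental mean update
    total_score += score

print(f"average score: {total_score / T:.3f}, best possible: {true_means.max():.3f}")
```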
This course aims to study algorithms for settings that lie between the fully stochastic and fully adversarial extremes. The first half of the course will review classical algorithms for the adversarial setting (e.g., EXP3/4, mirror descent) and the stochastic setting (e.g., UCB, action elimination, experimental design) to get everyone on the same page and to develop a toolkit. In the second half we will focus on settings that lie between these two extremes, with the aim of developing practical, sample-efficient, and robust algorithms. In particular, we will highlight algorithms that obtain "best of both worlds" guarantees by adapting to the unknown setting. We will consider settings where the parametric model is correct but the user's preferences are allowed to drift or switch over time. We will also consider settings where the model is only approximately correct, as well as settings where an adversary can arbitrarily corrupt a small number of observations. Finally, we will consider censored and delayed observations.
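For contrast with the stochastic sketch above, here is a minimal sketch of EXP3 for the adversarial setting, in a loss-based, importance-weighted form without explicit uniform exploration. The loss sequence, horizon, and learning rate are illustrative assumptions, not a prescribed implementation from the course.

```python
# Minimal EXP3 sketch for the adversarial bandit setting (losses in [0, 1]).
# The loss sequence, horizon, and learning rate are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
K, T = 4, 2000
eta = np.sqrt(2 * np.log(K) / (K * T))   # a standard non-adaptive learning-rate choice

cum_loss_est = np.zeros(K)               # cumulative importance-weighted loss estimates
incurred, cum_loss_true = 0.0, np.zeros(K)

for t in range(T):
    logits = -eta * cum_loss_est
    logits -= logits.max()               # numerical stability before the softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    arm = rng.choice(K, p=probs)         # randomized play is essential against an adversary

    # An arbitrary non-stochastic loss sequence: arm 0 is good early, arm 1 is good late.
    losses = np.array([0.2, 0.8, 0.5, 0.6]) if t < T // 2 else np.array([0.9, 0.1, 0.5, 0.6])

    cum_loss_est[arm] += losses[arm] / probs[arm]   # importance-weighted (unbiased) loss estimate
    incurred += losses[arm]
    cum_loss_true += losses

print(f"regret vs. best fixed arm in hindsight: {incurred - cum_loss_true.min():.1f}")
```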
Prerequisites: The course will make frequent references to concepts from introductory machine learning (e.g., CSE 546), but that course is not a prerequisite. However, fluency in basic concepts from linear algebra, statistics, convex analysis, and calculus will be assumed. Some review materials:
The course will pull from textbooks, papers, and course notes that will evolve with the course. This list will be updated.
We will use Ed as a discussion board (you should have received an invite if you are registered for the course; otherwise email the instructor). We will not be using the Canvas discussion board. Ed is your first resource for questions. For private or confidential questions, email the instructor directly. You may also send messages to the instructor through anonymous course feedback (though I cannot respond to you personally, so this channel is imperfect).
You will be evaluated on three things:
Homework: Part I (analysis questions) is due by the end of Week 6. Part II (experiments) is due by the end of Week 10. Please turn in the two parts separately to Gradescope. PDF
The plan for the quarter is to spend the first 5 weeks reviewing classical results from stochastic and adversarial bandits, with lectures given by the instructor. The last 5 weeks will cover the staggering progress made in just the last couple of years on settings that lie at different points between these two extremes of stochastic versus adversarial bandits, with lectures given by students. A single homework assignment will evaluate your knowledge of the first 5 weeks of the course, since the last 5 weeks build on these results. If you have any questions or concerns about the schedule, please ask in class, on Ed, or email the instructor.