The standard approach to machine learning uses a training set of labeled examples to learn a prediction rule that will predict the labels of new examples. Collecting such training sets can be expensive and time-consuming. This course will explore methods that leverage already-collected data to guide future measurements, in a closed loop, to best serve the task at hand. We focus on two paradigms: i) in pure-exploration we desire algorithms that identify or learn a good model using as few measurements as possible (e.g., classification, drug discovery, science), and ii) in regret minimization we desire algorithms that balance taking measurements to learn a model with taking measurements to exploit the model to obtain high reward outcomes (e.g., content recommendation, medical treatment design, ad-serving).

The literature on adaptive methods for machine learning has exploded in the past few years and can be overwhelming. This course will classify different adaptive machine learning problems by characteristics such as the hypothesis space, the available actions, the measurement model, and the available side information. We will identify general adaptive strategies and cover common proof techniques.

Tentative list of topics (a (!) indicates that the topic may not be covered):

- Online learning
- Exponential weights for finite and continuous action spaces (e.g., [Bubeck Ch. 2])
- Incremental gradient descent (e.g., [Bubeck Ch. 4])
- Stochastic Multi-armed Bandits
- Regret minimization (e.g., UCB strategy of [BubeckCesaBianchi Ch. 2], Thompson Sampling [RussoEtAl])
- Pure Exploration (e.g., [JamiesonNowak], [SimchowitzEtAl], [KarninKorenSomekh])
- Non-stochastic Multi-armed Bandits
- Regret minimization (e.g., EXP3 strategy of [SzepesvariLattimore
*Adversarial Bandits*]) - Pure Exploration (e.g., [LiEtAl])
- Regression
- Linear experimental design (e.g., [RaskuttiMahoney], [Pukelsheim])
- Pure-Exploration for linear bandits (e.g., [SoareLazaricMunos])
- Non-linear active regression/MLE (e.g., [CastroWillettNowak], [ChaudhuriMykland])
- Stochastic Linear Bandits
- Regret minimization for linear bandits (e.g., [AbbasiyadkoriEtAl])
- Bayesian methods: Thompson Sampling, Information-directed sampling (e.g., [RussoEtAl], [RussoVanroy])
- Contextual Bandits (e.g., [BubeckCesaBianchi Ch. 4], [LiChuEtAl])
- Continuous Optimization with Bandit feedback
- (Non)-Convex optimization (e.g., [ConnEtAl], [FlaxmanKalaiMcmahan], [BubeckEldanLee])
- Gaussian Process Optimization (e.g., [SrinivasEtAl])
- Binary classification
- Streaming, disagreement-based methods (e.g., [Dasgupta], [DasguptaHsuMonteleoni])
- Pool-based, greedy information gain (e.g., [Dasgupta2], [Nowak])
- Connection to pure Exploration for combinatorial bandits (e.g., [CaoKrishnamurthy])
- Reinforcement Learning, Markov Decision Processes
- Discrete state spaces (e.g., [Szepesvari])
- Monte Carlo Tree Search (e.g., [KocsisSzepesvari])
- Linear dynamics, LQG (e.g., [AbbasiyadkoriSzepesvari], [DeanEtAl])
- (!) Non-stochastic Bandits with side information
- Regret minimization for combinatorial optimization (e.g., [BubeckCseaBianchi])
- Contextual Bandits (e.g., [BubeckCesaBianchi])
- (!) Lower Bounds
- Regression, Classification, Optimization (e.g., [CastroNowak], [Tsybakov Ch. 2])
- Bandits (e.g., [BubeckCesaBianchi], [KaufmannEtAl], [SimchowitzEtAl])

For more, see the resources in the class materials.

Prerequisites: The course will assume introductory machine learning (e.g., CSE 546) and maturity in topics like linear algebra, statistics, and calculus. The course will be analysis heavy, with a focus on methods that work well in practice.

There will not be a textbook for the course. Our discussion will be guided by papers, monographs, and lecture notes that are available online. An incomplete list that will grow:

- [Bubeck] Introduction to Online Optimization Sebastien Bubeck
- [BubeckCesaBianchi] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems Sebastien Bubeck and Nicolo Cesa-Bianchi
- [SzepesvariLattimore] Bandit Algorithms course notes Csaba Szepesvari and Tor Lattimore, 2017
- [EvendarMannorMansour] Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems Even-Dar, Mannor, Mansour
- [JamiesonNowak] Best-arm Identification Algorithms for Multi-Armed Bandits in the Fixed Confidence Setting Jamieson, Nowak, 2014
- [KarninKorenSomekh] Almost Optimal Exploration in Multi-Armed Bandits Karnin, Koren, Somekh, 2013.
- [KaufmannEtAl] On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models Kaufmann, Cappe, Garivier, 2016
- [SimchowitzEtAl] The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime Simchowitz, Jamieson, Recht, 2017
- [SeldinLugosi] An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits Seldin, Lugosi, 2017
- [LiEtAl] Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization Li, Jamieson, DeSalvo, Rostamizadeh, Talwalkar, 2016
- [Pukelsheim] Optimal Design of Experiments Friedrich Pukelsheim, 2006 (free with UW credentials)
- [AllenZhuEtAl] Near-Optimal Design of Experiments via Regret Minimization Allen-Zhu, Li, Singh, Wang, 2017
- [RaskuttiMahoney] A statistical perspective on randomized sketching for ordinary least-squares Garvesh Raskutti, Michael Mahoney
- [SabatoMunos] Active Regression by Stratification Sabato, Munos, 2014
- [SoareLazaricMunos] Best-Arm Identification in Linear Bandits, Soare, Lazaric, Munos, 2014
- [CastroWillettNowak] Faster Rates in Regression Via Active Learning Castro, Willett, Nowak, 2005
- [ChaudhuriMykland] Nonlinear Experiments: Optimal Design and Inference Based on Likelihood, P. Chaudhuri, Mykland, 1993
- [ChaudhuriKakadeEtAl] Convergence Rates of Active Learning for Maximum Likelihood Estimation K. Chaudhuri, Kakade, Netrapalli, Sanghavi, 2015
- [AbbasiyadkoriEtAl] Improved Algorithms for Linear Stochastic Bandits, Abbasi-Yadkori, Pál, Szepesvári, 2011
- [RussoEtAl] A Tutorial on Thompson Sampling Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen
- [RussoVanroy] Learning to Optimize Via Information-Directed Sampling, Daniel Russo, Benjamin Van Roy, 2014
- [LiChuEtAl] A Contextual-Bandit Approach to Personalized News Article Recommendation Li, Chu, Langford, Schapire, 2010
- [LangfordZhang] The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits, Langford, Zhang, 2008
- [ConnEtAl] Introduction to derivative-free optimization, Conn, Scheinberg, Vicente, 2009 (free with UW credentials)
- [FlaxmanKalaiMcmahan] Online convex optimization in the bandit setting: gradient descent without a gradient, Flaxman, Kalai, Mcmahan, 2004
- [Nesterov] Random Gradient-Free Minimization of Convex Functions, Nesterov, 2011
- [BubeckEldanLee] Kernel-based methods for bandit convex optimization Sébastien Bubeck, Ronen Eldan, Yin Tat Lee
- [SrinivasEtAl] Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design Srinivas, Krause, Kakade, Seeger
- [BubeckMunosEtAl] X-Armed Bandits Bubeck, Munos, Stoltz, Szepesvari 2011
- [CaoKrishnamurthy] Disagreement-based combinatorial pure exploration: Efficient algorithms and an analysis with localization Cao, Krishnamurthy, 2017
- [Dasgupta] Two Faces of Active Learning Sanjoy Dasgupta
- [DasguptaHsuMonteleoni] A general agnostic active learning algorithm Dasgupta, Hsu, Monteleoni
- [Nowak] The Geometry of Generalized Binary Search Nowak
- [Dasgupta2] Analysis of a greedy active learning strategy Dasgupta
- [GolovinKrause] Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization Golovin, Krause, 2010
- [CastroNowak] Upper and Lower Error Bounds for Active Learning Castro, Nowak, 2006
- [GittinsEtAl] Multi-armed Bandit Allocation Indices Gittins, Glazebrook, Weber, 2011
- [Szepesvari] Algorithms for Reinforcement Learning Csaba Szepesvari
- [SuttonBarto] Reinforcement Learning: An Introduction Sutton and Barto, 2018.
- [KocsisSzepesvari] Bandit based Monte-Carlo Planning Kocsis, Szepesvari, 2006
- [AbbasiyadkoriSzepesvari] Regret Bounds for the Adaptive Control of Linear Quadratic Systems Abbasi-Yadkori, Szepesvari, 2011
- [DeanEtAl] On the Sample Complexity of the Linear Quadratic Regulator Dean, Mania, Matni, Recht, Tu, 2017

Other resources that may be useful:

- [BoucheronEtAl] Concentration Inequalities: A Nonasymptotic Theory of Independence Stéphane Boucheron, Gábor Lugosi, Pascal Massart (not free)
- [Duchi] Probability Bounds John Duchi
- [Tsybakov] Introduction to Nonparametric Estimation Alexandre B. Tsybakov (free on campus network)
- [BoucheronEtAl2] Theory of Classification: A Survey of Some Recent Advances Boucheron, Bousquet, Lugosi, 2005
- [RasmussenWilliams] Gaussian Processes for Machine Learning Carl Edward Rasmussen and Chris Williams
- [HastieEtAl] The Elements of Statistical Learning: Data Mining, Inference, and Prediction Trevor Hastie, Robert Tibshirani, Jerome Friedman
- [EfronHastie] Computer Age Statistical Inference: Algorithms, Evidence and Data Science Bradley Efron, Trevor Hastie

There will be a Slack channel (first day of class). This is your first resource for questions. For private or confidential questions email the instructor.

Your grade will be based on scribing and potentially presenting a subset of a single lecture (e.g., a technical proof in the work following an overview by the instructor). You are expected to prepare for the lecture you are assigned to well before the lecture occurs by reading and understanding not only the assigned papers but supporting materials and concepts as well (i.e. if the paper uses a lemma from a different paper, you should understand that lemma and where it comes from). Scribing a lecture means summarizing the assigned papers and the lecture itself (with main theorems and proofs) as well as how this work fits into the context of the class so far and why it matters (e.g., linear bandits is a multi-armed bandit game with a given feature vector, a setting with applications to X, Y, Z). You are expected to put several hours into preparing these notes, a mere summary of what was presented in lecture without context or applications is unacceptable. You must come to the instructor's office hours (or by appointment) days preceding the lecture to discuss the plan for that lecture presentation and notes. Instructions will be sent out on how to submit your preferences over lectures -- the plan is to have each innermost bullet point be a single lecture.

Scribe notes should be prepared using the Latex template. The final notes should be turned in within a few days following lecture with the understanding that the majority of the notes should be completed before lecture.

- Lecture 1: Jan. 3
- Welcome, logistics, overview of course topics
- Reading: [Bubeck Ch. 1], [BubeckCesaBianchi Ch. 1]
- Lecture 2: Jan. 5
- Intro to online learning protocol
- Exponential weights: optimization on the simplex
- Reading: [Bubeck Chapter 1, 2.1-2.3]
- Concepts: Convexity, Jensen's inequality, Hoeffding's inequality, exponential weights
- Examples: Weather prediction, mixture of experts
- Characteristics: full information, adversarial, regret-minimization, finite action space
- Notes: Lecture notes
- Lecture 3: Jan. 10
- Stochastic Multi-armed bandits, regret minimization
- Reading: [BubeckCesaBianchi Chapter 1, 2.1-2.2], [Duchi Sections 1-2] (or [BoucheronEtAl Ch. 2])
- Techniques: Chernoff Bound, Hoeffding's inequality
- Algorithms: UCB1, Thompson Sampling
- Examples: Explore versus exploit different treatments for patients
- Characteristics: bandit feedback, stochastic, regret-minimization, finite action space
- Notes: Lecture notes
- Lecture 4: Jan. 12
- Stochastic Multi-armed bandits, pure exploration
- Reading: [JamiesonNowak], [Simchowitz Sec. 4], [KarninKorenSomekh Sec. 4]
- Techniques: Law of the Iterated Logarithm (LIL), top-k identification in fixed confidence and fixed budget settings
- Algorithms: Successive Elimination, LUCB++, Successive Halving
- Examples: Caption contest: find the 3 best captions using crowdsourced votes
- Characteristics: bandit feedback, stochastic, pure-exploration, finite action space
- Notes: Lecture notes
- Lecture 5: Jan. 17
- Non-stochastic Multi-armed bandits, regret minimization
- Reading: [SzepesvariLattimore
*Adversarial Bandits*] - Techniques: Importance Sampling
- Algorithms: EXP3, EXP3.P
- Examples: Planning your commute: train, bus, drive, or walk?
- Characteristics: bandit feedback, non-stochastic, regret-minimization, finite action space
- Notes: Lecture notes
- Lecture 6: Jan. 19
- Non-stochastic Multi-armed bandits, pure exploration
- Reading: [LiEtAl]
- Techniques: Infinite-armed bandits
- Algorithms: Successive Halving, Hyperband
- Examples: Non-convex optimization with random restarts, hyperparameter tuning
- Characteristics: bandit feedback, non-stochastic, pure exploration, infinite action space
- Notes: Lecture notes
- Lecture 7: Jan. 24
- Linear regression, linear sequential experimental design
- Reading: [RaskuttiMahoney], Optimal Design, [SoareLazaricMunos]
- Techniques: Leverage scores, linear optimal experimental design
- Algorithms: Least squares regression, best-arm identification for linear bandits
- Examples: Science, search problems
- Characteristics: stochastic and adversarial, linear least squares, stochastic pure-exploration, finite action space with side information
- Notes: Lecture notes
- Lecture 8: Jan. 26
- Nonlinear sequential experimental design
- Reading: [ChaudhuriMykland], [ChaudhuriKakadeEtAl]
- Techniques: Maximum likelihood estimation(MLE), Fisher information, Laplace approximation
- Algorithms: Sequential MLE with feedback
- Examples: Science, search problems, classification, regression
- Characteristics: stochastic pure-exploration, (in)finite action space with side information
- Notes: Lecture notes
- Lecture 9: Jan. 31
- Linear Bandits, Thompson Sampling
- Reading: [AbbasiyadkoriEtAl]
- Techniques: Confidence ellipsoids for linear predictors
- Algorithms: Linear bandits
- Examples: Path routing with edge weight feedback, recommendation systems
- Characteristics: bandit feedback, stochastic, regret-minimization, (in)finite action space with side information
- Notes: Lecture notes
- Lecture 10: Feb. 2
- Contextual Bandits
- Reading: [LiChuEtAl], [BubeckCesaBianchi Ch. 4], [LangfordZhang]
- Techniques: Confidence ellipsoids for linear predictors, Bayesian posterior sampling
- Algorithms: EXP4, LinUCB, tau-Greedy
- Examples: recommending articles to users
- Characteristics: bandit feedback, adversarial, linear, exponential weights
- Notes: Lecture notes
- Lecture 11: Feb. 7
- Convex optimization with bandit feedback
- Reading: [FlaxmanKalaiMcmahan], [BubeckEldanLee]
- Techniques: One-point gradient estimation
- Algorithms: GP-UCB, Thompson Sampling
- Examples: Recommendation systems over continuous spaces
- Characteristics: bandit feedback, stochastic, regret-minimization, (in)finite action space with metric space
- Notes: TBD
- Lecture 12: Feb. 9
- Bayesian Bandits
- Reading: [RussoEtAl]
- Techniques: Posterior inference, Bayesian posterior sampling
- Algorithms: Thompson Sampling
- Examples: Maximize click through rate with known dependencies
- Characteristics: bandit feedback, stochastic, regret-minimization, (in)finite action space with metric space
- Lecture 13: Feb. 14
- Kernelized and Gaussian Process Bandits
- Reading: [SrinivasEtAl], [RasmussenWilliams Ch.2]
- Techniques: Gaussian priors, dependent arms
- Algorithms: GP-UCB
- Examples: Recommendation systems over continuous spaces
- Characteristics: bandit feedback, stochastic, regret-minimization, (in)finite action space with metric space
- Notes: Lecture notes
- No class: Feb. 16
- Lecture 14: Feb. 21
- Binary classification, Streaming Active Learning
- Reading: [Dasgupta], CAL Notes, [DasguptaHsuMonteleoni]
- Techniques: Active learning for binary classificaiton in streaming setting.
- Algorithms: CAL
- Examples: Learning a spam filter
- Characteristics: stochastic-sampling with deterministic outcomes, pure-exploration
- Notes: Lecture notes
- Lecture 15: Feb. 23
- Binary classification, Pool-based Active Learning
- Reading: [Dasgupta2], [GolovinKrause], [Nowak]
- Techniques: Active learning for binary classificaiton in pool-based setting. Adaptive submodularity optimization. Combinatorial bandits.
- Algorithms: Generalized binary search
- Examples: Learning a classifier with pool of unlabeled examples (e.g., google images)
- Characteristics: stochastic-sampling with deterministic outcomes, pure-exploration
- Notes: Lecture notes
- Lecture 16: Feb. 28
- Pool-based Active Learning, Combinatorial bandits
- Reading: [CaoKrishnamurthy]
- Techniques: Reduction of binary classification to combinatorial bandits.
- Algorithms: Action elimination
- Examples: Top-K arm identification, Learning a classifier with pool of unlabeled examples (e.g., google images)
- Characteristics: stochastic-sampling with deterministic outcomes, pure-exploration
- Lecture 17: Mar. 2
- Finite Markov Decision Processes (MDPs)
- Reading: [Szepesvari Ch. 1, 2]
- Techniques: Value, Q function learning by dynamic programming
- Algorithms: Value iteration, Q-learning
- Examples: Reinforcement learning
- Characteristics: stochastic state dependent actions, stochastic state transitions, regret-minimization, finite action space
- Notes: Lecture notes
- Lecture 18: Mar. 7
- Infinite Markov Decision Processes (MDPs)
- Reading: [Szepesvari Ch. 4]
- Techniques: Learning with rollouts
- Algorithms: parameterized Q-learning, policy gradient
- Examples: Reinforcement learning
- Characteristics: stochastic state dependent actions, stochastic state transitions, regret-minimization, (in)finite action space
- Lecture 19: Mar. 9
- Monte Carlo Tree Search (MCTS)
- Reading: [SuttonBarto Ch. 8], [KocsisSzepesvari]
- Techniques: MDPs with large branching factors
- Algorithms: UCT
- Examples: Playing strategy games and Go
- Characteristics: stochastic state dependent actions, stochastic state transitions, regret-minimization, countable state space, finite action space (per state)
- Notes: Lecture notes
- Lecture 20: Mar. 12 11:00 AM in EEB 045 (
**Note special location and date**) - Linear Quadratic Regulator (LQR)
- Reading: Boyd's notes on LQR, [AbbasiyadkoriSzepesvari]
- Techniques: Automatic control in linear systems
- Algorithms: LQR/LQG
- Examples: Going to the moon and maintaining temperature
- Characteristics: stochastic state dependent actions, stochastic state transitions, regret-minimization, continuous state and action space
- Notes: Lecture notes