About the Course and Prerequisites
The standard approach to machine learning uses a training set of labeled examples to learn a prediction rule that will predict the labels of new examples. Collecting such training sets can be expensive and time-consuming. This course will explore methods that leverage already-collected data to guide future measurements, in a closed loop, to best serve the task at hand. We focus on two paradigms: i) pure exploration, where we desire algorithms that identify or learn a good model using as few measurements as possible (e.g., classification, drug discovery, science), and ii) regret minimization, where we desire algorithms that balance taking measurements to learn a model against taking measurements to exploit the model and obtain high-reward outcomes (e.g., content recommendation, medical treatment design, ad-serving).
The literature on adaptive methods for machine learning has exploded in the past few years and can be overwhelming.
This course will classify different adaptive machine learning problems by characteristics such as the hypothesis space, the available actions, the measurement model, and the available side information.
We will identify general adaptive strategies and cover common proof techniques.
Tentative list of topics (a (!) indicates that the topic may not be covered):
- Online learning
- Exponential weights for finite and continuous action spaces (e.g., [Bubeck Ch. 2])
- Incremental gradient descent (e.g., [Bubeck Ch. 4])
- Stochastic Multi-armed Bandits
- Regret minimization (e.g., UCB strategy of [BubeckCesaBianchi Ch. 2], Thompson Sampling [RussoEtAl])
- Pure Exploration (e.g., [JamiesonNowak], [SimchowitzEtAl], [KarninKorenSomekh])
- Non-stochastic Multi-armed Bandits
- Regret minimization (e.g., EXP3 strategy of [SzepesvariLattimore Adversarial Bandits])
- Pure Exploration (e.g., [LiEtAl])
- Regression
- Linear experimental design (e.g., [RaskuttiMahoney], [Pukelsheim])
- Pure-Exploration for linear bandits (e.g., [SoareLazaricMunos])
- Non-linear active regression/MLE (e.g., [CastroWillettNowak], [ChaudhuriMykland])
- Stochastic Linear Bandits
- Regret minimization for linear bandits (e.g., [AbbasiyadkoriEtAl])
- Bayesian methods: Thompson Sampling, Information-directed sampling (e.g., [RussoEtAl], [RussoVanroy])
- Contextual Bandits (e.g., [BubeckCesaBianchi Ch. 4], [LiChuEtAl])
- Continuous Optimization with Bandit feedback
- (Non)-Convex optimization (e.g., [ConnEtAl], [FlaxmanKalaiMcmahan], [BubeckEldanLee])
- Gaussian Process Optimization (e.g., [SrinivasEtAl])
- Binary classification
- Streaming, disagreement-based methods (e.g., [Dasgupta], [DasguptaHsuMonteleoni])
- Pool-based, greedy information gain (e.g., [Dasgupta2], [Nowak])
- Connection to pure Exploration for combinatorial bandits (e.g., [CaoKrishnamurthy])
- Reinforcement Learning, Markov Decision Processes
- Discrete state spaces (e.g., [Szepesvari])
- Monte Carlo Tree Search (e.g., [KocsisSzepesvari])
- Linear dynamics, LQG (e.g., [AbbasiyadkoriSzepesvari], [DeanEtAl])
- (!) Non-stochastic Bandits with side information
- Regret minimization for combinatorial optimization (e.g., [BubeckCesaBianchi])
- Contextual Bandits (e.g., [BubeckCesaBianchi])
- (!) Lower Bounds
- Regression, Classification, Optimization (e.g., [CastroNowak], [Tsybakov Ch. 2])
- Bandits (e.g., [BubeckCesaBianchi], [KaufmannEtAl], [SimchowitzEtAl])
For more, see the resources in the class materials.
Prerequisites: The course will assume introductory machine learning (e.g., CSE 546) and maturity in topics like linear algebra, statistics, and calculus. The course will be analysis-heavy, with a focus on methods that work well in practice.
Class materials
There will not be a textbook for the course.
Our discussion will be guided by papers, monographs, and lecture notes that are available online.
An incomplete list that will grow:
- [Bubeck] Introduction to Online Optimization Sebastien Bubeck
- [BubeckCesaBianchi] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems Sebastien Bubeck and Nicolo Cesa-Bianchi
- [SzepesvariLattimore] Bandit Algorithms course notes Csaba Szepesvari and Tor Lattimore, 2017
- [EvendarMannorMansour] Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems Even-Dar, Mannor, Mansour
- [JamiesonNowak] Best-arm Identification Algorithms for Multi-Armed Bandits in the Fixed Confidence Setting Jamieson, Nowak, 2014
- [KarninKorenSomekh] Almost Optimal Exploration in Multi-Armed Bandits Karnin, Koren, Somekh, 2013.
- [KaufmannEtAl] On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models Kaufmann, Cappe, Garivier, 2016
- [SimchowitzEtAl] The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime Simchowitz, Jamieson, Recht, 2017
- [SeldinLugosi] An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits Seldin, Lugosi, 2017
- [LiEtAl] Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization Li, Jamieson, DeSalvo, Rostamizadeh, Talwalkar, 2016
- [Pukelsheim] Optimal Design of Experiments Friedrich Pukelsheim, 2006 (free with UW credentials)
- [AllenZhuEtAl] Near-Optimal Design of Experiments via Regret Minimization Allen-Zhu, Li, Singh, Wang, 2017
- [RaskuttiMahoney] A statistical perspective on randomized sketching for ordinary least-squares Garvesh Raskutti, Michael Mahoney
- [SabatoMunos] Active Regression by Stratification Sabato, Munos, 2014
- [SoareLazaricMunos] Best-Arm Identification in Linear Bandits, Soare, Lazaric, Munos, 2014
- [CastroWillettNowak] Faster Rates in Regression Via Active Learning Castro, Willett, Nowak, 2005
- [ChaudhuriMykland] Nonlinear Experiments: Optimal Design and Inference Based on Likelihood, P. Chaudhuri, Mykland, 1993
- [ChaudhuriKakadeEtAl] Convergence Rates of Active Learning for Maximum Likelihood Estimation K. Chaudhuri, Kakade, Netrapalli, Sanghavi, 2015
- [AbbasiyadkoriEtAl] Improved Algorithms for Linear Stochastic Bandits, Abbasi-Yadkori, Pál, Szepesvári, 2011
- [RussoEtAl] A Tutorial on Thompson Sampling Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen
- [RussoVanroy] Learning to Optimize Via Information-Directed Sampling, Daniel Russo, Benjamin Van Roy, 2014
- [LiChuEtAl] A Contextual-Bandit Approach to Personalized News Article Recommendation Li, Chu, Langford, Schapire, 2010
- [LangfordZhang] The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits, Langford, Zhang, 2008
- [ConnEtAl] Introduction to derivative-free optimization, Conn, Scheinberg, Vicente, 2009 (free with UW credentials)
- [FlaxmanKalaiMcmahan] Online convex optimization in the bandit setting: gradient descent without a gradient, Flaxman, Kalai, Mcmahan, 2004
- [Nesterov] Random Gradient-Free Minimization of Convex Functions, Nesterov, 2011
- [BubeckEldanLee] Kernel-based methods for bandit convex optimization Sébastien Bubeck, Ronen Eldan, Yin Tat Lee
- [SrinivasEtAl] Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design Srinivas, Krause, Kakade, Seeger
- [BubeckMunosEtAl] X-Armed Bandits Bubeck, Munos, Stoltz, Szepesvari 2011
- [CaoKrishnamurthy] Disagreement-based combinatorial pure exploration: Efficient algorithms and an analysis with localization Cao, Krishnamurthy, 2017
- [Dasgupta] Two Faces of Active Learning Sanjoy Dasgupta
- [DasguptaHsuMonteleoni] A general agnostic active learning algorithm Dasgupta, Hsu, Monteleoni
- [Nowak] The Geometry of Generalized Binary Search Nowak
- [Dasgupta2] Analysis of a greedy active learning strategy Dasgupta
- [GolovinKrause] Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization Golovin, Krause, 2010
- [CastroNowak] Upper and Lower Error Bounds for Active Learning Castro, Nowak, 2006
- [GittinsEtAl] Multi-armed Bandit Allocation Indices Gittins, Glazebrook, Weber, 2011
- [Szepesvari] Algorithms for Reinforcement Learning Csaba Szepesvari
- [SuttonBarto] Reinforcement Learning: An Introduction Sutton and Barto, 2018
- [KocsisSzepesvari] Bandit based Monte-Carlo Planning Kocsis, Szepesvari, 2006
- [AbbasiyadkoriSzepesvari] Regret Bounds for the Adaptive Control of Linear Quadratic Systems Abbasi-Yadkori, Szepesvari, 2011
- [DeanEtAl] On the Sample Complexity of the Linear Quadratic Regulator Dean, Mania, Matni, Recht, Tu, 2017
Other resources that may be useful:
- [BoucheronEtAl] Concentration Inequalities: A Nonasymptotic Theory of Independence Stéphane Boucheron, Gábor Lugosi, Pascal Massart (not free)
- [Duchi] Probability Bounds John Duchi
- [Tsybakov] Introduction to Nonparametric Estimation Alexandre B. Tsybakov (free on campus network)
- [BoucheronEtAl2] Theory of Classification: A Survey of Some Recent Advances Boucheron, Bousquet, Lugosi, 2005
- [RasmussenWilliams] Gaussian Processes for Machine Learning Carl Edward Rasmussen and Chris Williams
- [HastieEtAl] The Elements of Statistical Learning: Data Mining, Inference, and Prediction Trevor Hastie, Robert Tibshirani, Jerome Friedman
- [EfronHastie] Computer Age Statistical Inference: Algorithms, Evidence and Data Science Bradley Efron, Trevor Hastie
Discussion Forum and Email Communication
There will be a Slack channel (set up on the first day of class). This is your first resource for questions. For private or confidential questions, email the instructor.
Grading and Evaluation
Your grade will be based on scribing a single lecture and potentially presenting a subset of it (e.g., a technical proof from the assigned work, following an overview by the instructor).
You are expected to prepare for your assigned lecture well before it occurs by reading and understanding not only the assigned papers but also supporting materials and concepts (i.e., if a paper uses a lemma from a different paper, you should understand that lemma and where it comes from).
Scribing a lecture means summarizing the assigned papers and the lecture itself (with main theorems and proofs), as well as how this work fits into the context of the class so far and why it matters (e.g., linear bandits is a multi-armed bandit game in which each arm has a given feature vector, a setting with applications to X, Y, Z).
You are expected to put several hours into preparing these notes; a mere summary of what was presented in lecture without context or applications is unacceptable.
You must come to the instructor's office hours (or schedule an appointment) in the days preceding the lecture to discuss the plan for the lecture presentation and notes.
Instructions will be sent out on how to submit your preferences over lectures -- the plan is to have each innermost bullet point be a single lecture.
Scribe notes should be prepared using the LaTeX template. The final notes should be turned in within a few days of the lecture, with the understanding that the majority of the notes should be completed before lecture.
Schedule
- Lecture 1: Jan. 3
- Welcome, logistics, overview of course topics
- Reading: [Bubeck Ch. 1], [BubeckCesaBianchi Ch. 1]
- Lecture 2: Jan. 5
- Intro to online learning protocol
- Exponential weights: optimization on the simplex
- Reading: [Bubeck Chapter 1, 2.1-2.3]
- Concepts: Convexity, Jensen's inequality, Hoeffding's inequality, exponential weights
- Examples: Weather prediction, mixture of experts
- Characteristics: full information, adversarial, regret-minimization, finite action space
- Notes: Lecture notes
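To make the exponential-weights protocol of Lecture 2 concrete, here is a minimal sketch of the Hedge update on the probability simplex. It assumes full-information losses in [0, 1] and uses a standard fixed learning rate for illustration; the two-expert toy losses are made up and are not from the readings.

```python
import numpy as np

def hedge(loss_matrix, eta=None):
    """Exponential weights (Hedge) with full-information feedback.

    loss_matrix: (T, K) array of losses in [0, 1]; row t is revealed after
    the learner commits to a distribution over the K actions.
    """
    T, K = loss_matrix.shape
    if eta is None:
        eta = np.sqrt(2.0 * np.log(K) / T)        # standard tuning for bounded losses
    cum_loss = np.zeros(K)
    learner_loss = 0.0
    for t in range(T):
        w = np.exp(-eta * (cum_loss - cum_loss.min()))  # shift for numerical stability
        p = w / w.sum()
        learner_loss += p @ loss_matrix[t]
        cum_loss += loss_matrix[t]
    regret = learner_loss - cum_loss.min()        # vs. best fixed expert in hindsight
    return learner_loss, regret

# toy run: 2 experts, the second slightly better on average
rng = np.random.default_rng(0)
losses = rng.random((1000, 2)) * np.array([1.0, 0.8])
_, reg = hedge(losses)
print(f"regret vs. best expert: {reg:.2f}")
```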
- Lecture 3: Jan. 10
- Stochastic Multi-armed bandits, regret minimization
- Reading: [BubeckCesaBianchi Chapter 1, 2.1-2.2], [Duchi Sections 1-2] (or [BoucheronEtAl Ch. 2])
- Techniques: Chernoff Bound, Hoeffding's inequality
- Algorithms: UCB1, Thompson Sampling
- Examples: Explore versus exploit different treatments for patients
- Characteristics: bandit feedback, stochastic, regret-minimization, finite action space
- Notes: Lecture notes
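As a companion to Lecture 3, here is a minimal UCB1 sketch run against simulated Bernoulli arms. The exploration bonus sqrt(2 log t / n_i) is the textbook choice, and the arm means in the toy run are illustrative rather than taken from the readings.

```python
import numpy as np

def ucb1(means, T, rng=None):
    """UCB1 on Bernoulli arms; `means` are the true means, used only to simulate rewards."""
    rng = rng or np.random.default_rng(0)
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    rewards = []
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1                                  # pull every arm once
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
            arm = int(np.argmax(ucb))
        r = float(rng.random() < means[arm])
        counts[arm] += 1
        sums[arm] += r
        rewards.append(r)
    regret = T * max(means) - sum(rewards)               # rough realized-regret estimate
    return counts, regret

counts, regret = ucb1([0.3, 0.5, 0.6], T=5000)
print("pull counts:", counts, " pseudo-regret estimate:", round(regret, 1))
```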
- Lecture 4: Jan. 12
- Stochastic Multi-armed bandits, pure exploration
- Reading: [JamiesonNowak], [SimchowitzEtAl Sec. 4], [KarninKorenSomekh Sec. 4]
- Techniques: Law of the Iterated Logarithm (LIL), top-k identification in fixed confidence and fixed budget settings
- Algorithms: Successive Elimination, LUCB++, Successive Halving
- Examples: Caption contest: find the 3 best captions using crowdsourced votes
- Characteristics: bandit feedback, stochastic, pure-exploration, finite action space
- Notes: Lecture notes
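A minimal best-arm-identification sketch to accompany Lecture 4: successive elimination in the fixed-confidence setting. The Hoeffding-plus-union-bound confidence radius below is one simple illustrative choice, not the exact constants from [JamiesonNowak].

```python
import numpy as np

def successive_elimination(pull, K, delta=0.05, max_rounds=2000):
    """Best-arm identification by successive elimination (fixed confidence).

    pull(i) returns a reward in [0, 1] from arm i. Arms whose upper confidence
    bound falls below the best lower confidence bound are dropped.
    """
    active = list(range(K))
    counts = np.zeros(K)
    sums = np.zeros(K)
    for t in range(1, max_rounds + 1):
        for i in active:                                  # sample every surviving arm once
            sums[i] += pull(i)
            counts[i] += 1
        means = sums[active] / counts[active]
        rad = np.sqrt(np.log(4 * K * t**2 / delta) / (2 * t))
        best_lcb = means.max() - rad
        active = [i for i, m in zip(active, means) if m + rad >= best_lcb]
        if len(active) == 1:
            break
    return active[0]

rng = np.random.default_rng(1)
true_means = [0.4, 0.5, 0.7]
best = successive_elimination(lambda i: float(rng.random() < true_means[i]), K=3)
print("identified arm:", best)
```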
- Lecture 5: Jan. 17
- Non-stochastic Multi-armed bandits, regret minimization
- Reading: [SzepesvariLattimore Adversarial Bandits]
- Techniques: Importance Sampling
- Algorithms: EXP3, EXP3.P
- Examples: Planning your commute: train, bus, drive, or walk?
- Characteristics: bandit feedback, non-stochastic, regret-minimization, finite action space
- Notes: Lecture notes
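A minimal EXP3 sketch to accompany Lecture 5, showing the importance-weighted reward estimates that make the bandit-feedback analysis go through. The fixed learning rate and the toy "adversary" that switches the best arm halfway through are illustrative choices.

```python
import numpy as np

def exp3(reward_fn, K, T, eta=None, rng=None):
    """EXP3 with importance-weighted reward estimates (rewards in [0, 1]).

    reward_fn(t, arm) returns the reward of the pulled arm at round t; only
    that single entry is observed.
    """
    rng = rng or np.random.default_rng(0)
    if eta is None:
        eta = np.sqrt(np.log(K) / (K * T))               # standard illustrative tuning
    est = np.zeros(K)                                    # cumulative importance-weighted rewards
    total = 0.0
    for t in range(T):
        w = np.exp(eta * (est - est.max()))              # shift for numerical stability
        p = w / w.sum()
        arm = rng.choice(K, p=p)
        r = reward_fn(t, arm)
        total += r
        est[arm] += r / p[arm]                           # unbiased estimate of the full reward vector
    return total

# toy adversary: arm 0 is best in the first half, arm 1 in the second half
T = 4000
reward = lambda t, a: float((a == 0) ^ (t >= T // 2))
print("EXP3 total reward:", exp3(reward, K=2, T=T))
```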
- Lecture 6: Jan. 19
- Non-stochastic Multi-armed bandits, pure exploration
- Reading: [LiEtAl]
- Techniques: Infinite-armed bandits
- Algorithms: Successive Halving, Hyperband
- Examples: Non-convex optimization with random restarts, hyperparameter tuning
- Characteristics: bandit feedback, non-stochastic, pure exploration, infinite action space
- Notes: Lecture notes
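To accompany Lecture 6, a minimal successive-halving sketch in the hyperparameter-tuning spirit of [LiEtAl]: the pool of configurations is halved each round while the per-configuration budget doubles. The noisy-loss simulator below is a stand-in for an actual training run, not part of any reading.

```python
import numpy as np

def successive_halving(configs, evaluate, budget_per_round=1):
    """Successive Halving for pure-exploration hyperparameter search.

    evaluate(cfg, budget) returns a loss after evaluating cfg with `budget`
    units of resource; losses are assumed to concentrate as budget grows.
    """
    pool = list(configs)
    budget = budget_per_round
    while len(pool) > 1:
        losses = [evaluate(cfg, budget) for cfg in pool]
        order = np.argsort(losses)
        pool = [pool[i] for i in order[: max(1, len(pool) // 2)]]  # keep the better half
        budget *= 2
    return pool[0]

# toy problem: a "configuration" is a number; its noisy loss concentrates
# around its true quality as the budget grows
rng = np.random.default_rng(2)
def noisy_loss(cfg, budget):
    return cfg + rng.normal(scale=1.0 / np.sqrt(budget))

candidates = list(rng.random(16))
print("selected config quality:", successive_halving(candidates, noisy_loss))
```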
- Lecture 7: Jan. 24
- Linear regression, linear sequential experimental design
- Reading: [RaskuttiMahoney], Optimal Design, [SoareLazaricMunos]
- Techniques: Leverage scores, linear optimal experimental design
- Algorithms: Least squares regression, best-arm identification for linear bandits
- Examples: Science, search problems
- Characteristics: stochastic and adversarial, linear least squares, stochastic pure-exploration, finite action space with side information
- Notes: Lecture notes
- Lecture 8: Jan. 26
- Nonlinear sequential experimental design
- Reading: [ChaudhuriMykland], [ChaudhuriKakadeEtAl]
- Techniques: Maximum likelihood estimation (MLE), Fisher information, Laplace approximation
- Algorithms: Sequential MLE with feedback
- Examples: Science, search problems, classification, regression
- Characteristics: stochastic pure-exploration, (in)finite action space with side information
- Notes: Lecture notes
- Lecture 9: Jan. 31
- Linear Bandits, Thompson Sampling
- Reading: [AbbasiyadkoriEtAl]
- Techniques: Confidence ellipsoids for linear predictors
- Algorithms: Linear bandits
- Examples: Path routing with edge weight feedback, recommendation systems
- Characteristics: bandit feedback, stochastic, regret-minimization, (in)finite action space with side information
- Notes: Lecture notes
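A companion sketch for Lecture 9: optimism for stochastic linear bandits using a ridge-regression estimate and an ellipsoidal bonus. The confidence radius beta is held fixed here for simplicity, rather than using the growing radius from [AbbasiyadkoriEtAl], and the arm set and parameter are simulated.

```python
import numpy as np

def linucb(arms, theta_star, T, lam=1.0, beta=2.0, noise=0.1, rng=None):
    """Optimism for stochastic linear bandits (LinUCB/OFUL-style sketch).

    arms: (K, d) feature vectors; theta_star: unknown parameter, used only to
    simulate rewards. beta is a fixed illustrative ellipsoid radius.
    """
    rng = rng or np.random.default_rng(0)
    K, d = arms.shape
    V = lam * np.eye(d)                                   # regularized design matrix
    b = np.zeros(d)
    regret = 0.0
    best = arms @ theta_star
    for _ in range(T):
        Vinv = np.linalg.inv(V)
        theta_hat = Vinv @ b                              # ridge-regression estimate
        widths = np.sqrt(np.einsum('ki,ij,kj->k', arms, Vinv, arms))
        a = int(np.argmax(arms @ theta_hat + beta * widths))
        x = arms[a]
        y = x @ theta_star + noise * rng.standard_normal()
        V += np.outer(x, x)
        b += y * x
        regret += best.max() - best[a]
    return regret

rng = np.random.default_rng(3)
arms = rng.standard_normal((20, 5))
theta = rng.standard_normal(5)
print("cumulative regret:", round(linucb(arms, theta, T=2000, rng=rng), 1))
```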
- Lecture 10: Feb. 2
- Contextual Bandits
- Reading: [LiChuEtAl], [BubeckCesaBianchi Ch. 4], [LangfordZhang]
- Techniques: Confidence ellipsoids for linear predictors, Bayesian posterior sampling
- Algorithms: EXP4, LinUCB, tau-Greedy
- Examples: recommending articles to users
- Characteristics: bandit feedback, adversarial, linear, exponential weights
- Notes: Lecture notes
- Lecture 11: Feb. 7
- Convex optimization with bandit feedback
- Reading: [FlaxmanKalaiMcmahan], [BubeckEldanLee]
- Techniques: One-point gradient estimation
- Algorithms: Bandit gradient descent (one-point gradient estimates), kernel-based bandit convex optimization
- Examples: Recommendation systems over continuous spaces
- Characteristics: bandit feedback, stochastic, regret-minimization, (in)finite action space with metric space
- Notes: TBD
- Lecture 12: Feb. 9
- Bayesian Bandits
- Reading: [RussoEtAl]
- Techniques: Posterior inference, Bayesian posterior sampling
- Algorithms: Thompson Sampling
- Examples: Maximize click-through rate with known dependencies
- Characteristics: bandit feedback, stochastic, regret-minimization, (in)finite action space with metric space
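A minimal Thompson Sampling sketch to accompany Lecture 12, for Bernoulli rewards with independent Beta(1, 1) priors; the click-through rates used in the toy run are made up for illustration.

```python
import numpy as np

def thompson_bernoulli(means, T, rng=None):
    """Thompson Sampling for Bernoulli bandits with independent Beta(1, 1) priors.

    means: true click-through rates, used only to simulate feedback.
    """
    rng = rng or np.random.default_rng(0)
    K = len(means)
    alpha = np.ones(K)                                    # posterior successes + 1
    beta = np.ones(K)                                     # posterior failures + 1
    regret = 0.0
    for _ in range(T):
        samples = rng.beta(alpha, beta)                   # one posterior draw per arm
        arm = int(np.argmax(samples))
        r = float(rng.random() < means[arm])
        alpha[arm] += r
        beta[arm] += 1 - r
        regret += max(means) - means[arm]
    return alpha, beta, regret

a, b, reg = thompson_bernoulli([0.02, 0.03, 0.05], T=20000)
print("pull counts:", (a + b - 2).astype(int), " pseudo-regret:", round(reg, 1))
```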
- Lecture 13: Feb. 14
- Kernelized and Gaussian Process Bandits
- Reading: [SrinivasEtAl], [RasmussenWilliams Ch.2]
- Techniques: Gaussian priors, dependent arms
- Algorithms: GP-UCB
- Examples: Recommendation systems over continuous spaces
- Characteristics: bandit feedback, stochastic, regret-minimization, (in)finite action space with metric space
- Notes: Lecture notes
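To accompany Lecture 13, a bare-bones GP-UCB sketch over a fixed 1-D grid with a squared-exponential kernel. The constant exploration weight beta and the toy objective are illustrative simplifications of the schedule analyzed in [SrinivasEtAl].

```python
import numpy as np

def rbf(x, y, ell=0.2):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell ** 2)

def gp_ucb(f, grid, T, beta=4.0, noise=0.1, rng=None):
    """GP-UCB on a fixed 1-D grid; f is queried with additive Gaussian noise."""
    rng = rng or np.random.default_rng(0)
    X, y = [], []
    for _ in range(T):
        if not X:
            mu, var = np.zeros(len(grid)), np.ones(len(grid))
        else:
            Xa = np.array(X)
            K = rbf(Xa, Xa) + noise ** 2 * np.eye(len(Xa))
            k_star = rbf(grid, Xa)
            mu = k_star @ np.linalg.solve(K, np.array(y))
            var = 1.0 - np.einsum('ij,ji->i', k_star, np.linalg.solve(K, k_star.T))
        x_next = grid[int(np.argmax(mu + np.sqrt(beta * np.maximum(var, 0))))]
        X.append(x_next)
        y.append(f(x_next) + noise * rng.standard_normal())
    return max(y)

grid = np.linspace(0, 1, 200)
print("best observed value:", round(gp_ucb(lambda x: np.sin(6 * x), grid, T=30), 3))
```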
- No class: Feb. 16
- Lecture 14: Feb. 21
- Binary classification, Streaming Active Learning
- Reading: [Dasgupta], CAL Notes, [DasguptaHsuMonteleoni]
- Techniques: Active learning for binary classification in the streaming setting
- Algorithms: CAL
- Examples: Learning a spam filter
- Characteristics: stochastic-sampling with deterministic outcomes, pure-exploration
- Notes: Lecture notes
- Lecture 15: Feb. 23
- Binary classification, Pool-based Active Learning
- Reading: [Dasgupta2], [GolovinKrause], [Nowak]
- Techniques: Active learning for binary classification in the pool-based setting. Adaptive submodular optimization. Combinatorial bandits.
- Algorithms: Generalized binary search
- Examples: Learning a classifier with a pool of unlabeled examples (e.g., Google Images)
- Characteristics: stochastic-sampling with deterministic outcomes, pure-exploration
- Notes: Lecture notes
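A small sketch to accompany Lecture 15: greedy generalized binary search over a finite hypothesis pool, querying the unlabeled point whose predicted labels most evenly split the surviving version space. The threshold-classifier toy pool below is illustrative, not from the readings.

```python
import numpy as np

def generalized_binary_search(H, oracle, max_queries):
    """Greedy version-space bisection for pool-based active learning.

    H: (m, n) matrix whose rows are candidate hypotheses' labels (+/-1) on the
    n pooled examples; oracle(j) returns the true label of example j.
    """
    alive = np.ones(H.shape[0], dtype=bool)
    for _ in range(max_queries):
        balance = np.abs(H[alive].sum(axis=0))            # how unevenly each point splits the version space
        j = int(np.argmin(balance))                       # query the most balanced point
        label = oracle(j)
        alive &= (H[:, j] == label)                       # discard inconsistent hypotheses
        if alive.sum() == 1:
            break
    return np.flatnonzero(alive)

# toy pool: threshold classifiers on 8 points, true threshold at index 5
n = 8
H = np.array([[1 if j >= t else -1 for j in range(n)] for t in range(n + 1)])
truth = H[5]
print("surviving hypotheses:", generalized_binary_search(H, lambda j: truth[j], max_queries=n))
```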
- Lecture 16: Feb. 28
- Pool-based Active Learning, Combinatorial bandits
- Reading: [CaoKrishnamurthy]
- Techniques: Reduction of binary classification to combinatorial bandits.
- Algorithms: Action elimination
- Examples: Top-K arm identification, learning a classifier with a pool of unlabeled examples (e.g., Google Images)
- Characteristics: stochastic-sampling with deterministic outcomes, pure-exploration
- Lecture 17: Mar. 2
- Finite Markov Decision Processes (MDPs)
- Reading: [Szepesvari Ch. 1, 2]
- Techniques: Value, Q function learning by dynamic programming
- Algorithms: Value iteration, Q-learning
- Examples: Reinforcement learning
- Characteristics: stochastic state dependent actions, stochastic state transitions, regret-minimization, finite action space
- Notes: Lecture notes
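A minimal value-iteration sketch to accompany Lecture 17, for a finite MDP specified by its transition tensor and reward table; the two-state, two-action toy MDP is made up for illustration.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Value iteration for a finite MDP.

    P: (A, S, S) transition matrices, R: (S, A) expected rewards.
    Returns the optimal value function and a greedy policy.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * np.einsum('ast,t->sa', P, V)      # Q[s, a] = R[s, a] + gamma * E[V(s')]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V_new, Q.argmax(axis=1)

# toy 2-state, 2-action MDP
P = np.array([[[0.9, 0.1], [0.2, 0.8]],                   # transitions under action 0
              [[0.5, 0.5], [0.1, 0.9]]])                  # transitions under action 1
R = np.array([[0.0, 1.0],                                 # rewards R[s, a]
              [-1.0, 2.0]])
V, pi = value_iteration(P, R)
print("V*:", np.round(V, 2), " greedy policy:", pi)
```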
- Lecture 18: Mar. 7
- Infinite Markov Decision Processes (MDPs)
- Reading: [Szepesvari Ch. 4]
- Techniques: Learning with rollouts
- Algorithms: parameterized Q-learning, policy gradient
- Examples: Reinforcement learning
- Characteristics: stochastic state dependent actions, stochastic state transitions, regret-minimization, (in)finite action space
- Lecture 19: Mar. 9
- Monte Carlo Tree Search (MCTS)
- Reading: [SuttonBarto Ch. 8], [KocsisSzepesvari]
- Techniques: MDPs with large branching factors
- Algorithms: UCT
- Examples: Playing strategy games and Go
- Characteristics: stochastic state dependent actions, stochastic state transitions, regret-minimization, countable state space, finite action space (per state)
- Notes: Lecture notes
- Lecture 20: Mar. 12 11:00 AM in EEB 045 (Note special location and date)
- Linear Quadratic Regulator (LQR)
- Reading: Boyd's notes on LQR, [AbbasiyadkoriSzepesvari]
- Techniques: Automatic control in linear systems
- Algorithms: LQR/LQG
- Examples: Going to the moon and maintaining temperature
- Characteristics: stochastic state dependent actions, stochastic state transitions, regret-minimization, continuous state and action space
- Notes: Lecture notes
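Finally, a companion sketch for Lecture 20: the backward Riccati recursion that yields the finite-horizon LQR gains when the linear dynamics are known (the simplest case, before the adaptive-control setting of [AbbasiyadkoriSzepesvari]). The double-integrator system below is a standard toy example.

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, T):
    """Finite-horizon LQR via the backward Riccati recursion.

    Returns time-varying gains K_t so that u_t = -K_t x_t minimizes
    sum_t x_t' Q x_t + u_t' R u_t for the dynamics x_{t+1} = A x_t + B u_t.
    """
    P = Q.copy()
    gains = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # optimal gain at this stage
        P = Q + A.T @ P @ (A - B @ K)                      # Riccati update
        gains.append(K)
    return gains[::-1]                                     # gains ordered t = 0, ..., T-1

# toy double integrator: position/velocity state, force input
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)
R = 0.1 * np.eye(1)
K0 = finite_horizon_lqr(A, B, Q, R, T=100)[0]
print("first-step gain:", np.round(K0, 3))
```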