## Course Description

Weather forecasting, predicting stock market trends, and deciding
which ads to present on a web page are examples of sequential
prediction problems. Online learning is a powerful and popular way of
dealing with such problems. An online learning algorithm observes a
stream of examples and makes a prediction for each element in the
stream. The algorithm receives immediate feedback about each
prediction and uses this feedback to improve its accuracy on
subsequent predictions. In contrast to statistical machine learning,
online learning algorithms donâ€™t make stochastic assumptions about the
data they observe, and even handle situations where the data is
generated by a malicious adversary. There has been a recent surge of
scientific research on online learning algorithms, largely due to
their broad applicability to web-scale prediction problems.

This course, Online Learning (CSE599s), will provide a rigorous
introduction to state-of-the-art online learning algorithms with an
emphasis on algorithm design and theoretical analysis. Topics
include: background in convex analysis, regret minimization, online
optimization, expert problems, online learning with partial feedback
and explore/exploit tradeoffs (a.k.a. bandit problems) and a selection
of advanced topics.

### Planned Topics

After motivating and defining the online learning problem, we will present different families of online learning algorithms. All of the algorithms discussed in this course are both practical and come with theoretical worst-case guarantees.

**Follow the regularized leader** The follow the
regularized leader (FTRL) algorithm family is in some
ways the most fundamental algorithm in online
learning. We will introduce and analyze a simple
instance of the algorithm using a general-purpose and
very powerful lemma. Some necessary background from
convex analysis will also be introduced.

**Online gradient descent** Online gradient descent
algorithms are close relatives of stochastic gradient
descent. We will present a regret bound analysis for
linear functions, and show how that same bound applies
for free to general convex function. This immediately
gives us algorithms for logistic regression and linear
support vector machines, for example.

**Adaptive per-coordiante learning rates** This
simple modification to gradient descent can produce
dramatically better performance on large datasets. We
illustrate the problem with global learning rates via
one-dimensional examples, and then extend the analysis
from the previous lecture to provide tighter bounds
when adaptive per-coordinate learning rates are used.

**Follow the proximally-regularized leader** We
show that follow-the-regularized leader and online
gradient descent are closely related. However, their
behavior differs in some important respects. We show
how L1 regularization for producing sparse models
(e.g., the Lasso algorithm) can be implemented much
more effectively in the FTRL context.

**Alternative notions of regret** So far, we
have analyzed algorithms by comparing their
performance to a fixed best-in-hindsight comparator.
However, for some applications other notions of regret
are more appropriate, for example, when the "true"
model of the world is changing.

**Mirror descent** Mirror descent generalizes
online gradient descent, and allows us to develop
algorithms for many other problems. We consider the
best-experts problem, where on each round we must
choose an "expert" from a set whose advice we follow
for that round.

**Bandit algorithms** Algorithms for the k-armed
bandit problem extend the best-experts problem with
partial feedback. On each round you only find out
about the reward obtained by following the expert you
chose; you have no way of knowing directly whether
or not some other expert would have been better. As
an example, consider showing ads to a user on a
webpage --- we model the ads as "experts" and have no
way of knowing whether the user would have clicked on
an ad if we didn't choose to show it.

**High probability bounds** Most of the regret
bounds for experts and bandit algorithms to this point
only hold in expectation; the algorithms use
randomization, and if we get unlucky then we can do
worse than the bound suggests we should. However,
with some modification to the algorithms, we can
design algorithms that guarantee good performance with
high probability.

**Stochastic bandits** The bandit algorithms
introduced so far make no statistical assumption about
the rewards of the experts/actions. Can we get better
bounds if we know rewards follow some (unknown)
distributions?

**Contextual bandits** Basic bandit algorithms
must choose an action without the aid of any
additional context. Contextual bandit algorithms,
however, can leverage additional information about the
current state of the world each time a decision is
made. For example, when placing ads on a webpage,
perhaps we should make that decision differently
depending on the content of the webpage.

**Advanced topics** Advanced topics may include
an examination of connections between game theory and
online learning; the application of online learning
algorithms to batch machine learning problems; expert
and bandit problems with combinatorial structure; and
selective sampling and label efficiency.

### Prerequisites

This course assumes background in basic probability theory and linear algebra. Key mathematical concepts will be reviewed before they are used, but a certain level of mathematical maturity is expected. The course is theoretical and does not involve any programming.

### Grading

The course can be taken either for a grade or for credit/no-credit. Grades will be given based on homework assignments, lecture notes, and class participation.