\documentclass{article}

\usepackage{cse599s14sp}
\usepackage{url}
\usepackage{fullpage}
\usepackage{amsfonts}
\usepackage{amsthm}
\usepackage{algorithmic}
\usepackage{enumerate}

% Please use these commands where appropriate:
\newcommand{\Regret}{\operatorname{Regret}}
\newcommand{\Loss}{\operatorname{loss}}
\newcommand{\grad}{\triangledown}

\begin{document}

\begin{lecture}{8}{A Reduction from Bandits to Experts}{Saghar Hosseini}{Ofer Dekel}{04/24/2014}

\section{Recap: Follow the Regularized Leader with an Entropic Regularizer on the Probability Simplex}

Recall the problem of learning with expert advice. Let $d$ be the number of experts. At each round $t$, the player chooses one expert $I_t$ and then observes the losses of all $d$ experts, including those that were not chosen. This is a full-information feedback problem, and the following Follow the Regularized Leader (FTRL) algorithm was presented in the previous lecture to minimize the expected regret.

\begin{center}
\fbox{\parbox{5in}{
\begin{algorithmic}
\FOR{$t=1, 2, \dots, T$}
\STATE $p_t = \arg\min_{p \in \mathbb{R}^d} \left\{ p \cdot l_{1:t-1} + \frac{1}{\eta}\left( \sum_{i=1}^d p_i \log p_i + \log d \right) + I_{\Delta^d}(p) \right\}$
\STATE Draw $I_t \sim p_t$, and incur loss $l_{t,I_t}$
\STATE Observe $l_t \in [0,d]^d$
\ENDFOR
\end{algorithmic}
}}
\end{center}

Here $l_{1:t-1} = \sum_{s=1}^{t-1} l_s$ is the cumulative loss vector, and $I_{\Delta^d}$ is the indicator function of the probability simplex $\Delta^d$, equal to $0$ on $\Delta^d$ and $+\infty$ outside it. The Exponentiated Gradient (EG) algorithm, on which the EXP3 algorithm is based, was also introduced as a way to approach this problem:

\begin{center}
\fbox{\parbox{5in}{
\begin{algorithmic}
\STATE Initialize $w_1 = (1, 1, \dots, 1)$
\FOR{$t=1, 2, \dots, T$}
\STATE Define $p_t = \frac{w_t}{\|w_t\|_1}$
\STATE Draw $I_t \sim p_t$, and incur loss $l_{t,I_t}$
\STATE Observe $l_t \in [0,d]^d$
\FOR{$i=1, 2, \dots, d$}
\STATE Update $w_{t+1,i} = w_{t,i}\, e^{-\eta l_{t,i}} = e^{-\eta \sum_{s=1}^{t} l_{s,i}}$
\ENDFOR
\ENDFOR
\end{algorithmic}
}}
\end{center}

In some problems, when the player chooses one arm/expert, he/she does not observe the whole loss vector $l_t \in [0,d]^d$. The player can only observe the loss associated with the expert that was chosen, i.e., $l_{t,I_t}$; this setting is called a bandit problem.
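The two boxed algorithms compute the same distribution $p_t$. To see this, solve the FTRL minimization in closed form; the constant $\log d$ and the indicator $I_{\Delta^d}$ only fix the normalization. Introducing a Lagrange multiplier $\lambda$ for the constraint $\sum_{i=1}^d p_i = 1$ and setting the derivative with respect to $p_i$ to zero gives
\[
l_{1:t-1,i} + \frac{1}{\eta}\left(\log p_i + 1\right) + \lambda = 0,
\qquad \text{so} \qquad
p_i \propto e^{-\eta\, l_{1:t-1,i}}.
\]
Normalizing over the simplex yields
\[
p_{t,i} = \frac{e^{-\eta\, l_{1:t-1,i}}}{\sum_{j=1}^d e^{-\eta\, l_{1:t-1,j}}} = \frac{w_{t,i}}{\|w_t\|_1},
\]
which is exactly the distribution maintained by EG.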
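The equivalence can also be checked numerically. The following Python sketch is purely illustrative: the number of experts, horizon, learning rate, and the simulated losses (drawn from $[0,1]$ rather than $[0,d]$) are arbitrary choices, not values taken from the lecture.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
d, T, eta = 5, 100, 0.1                 # experts, rounds, learning rate
losses = rng.uniform(0.0, 1.0, (T, d))  # simulated adversarial losses

w = np.ones(d)           # EG weights, w_1 = (1, ..., 1)
cum_loss = np.zeros(d)   # cumulative loss vector l_{1:t-1}

for t in range(T):
    p_eg = w / w.sum()   # p_t = w_t / ||w_t||_1

    # FTRL closed form: p_{t,i} proportional to exp(-eta * l_{1:t-1,i}).
    # Subtracting the max before exponentiating is a standard
    # numerical-stability trick; it cancels after normalization.
    z = -eta * cum_loss
    p_ftrl = np.exp(z - z.max())
    p_ftrl /= p_ftrl.sum()

    assert np.allclose(p_eg, p_ftrl)    # the two boxed algorithms agree

    I_t = rng.choice(d, p=p_eg)         # draw I_t ~ p_t, incur l_{t, I_t}
    w *= np.exp(-eta * losses[t])       # multiplicative (EG) update
    cum_loss += losses[t]

print("final distribution:", np.round(w / w.sum(), 3))
\end{verbatim}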
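Under bandit feedback, EG cannot be run as stated, because the update needs the whole vector $l_t$. One standard device, used for example by EXP3 and stated here as background rather than as something derived above, is the importance-weighted estimate $\hat{l}_{t,i} = \frac{l_{t,i}}{p_{t,i}} \mathbf{1}\{I_t = i\}$. It is unbiased, and when $l_t \in [0,1]^d$ and the sampling distribution satisfies $p_{t,i} \geq 1/d$, it takes values in $[0,d]^d$, which is one way to read the loss range appearing in the boxes above. A minimal sketch of the estimator:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
d = 4
true_loss = np.array([0.2, 0.5, 0.9, 0.1])  # one round's losses, in [0,1]^d
p = np.full(d, 1.0 / d)                     # uniform sampling, so p_i >= 1/d

def bandit_estimate(rng):
    """One importance-weighted estimate of the full loss vector.

    Only the chosen coordinate l_{t, I_t} is observed; all other
    coordinates of the estimate are zero.
    """
    I_t = rng.choice(d, p=p)
    l_hat = np.zeros(d)
    l_hat[I_t] = true_loss[I_t] / p[I_t]    # in [0, d] since p_i >= 1/d
    return l_hat

# Averaging many independent estimates recovers the true loss vector,
# illustrating unbiasedness: E[l_hat] = l_t.
avg = np.mean([bandit_estimate(rng) for _ in range(100_000)], axis=0)
print(np.round(avg, 2), "vs", true_loss)
\end{verbatim}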
In the next section, a method is presented that relates the ``multi-armed bandit'' problem to the ``experts'' problem.

\section{A (general) Reduction from ``Bandits'' to ``Experts''}

\subsection*{Blocking}

Choose a block size $B$, assuming that $B$ divides $T$, and partition the rounds $1, \dots, T$ into $T/B$ consecutive blocks; each block will play the role of a single round of the experts problem.

\end{lecture}

\end{document}