\documentclass{article}

\usepackage{cse599s14sp}
\usepackage{url}
\usepackage{fullpage}
\usepackage{amsfonts}
\usepackage{amsthm}
\usepackage{algorithmic}
\usepackage{enumerate}

% Please use these commands where appropriate:
\newcommand{\Regret}{\operatorname{Regret}}
\newcommand{\Loss}{\operatorname{loss}}
\newcommand{\grad}{\triangledown}

\begin{document}

\begin{lecture}{8}{A Reduction from Bandits to Experts}{Saghar Hosseini}{Ofer Dekel}{04/24/2014}

\section{Recap: Follow the Regularized Leader with an Entropic Regularizer on the Probability Simplex}

Recall the problem of learning with expert advice. Let $d$ be the number of experts. At each round $t$, the player chooses one expert $I_t$ and then observes the losses of all $d$ experts, including those that were not chosen. This is a full-information feedback problem, and the following Follow the Regularized Leader (FTRL) algorithm was presented in the previous lecture to minimize the expected regret.

\begin{center}
\fbox{\parbox{5in}{
\begin{algorithmic}
\FOR{$t=1, 2, \dots, T$}
\STATE $p_t = \arg\min_{p \in \mathbb{R}^d} \left\{ p \cdot l_{1:t-1} + \frac{1}{\eta}\left( \sum_{i=1}^d p_i \log p_i + \log d \right) + I_{\Delta^d}(p) \right\}$
\STATE Draw $I_t \sim p_t$, and incur loss $l_{t,I_t}$
\STATE Observe $l_t \in [0,d]^d$
\ENDFOR
\end{algorithmic}
}}
\end{center}

Here $l_{1:t-1} = \sum_{s=1}^{t-1} l_s$ is the cumulative loss vector, and $I_{\Delta^d}$ is the indicator function of the probability simplex $\Delta^d$, equal to $0$ on $\Delta^d$ and $+\infty$ outside it. The Exponentiated Gradient (EG) algorithm, on which the EXP3 algorithm is based, was also introduced as a way to approach this problem:

\begin{center}
\fbox{\parbox{5in}{
\begin{algorithmic}
\STATE Initialize $w_1 = (1, 1, \dots, 1)$
\FOR{$t=1, 2, \dots, T$}
\STATE Define $p_t = \frac{w_t}{\|w_t\|_1}$
\STATE Draw $I_t \sim p_t$, and incur loss $l_{t,I_t}$
\STATE Observe $l_t \in [0,d]^d$
\FOR{$i=1, 2, \dots, d$}
\STATE Update $w_{t+1,i} = w_{t,i}\, e^{-\eta l_{t,i}} = e^{-\eta \sum_{s=1}^{t} l_{s,i}}$
\ENDFOR
\ENDFOR
\end{algorithmic}
}}
\end{center}

In some problems, when the player chooses one arm/expert, he/she does not observe the whole loss vector $l_t \in [0,d]^d$. The player can only observe the loss associated with the expert that was chosen, i.e., $l_{t,I_t}$; this setting is called a bandit problem.
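The two boxed algorithms compute the same distribution $p_t$. To see this, solve the FTRL minimization in closed form; the constant $\log d$ and the indicator $I_{\Delta^d}$ only fix the normalization. Introducing a Lagrange multiplier $\lambda$ for the constraint $\sum_{i=1}^d p_i = 1$ and setting the derivative with respect to $p_i$ to zero gives
\[
l_{1:t-1,i} + \frac{1}{\eta}\left(\log p_i + 1\right) + \lambda = 0,
\qquad \text{so} \qquad
p_i \propto e^{-\eta\, l_{1:t-1,i}}.
\]
Normalizing over the simplex yields
\[
p_{t,i} = \frac{e^{-\eta\, l_{1:t-1,i}}}{\sum_{j=1}^d e^{-\eta\, l_{1:t-1,j}}} = \frac{w_{t,i}}{\|w_t\|_1},
\]
which is exactly the distribution maintained by EG.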
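The equivalence can also be checked numerically. The following Python sketch is purely illustrative: the number of experts, horizon, learning rate, and the simulated losses (drawn from $[0,1]$ rather than $[0,d]$) are arbitrary choices, not values taken from the lecture.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
d, T, eta = 5, 100, 0.1                 # experts, rounds, learning rate
losses = rng.uniform(0.0, 1.0, (T, d))  # simulated adversarial losses

w = np.ones(d)           # EG weights, w_1 = (1, ..., 1)
cum_loss = np.zeros(d)   # cumulative loss vector l_{1:t-1}

for t in range(T):
    p_eg = w / w.sum()   # p_t = w_t / ||w_t||_1

    # FTRL closed form: p_{t,i} proportional to exp(-eta * l_{1:t-1,i}).
    # Subtracting the max before exponentiating is a standard
    # numerical-stability trick; it cancels after normalization.
    z = -eta * cum_loss
    p_ftrl = np.exp(z - z.max())
    p_ftrl /= p_ftrl.sum()

    assert np.allclose(p_eg, p_ftrl)    # the two boxed algorithms agree

    I_t = rng.choice(d, p=p_eg)         # draw I_t ~ p_t, incur l_{t, I_t}
    w *= np.exp(-eta * losses[t])       # multiplicative (EG) update
    cum_loss += losses[t]

print("final distribution:", np.round(w / w.sum(), 3))
\end{verbatim}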
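Under bandit feedback, EG cannot be run as stated, because the update needs the whole vector $l_t$. One standard device, used for example by EXP3 and stated here as background rather than as something derived above, is the importance-weighted estimate $\hat{l}_{t,i} = \frac{l_{t,i}}{p_{t,i}} \mathbf{1}\{I_t = i\}$. It is unbiased, and when $l_t \in [0,1]^d$ and the sampling distribution satisfies $p_{t,i} \geq 1/d$, it takes values in $[0,d]^d$, which is one way to read the loss range appearing in the boxes above. A minimal sketch of the estimator:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
d = 4
true_loss = np.array([0.2, 0.5, 0.9, 0.1])  # one round's losses, in [0,1]^d
p = np.full(d, 1.0 / d)                     # uniform sampling, so p_i >= 1/d

def bandit_estimate(rng):
    """One importance-weighted estimate of the full loss vector.

    Only the chosen coordinate l_{t, I_t} is observed; all other
    coordinates of the estimate are zero.
    """
    I_t = rng.choice(d, p=p)
    l_hat = np.zeros(d)
    l_hat[I_t] = true_loss[I_t] / p[I_t]    # in [0, d] since p_i >= 1/d
    return l_hat

# Averaging many independent estimates recovers the true loss vector,
# illustrating unbiasedness: E[l_hat] = l_t.
avg = np.mean([bandit_estimate(rng) for _ in range(100_000)], axis=0)
print(np.round(avg, 2), "vs", true_loss)
\end{verbatim}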
In the next section, a method is presented that relates the ``multi-armed bandit'' problem to the ``experts'' problem.

\section{A (general) Reduction from ``Bandits'' to ``Experts''}

\subsection*{Blocking}

Choose a block size $B$, assuming that $B$ divides $T$, and partition the rounds $1, \dots, T$ into $T/B$ consecutive blocks; each block will play the role of a single round of the experts problem.

\end{lecture}

\end{document}