Symmetrization and subgaussian processes

Symmetrization, take I

Recall that we have a function \(F : \R^n \to \R\) written as \(F(x) = f_1(x) + \cdots + f_m(x)\), and we want to sparsify it by sampling the terms in proportion to a probability vector \(\rho \in \R_+^m\) (the “importance scores”).

One way to analyze this is to first make sure all the probabilities are uniformly small, say \(\rho_1,\ldots,\rho_m \leq 2/m\). Since we are trying to sparsify down to a number of terms that is independent of \(m\), we can always achieve this by simply splitting a term

\[f_i(x) \to f_i(x)/s + f_i(x)/s + \cdots + f_i(x)/s\,,\]

where there are \(s\) summands, each inheriting importance score \(\rho_i/s\).
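This splitting is easy to carry out mechanically. Below is a small Python sketch (the helper `split_terms` is our own illustration, not part of the text): splitting term \(i\) into \(\lceil \rho_i m \rceil\) equal copies preserves the total score, gives each copy score at most \(1/m\), and at most doubles the number of terms, so every score ends up at most \(2/m'\), where \(m'\) is the new number of terms.

```python
import math

def split_terms(rho):
    """Split term i into ceil(rho[i] * m) equal copies.

    Each copy then has score rho[i] / ceil(rho[i] * m) <= 1/m, and the
    total number of terms at most doubles, so every resulting score is
    at most 2/m' for the new length m'.  (Hypothetical helper.)
    """
    m = len(rho)
    new_rho = []
    for p in rho:
        s = max(1, math.ceil(p * m))
        new_rho.extend([p / s] * s)
    return new_rho

rho = [0.5, 0.3, 0.1, 0.05, 0.05]   # one dominant importance score
new_rho = split_terms(rho)          # scores are now uniformly small
```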

Then one can sparsify in phases: delete each term independently with probability \(1/2\) and double the terms that survive, so that the expectation is preserved. Writing the coefficient of the \(j\)th term as \(1+\e_j \in \{0,2\}\), this experiment is modeled in the natural way:

\[\E_{\e} \max_{x \in \Omega} \left|F(x) - \sum_{j=1}^m (1+\e_j) f_j(x)\right| = \E_{\e} \max_{x \in \Omega} \left|\sum_{j=1}^m \e_j f_j(x)\right|.\]

where \(\e_1,\ldots,\e_m\) are i.i.d. uniform random \(\pm 1\) variables, and \(\Omega \subseteq \R^n\) is our domain of interest.
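The equality above is actually a pointwise identity: since \(F(x) = \sum_j f_j(x)\), we have \(F(x) - \sum_j (1+\e_j) f_j(x) = -\sum_j \e_j f_j(x)\) for every sign pattern. A quick numerical sanity check in Python, with toy functions of our own choosing:

```python
import itertools
import random

# Pointwise identity behind the display: since F(x) = sum_j f_j(x),
#   F(x) - sum_j (1 + eps_j) f_j(x) = -sum_j eps_j f_j(x)
# for every sign pattern eps in {-1, +1}^m.  (Toy terms for illustration.)
random.seed(0)
fs = [lambda x: x * x, lambda x: 2 * x, lambda x: 1 - x]   # hypothetical f_j
F = lambda x: sum(f(x) for f in fs)

worst = 0.0
for eps in itertools.product([-1, 1], repeat=len(fs)):
    for _ in range(10):
        x = random.uniform(-3, 3)
        lhs = F(x) - sum((1 + e) * f(x) for e, f in zip(eps, fs))
        rhs = -sum(e * f(x) for e, f in zip(eps, fs))
        worst = max(worst, abs(lhs - rhs))
```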

It turns out that it is possible to do something similar and reduce the analysis of one-shot sampling to the analysis of randomly signed sums of the functions. This makes for both a simpler algorithm, and a more robust method of analysis that will be very useful later.

Symmetrization, take II

Let’s consider a more general situation: a family of independent random functions \(\mathbf{g}_1,\ldots,\mathbf{g}_M : \R^n \to \R\), each sampled from some distribution. (Recall that we are thinking of each one as being chosen from a finite family of functions; in our case, they are even i.i.d.)

Define \(\mathbf{G}(x) \seteq \mathbf{g}_1(x) + \cdots + \mathbf{g}_M(x)\) and \(F(x) \seteq \E[\mathbf{G}(x)]\). We are interested in the quantity

\[\begin{equation}\label{eq:Esup} \mathcal{S} \seteq \E \max_{x \in \Omega} \left|F(x)-\mathbf{G}(x)\right| = \E \max_{x \in \Omega} \left|F(x) - \left(\mathbf{g}_1(x)+\cdots+\mathbf{g}_M(x)\right)\right|\,, \end{equation}\]

where \(\Omega \subseteq \R^n\).

For our particular case of interest, let us take \(\Omega \seteq \{ x \in \R^n : F(x) \leq 1 \}\).
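To make the quantity \(\mathcal{S}\) concrete, here is a hedged Monte Carlo sketch in Python. The toy functions, scores, and grid (a finite stand-in for \(\Omega\)) are our own choices; each \(\mathbf{g}_j\) is drawn i.i.d. as \(f_i/(M\rho_i)\) with probability \(\rho_i\), so that \(\E[\mathbf{G}(x)] = F(x)\):

```python
import random

# Monte Carlo estimate of S = E max_{x in Omega} |F(x) - G(x)| for a toy
# family: each g_j is an i.i.d. draw, equal to f_i / (M * rho_i) with
# probability rho_i, which makes E[G(x)] = sum_i f_i(x) = F(x).
# (Toy functions and a finite grid standing in for Omega.)
random.seed(0)
fs = [lambda x: x * x, lambda x: 2 * abs(x), lambda x: 1.0]
rho = [0.5, 0.3, 0.2]
M = 200
grid = [i / 10 for i in range(-20, 21)]
F = lambda x: sum(f(x) for f in fs)

def G_values():
    """One realization of G, evaluated on the grid."""
    draws = random.choices(range(len(fs)), weights=rho, k=M)
    return [sum(fs[i](x) / (M * rho[i]) for i in draws) for x in grid]

trials = 50
S_hat = 0.0
for _ in range(trials):
    Gv = G_values()
    S_hat += max(abs(F(x) - g) for x, g in zip(grid, Gv))
S_hat /= trials   # empirical estimate of S
```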

A classical symmetrization argument bounds \eqref{eq:Esup} by a randomly signed sum: introduce an independent copy \(\mathbf{G}'\) of \(\mathbf{G}\) with \(F(x) = \E[\mathbf{G}'(x)]\), move the inner expectation outside the maximum using Jensen's inequality, and note that each difference \(\mathbf{g}_j - \mathbf{g}_j'\) is symmetric, hence has the same distribution as \(\e_j (\mathbf{g}_j - \mathbf{g}_j')\) for i.i.d. uniform random signs \(\e_1,\ldots,\e_M\). Together with the triangle inequality, this yields

\[\mathcal{S} \leq 2\,\E \max_{x \in \Omega} \left|\e_1 \mathbf{g}_1(x) + \cdots + \e_M \mathbf{g}_M(x)\right|\,.\]

This completes the symmetrization argument, allowing us to focus on proving bounds like \eqref{eq:rad}.

The advantage here lies in the fact that \(\left \{ \e_1 g_1(x) + \e_2 g_2(x) + \cdots + \e_M g_M(x) : x \in \R^n \right\}\) is an example of a subgaussian process, and bounding the expected maximum of a subgaussian process has a long and rich history, along with a correspondingly powerful framework.

The (mild) disadvantage is that we require the inequality \eqref{eq:rad} to hold for every choice of functions \(g_1,\ldots,g_M\) in the support of our distribution.

(Note: The confusing terminology “process” is a historical artifact. Originally, one was indeed interested in \(\e_1 g_1(x)\), \(\e_1 g_1(x) + \e_2 g_2(x)\), etc. as a sequence “evolving in time.” If one switches from discrete to continuous time, e.g., a process like Brownian motion, then control of the expected maximum is closely related to almost sure continuity of the sample paths.)

Subgaussian processes

Consider a collection of random variables \(\{ X_t : t \in T \}\) where the index set \(T\) is equipped with a distance \(d\). The family is called subgaussian (with respect to \(d\)) if there is a constant \(c > 0\) such that

\[\begin{equation}\label{eq:subgaussian} \P\left[|X_s-X_t| > \lambda\,d(s,t)\right] \leq e^{- c\lambda^2}\,,\quad \forall \lambda > 0, s,t \in T\,. \end{equation}\]
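For intuition, this condition can be checked empirically for the processes we care about. For a Rademacher sum \(X_t = \sum_j \e_j a_j(t)\) with fixed coefficients, Hoeffding's inequality gives \(\P[|X_s - X_t| > \lambda\, d(s,t)] \leq 2 e^{-\lambda^2/2}\) with \(d(s,t) = \big(\sum_j (a_j(s)-a_j(t))^2\big)^{1/2}\), which is the subgaussian tail \eqref{eq:subgaussian} up to the harmless factor of \(2\). A small Monte Carlo check with toy coefficient vectors of our own:

```python
import math
import random

# Empirical check of the subgaussian tail for a Rademacher sum
# X_t = sum_j eps_j * a_j(t), with d(s,t) the Euclidean distance between
# the coefficient vectors.  Hoeffding's inequality gives
#   P[|X_s - X_t| > lam * d(s,t)] <= 2 * exp(-lam^2 / 2).
# (Toy coefficient vectors for illustration.)
random.seed(0)
a_s = [0.7, -1.2, 0.4, 2.0, -0.3]
a_t = [0.1, -0.2, 1.4, 1.0, 0.8]
diff = [u - v for u, v in zip(a_s, a_t)]
d = math.sqrt(sum(c * c for c in diff))

lam = 2.0
N = 50_000
hits = 0
for _ in range(N):
    eps = [random.choice((-1, 1)) for _ in diff]
    if abs(sum(e * c for e, c in zip(eps, diff))) > lam * d:
        hits += 1
emp = hits / N                          # empirical tail probability
bound = 2 * math.exp(-lam * lam / 2)    # Hoeffding bound, about 0.27
```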

The quantity we care about after our symmetrization argument is

\[\begin{equation}\label{eq:symout} \E \max_{x \in \Omega} \left(\e_1 g_1(x) + \cdots + \e_M g_M(x)\right), \end{equation}\]

and, as we have just seen, this is the expected maximum of a centered subgaussian process. (Centered means that \(\E X_t = 0\) for every \(t \in T\).) There is a rich history and theory for bounding such expected maxima, with M. Talagrand as the primary architect (see his comprehensive book on the topic).
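As a first taste of that theory, the simplest bound comes from the moment generating function: for \(N\) centered variables with Gaussian tails of unit scale, \(\E \max_{i \leq N} X_i \leq \sqrt{2 \ln N}\). A Monte Carlo illustration with i.i.d. standard Gaussians (the parameters are our own choices, not from the text):

```python
import math
import random

# MGF bound on expected maxima: for N standard Gaussians,
#   E max_i X_i <= sqrt(2 * ln N),
# via exp(t * E max) <= E exp(t * max) <= N * exp(t^2 / 2), optimized
# at t = sqrt(2 * ln N).  Monte Carlo comparison (illustrative only).
random.seed(1)
N = 1024
trials = 500
avg_max = sum(max(random.gauss(0.0, 1.0) for _ in range(N))
              for _ in range(trials)) / trials
bound = math.sqrt(2 * math.log(N))   # about 3.72
```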