Slepian's lemma and dual-Sudakov bounds

Sparsification and entropy numbers

Suppose that \(F(x) = f_1(x) + \cdots + f_m(x)\), let \(\rho \in \R_+^m\) be a probability vector, and for \(\nu \in [m]^M\) define our potential sparsifier as

\[\tilde{F}_{\nu}(x) \seteq \sum_{j=1}^M \frac{f_{\nu_j}(x)}{M \rho_{\nu_j}}\,.\]

Define the related distance

\[d_{\nu}(x,y) \seteq \left(\sum_{j=1}^M \left(\frac{f_{\nu_j}(x)-f_{\nu_j}(y)}{M \rho_{\nu_j}}\right)^2 \right)^{1/2}.\]

We have seen that if we can bound

\[\begin{equation}\label{eq:delta-bound} \E_{\e} \max_{F(x) \leq 1} \sum_{j=1}^M \e_j \frac{f_{\nu_j}(x)}{M \rho_{\nu_j}} \leq \delta \left(\max_{F(x) \leq 1} \tilde{F}_{\nu}(x)\right)^{1/2}\quad \forall \nu \in [m]^M\,, \end{equation}\]

where \(\e_1,\ldots,\e_M \in \{-1,1\}\) are uniformly random signs, then

\[\begin{equation}\label{eq:Fapprox} \E_{\nu} \max_{F(x) \leq 1} \left|F(x)-\tilde{F}_{\nu}(x)\right| \lesssim \delta\,, \end{equation}\]

where \(\nu_1,\ldots,\nu_M\) are indices sampled i.i.d. from \(\rho\).

Finally, we have seen, using Dudley’s entropy bound, that if \(B_F \seteq \{ x \in \R^n : F(x) \leq 1 \}\), then

\[\begin{equation}\label{eq:dudley} \E_{\e} \max_{F(x) \leq 1} \sum_{j=1}^M \e_j \frac{f_{\nu_j}(x)}{M \rho_{\nu_j}} \lesssim \sum_{h \geq 0} 2^{h/2} e_h(B_F, d_{\nu})\,. \end{equation}\]

\(\ell_2\) regression

Recall the setting for \(\ell_2\) regression: \(a_1,\ldots,a_m \in \R^n\) and \(f_i(x) \seteq \abs{\langle a_i,x\rangle}^2\).
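To make the objects above concrete in this setting, here is a minimal Python sketch that evaluates \(F\), the sampled sparsifier \(\tilde{F}_{\nu}\), and the distance \(d_{\nu}\) for \(f_i(x) = \abs{\langle a_i,x\rangle}^2\). The function names (`f`, `F`, `F_tilde`, `d_nu`, `sample_nu`) and the use of plain Python lists are illustrative assumptions, not notation from these notes, and the sketch says nothing about how \(\rho\) should be chosen.

```python
import random

def f(a, x):
    """One term f_i(x) = <a_i, x>^2 for the row a = a_i."""
    return sum(ai * xi for ai, xi in zip(a, x)) ** 2

def F(A, x):
    """F(x) = f_1(x) + ... + f_m(x), where A is the list of rows a_1..a_m."""
    return sum(f(a, x) for a in A)

def F_tilde(A, rho, nu, x):
    """Sampled sparsifier: sum_j f_{nu_j}(x) / (M * rho_{nu_j})."""
    M = len(nu)
    return sum(f(A[j], x) / (M * rho[j]) for j in nu)

def d_nu(A, rho, nu, x, y):
    """Distance d_nu(x, y) induced by the sampled, reweighted terms."""
    M = len(nu)
    total = sum(((f(A[j], x) - f(A[j], y)) / (M * rho[j])) ** 2 for j in nu)
    return total ** 0.5

def sample_nu(rho, M, rng):
    """Draw indices nu_1..nu_M i.i.d. from the probability vector rho (0-based)."""
    return rng.choices(range(len(rho)), weights=rho, k=M)

# Hypothetical toy instance: three rows in R^2 and a uniform rho.
A = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
rho = [1 / 3, 1 / 3, 1 / 3]
nu = sample_nu(rho, M=50, rng=random.Random(0))
x = [0.3, -0.1]
gap = abs(F(A, x) - F_tilde(A, rho, nu, x))  # one sample of |F(x) - F_tilde(x)|
```

Since each term is divided by \(M \rho_{\nu_j}\), the sparsifier is an unbiased estimate of \(F(x)\) at every fixed \(x\); the difficulty addressed by the bounds above is controlling the error uniformly over \(F(x) \leq 1\).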

The covering lemma

Before proving the lemma, it helps to consider the more basic problem of covering the Euclidean ball \(B_2^n\) by translates of \(\e B_{\infty}^n\), i.e., by translates of small cubes.

Suppose \(\|x\|_2^2 = x_1^2 + x_2^2 + \cdots + x_n^2 = 1\). Since we only care about approximation up to \(\e\) in the \(\ell_{\infty}\) distance, we could discretize this vector to lie in, say, \(\e \mathbb{Z}^n \cap B_2^n\).

The most basic kind of vector we need to cover is of the form \((0, \pm \e, 0, 0, \pm \e, 0, \pm \e, 0, \ldots, 0)\). Because \(\|x\|_2^2 = 1\), such a vector has at most \(1/\e^2\) nonzero entries, so there are only \(n^{O(1/\e^2)}\) choices for it. But we also need to handle vectors of the form \((0, \pm 2\e, 0, 0, \pm \e, 0, \pm 2\e, 0, \ldots, 0)\), and so on.

It is not hard to convince oneself that there are asymptotically fewer vectors of this form. Indeed, if some entry is \(2\e\) then there are \(n\) choices for where it goes, but there are \(n(n-1)/2\) choices for where two copies of \(\e\) go. Thus the total number of centers one needs is only \(n^{O(1/\e^2)}\).

In other words,

\[\left(\log \mathcal{N}(B_2^n, \|\cdot\|_{\infty}, \e)\right)^{1/2} \lesssim \frac{1}{\e} \sqrt{\log n}\,.\]
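The counting heuristic behind this bound can be checked numerically for small parameters. The sketch below is an illustration under stated assumptions: it rescales by \(\e\), so that points of \(\e \mathbb{Z}^n \cap B_2^n\) become integer vectors \(z\) with \(z_1^2 + \cdots + z_n^2 \leq k\), where \(k\) stands for \(\lfloor 1/\e^2 \rfloor\). The names `count_grid_points` and `count_basic` are hypothetical.

```python
import math

def count_grid_points(n, k):
    """Number of z in Z^n with z_1^2 + ... + z_n^2 <= k (dynamic program)."""
    ways = [1] + [0] * k              # ways[s] = number of prefixes with squared norm s
    for _ in range(n):                # add one coordinate at a time
        new = [0] * (k + 1)
        for s, c in enumerate(ways):
            if c == 0:
                continue
            v = 0
            while s + v * v <= k:
                new[s + v * v] += c if v == 0 else 2 * c   # +v and -v when v > 0
                v += 1
        ways = new
    return sum(ways)

def count_basic(n, k):
    """Vectors with entries in {0, +1, -1} and at most k nonzero entries."""
    return sum(math.comb(n, s) * 2 ** s for s in range(min(n, k) + 1))

# Compare the "basic" vectors with all grid points for a few small n.
k = 4                                  # corresponds to eps = 1/2
for n in (4, 8, 16):
    basic, total = count_basic(n, k), count_grid_points(n, k)
    # Every z with squared norm <= k also has l1 norm <= k, so it is a sum of
    # k vectors from {0, +-e_1, ..., +-e_n}; hence total <= (2n + 1)^k.
    assert basic <= total <= (2 * n + 1) ** k
```

The assertion inside the loop records one way to see the \(n^{O(1/\e^2)}\) count: an integer vector with squared norm at most \(k\) has \(\ell_1\) norm at most \(k\), so there are at most \((2n+1)^k\) of them.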

Now suppose we wanted to cover \(B_2^n\) instead with cubes of different side lengths, or with parallelepipeds (where the sides are no longer perpendicular), etc. There is a beautiful approach that gives surprisingly good bounds for covering \(B_2^n\) by translates of an arbitrary symmetric convex body. (I have heard it credited to Talagrand, or to Pajor and Talagrand.)