
CSE 525: Randomized Algorithms Spring 2026 Lecture 19: Chaining for Norms Lecturer: Shayan Oveis Gharan 06/04/25

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

The content of these notes is based on https://homes.cs.washington.edu/~jrl/cse599wi23/notes/lec4.html.

Entropy-number convention.

For a metric space $(S,\rho)$ , write $e_{h}(S,\rho)$ for the smallest radius $r$ such that $S$ is coverable by at most $2^{2^{h}}$ balls of radius $r$ in metric $\rho$ .
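To make the convention concrete, here is a minimal sketch that upper-bounds $e_{h}$ for a finite point set by farthest-point greedy selection. The function name `greedy_cover_radius` and the toy data are illustrative assumptions, not part of the notes; the centers are restricted to points of the set, so the value is only an upper bound on $e_{h}$ (the standard guarantee for this greedy rule is a factor of 2 from the optimum).

```python
def greedy_cover_radius(points, dist, h):
    """Upper bound on e_h(points, dist) via farthest-point greedy selection.

    Centers are chosen among the points themselves (an assumption of this
    sketch), so the returned radius may exceed the true entropy number.
    """
    k = 2 ** (2 ** h)                      # number of balls allowed at level h
    centers = [points[0]]                  # arbitrary first center
    # nearest[i] = distance from points[i] to its closest chosen center
    nearest = [dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        far = max(range(len(points)), key=lambda i: nearest[i])
        if nearest[far] == 0.0:            # every point is already a center
            break
        centers.append(points[far])
        nearest = [min(d, dist(p, points[far])) for d, p in zip(nearest, points)]
    return max(nearest)

# Toy example: the points 0, 1, ..., 7 on the real line with the usual distance.
pts = [(float(i),) for i in range(8)]
rho = lambda x, y: abs(x[0] - y[0])
radii = [greedy_cover_radius(pts, rho, h) for h in range(3)]  # budgets 2, 4, 16
```

The radii shrink as $h$ grows because the budget $2^{2^{h}}$ grows doubly exponentially; once it exceeds the number of points the radius is zero.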

19.1 Norms and the main estimate

Definition 19.1 (Norms and seminorms).

A map $N:\mathbb{R}^{n}\to\mathbb{R}_{+}$ is a norm when, for all $x,y\in\mathbb{R}^{n}$ and $\lambda\in\mathbb{R}$ ,

1. $N(\lambda x)=\left\lvert\lambda\right\rvert N(x)$,
2. $N(x+y)\leq N(x)+N(y)$,
3. $N(x)=0$ if and only if $x=0$.

When only the first two properties hold, $N$ is a seminorm. The arguments below never use the third property, so the word “norm” is used in this broader sense throughout.

Let $N_{1},\dots,N_{m}$ be norms on $\mathbb{R}^{n}$ , and let

T\subseteq B_{2}^{n},\qquad B_{2}^{n}\coloneqq\{x\in\mathbb{R}^{n}:\left\lVert x\right\rVert_{2}\leq 1\}.

For a standard Gaussian $g\sim N(0,I_{n})$ , define

\kappa\coloneqq\mathbb{E}\max_{k=1,\dots,m}N_{k}(g).
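As a rough illustration of the quantity $\kappa$, the sketch below estimates it by Monte Carlo for the coordinate seminorms $N_{k}(x)=\lvert x_{k}\rvert$ with $m=n$. The helper `estimate_kappa`, the choice of seminorms, the sample size, and the seed are all assumptions made for this example. For this family $\kappa=\mathbb{E}\max_{k}\lvert g_{k}\rvert$, which satisfies the standard bound $\kappa\leq\sqrt{2\ln(2n)}$.

```python
import numpy as np

def estimate_kappa(norms, n, samples=20000, seed=0):
    """Monte Carlo estimate of kappa = E max_k N_k(g) for g ~ N(0, I_n).

    `norms` is a list of vectorized functions: each maps an array of shape
    (samples, n) to an array of shape (samples,).
    """
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((samples, n))
    values = np.stack([N(G) for N in norms], axis=1)   # shape (samples, m)
    return float(values.max(axis=1).mean())

n = 50
# Coordinate seminorms N_k(x) = |x_k| (hypothetical example family).
coordinate_norms = [lambda X, k=k: np.abs(X[:, k]) for k in range(n)]
kappa_hat = estimate_kappa(coordinate_norms, n)
upper = float(np.sqrt(2.0 * np.log(2 * n)))            # sqrt(2 ln(2n))
lower = float(np.sqrt(2.0 / np.pi))                    # E|g_1|, a trivial lower bound
```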

Theorem 19.2.

If $\epsilon_{1},\dots,\epsilon_{m}$ are independent random signs, then

\mathbb{E}\max_{x\in T}\sum_{k=1}^{m}\epsilon_{k}N_{k}(x)^{2}\lesssim\kappa\log(n)\max_{x\in T}\sqrt{\sum_{k=1}^{m}N_{k}(x)^{2}}.

(19.1)

19.1.1 Example: sums of random matrices

Let

A=\sum_{k=1}^{m}\epsilon_{k}A_{k}^{T}A_{k},

with each $A_{k}^{T}A_{k}$ positive semidefinite. Then

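\begin{aligned}
\left\lVert A\right\rVert_{\mathrm{op}} &= \max_{\left\lVert x\right\rVert_{2}\leq 1}\left\lvert\langle x,Ax\rangle\right\rvert \\
&= \max_{\left\lVert x\right\rVert_{2}\leq 1}\left\lvert\sum_{k=1}^{m}\epsilon_{k}\langle x,A_{k}^{T}A_{k}x\rangle\right\rvert \\
&= \max_{\left\lVert x\right\rVert_{2}\leq 1}\left\lvert\sum_{k=1}^{m}\epsilon_{k}\left\lVert A_{k}x\right\rVert_{2}^{2}\right\rvert.
\end{aligned}

The absolute value is needed because $A$ is symmetric but not necessarily positive semidefinite. Up to the absolute value, this is the preceding setting with $T=B_{2}^{n}$ and $N_{k}(x)=\left\lVert A_{k}x\right\rVert_{2}$. Since $0\in T$ and $(\epsilon_{k})$ has the same distribution as $(-\epsilon_{k})$, the absolute value costs at most a factor of two: $\mathbb{E}\left\lVert A\right\rVert_{\mathrm{op}}\leq 2\,\mathbb{E}\max_{x\in T}\sum_{k=1}^{m}\epsilon_{k}N_{k}(x)^{2}$.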

Translator note.

The source text appears to phrase the final identification in squared form. The normalization above is the one for which $\sum_{k}\epsilon_{k}N_{k}(x)^{2}=\sum_{k}\epsilon_{k}\left\lVert A_{k}x\right\rVert_{2}^{2}$ .
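The identities in this example are easy to sanity-check numerically. The following is a small sketch with randomly generated matrices $A_{k}$; the dimensions, the seed, and all variable names are arbitrary choices for illustration. It checks that $\langle x,Ax\rangle=\sum_{k}\epsilon_{k}\lVert A_{k}x\rVert_{2}^{2}$ and that, for the symmetric matrix $A$, the operator norm equals the largest eigenvalue in absolute value.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 5, 7, 3                        # ambient dimension, number of terms, rows of A_k
A_list = [rng.standard_normal((r, n)) for _ in range(m)]
eps = rng.choice([-1.0, 1.0], size=m)    # independent random signs

# A = sum_k eps_k A_k^T A_k, a symmetric n x n matrix
A = sum(e * (Ak.T @ Ak) for e, Ak in zip(eps, A_list))

# A random unit vector x
x = rng.standard_normal(n)
x /= np.linalg.norm(x)

quad = float(x @ A @ x)                                       # <x, A x>
signed_sum = float(sum(e * np.linalg.norm(Ak @ x) ** 2        # sum_k eps_k ||A_k x||^2
                       for e, Ak in zip(eps, A_list)))

eigs = np.linalg.eigvalsh(A)             # real eigenvalues of the symmetric matrix A
op_norm = float(np.linalg.norm(A, 2))    # largest singular value
```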

19.2 Dudley’s inequality and metric reduction

The process

\left\{\sum_{k=1}^{m}\epsilon_{k}N_{k}(x)^{2}:x\in T\right\}

is subgaussian with respect to

d(x,y)\coloneqq\left(\sum_{k=1}^{m}\bigl(N_{k}(x)^{2}-N_{k}(y)^{2}\bigr)^{2}\right)^{1/2}.

Dudley’s entropy inequality therefore gives

\mathbb{E}\max_{x\in T}\sum_{k=1}^{m}\epsilon_{k}N_{k}(x)^{2}\lesssim\sum_{h\geq 0}2^{h/2}e_{h}(T,d).

(19.2)

Both sides of (19.1) are homogeneous of degree two in the family $(N_{k})_{k=1}^{m}$ . Thus one may rescale and assume

\max_{x\in T}\sqrt{\sum_{k=1}^{m}N_{k}(x)^{2}}=1.

(19.3)

Define

\left\lVert x\right\rVert_{N}\coloneqq\max_{k=1,\dots,m}N_{k}(x).

For $x,y\in T$ , use $a^{2}-b^{2}=(a-b)(a+b)$ to obtain

\begin{aligned}
d(x,y) &= \left(\sum_{k=1}^{m}\bigl(N_{k}(x)-N_{k}(y)\bigr)^{2}\bigl(N_{k}(x)+N_{k}(y)\bigr)^{2}\right)^{1/2} \\
&\leq \left(\sum_{k=1}^{m}N_{k}(x-y)^{2}\bigl(N_{k}(x)+N_{k}(y)\bigr)^{2}\right)^{1/2} \\
&\leq \left(\sum_{k=1}^{m}\Bigl(\max_{j}N_{j}(x-y)^{2}\Bigr)\bigl(N_{k}(x)+N_{k}(y)\bigr)^{2}\right)^{1/2} \\
&= \left\lVert x-y\right\rVert_{N}\left(\sum_{k=1}^{m}\bigl(N_{k}(x)+N_{k}(y)\bigr)^{2}\right)^{1/2} \\
&\leq \left\lVert x-y\right\rVert_{N}\left(\sum_{k=1}^{m}\bigl(2N_{k}(x)^{2}+2N_{k}(y)^{2}\bigr)\right)^{1/2} \\
&\leq 2\left\lVert x-y\right\rVert_{N}.
\end{aligned}

The first inequality uses $\left\lvert N_{k}(x)-N_{k}(y)\right\rvert\leq N_{k}(x-y)$, the penultimate one uses $(a+b)^{2}\leq 2a^{2}+2b^{2}$, and the last one follows from (19.3), which gives $\sum_{k}N_{k}(x)^{2}\leq 1$ and $\sum_{k}N_{k}(y)^{2}\leq 1$ for $x,y\in T$. Consequently, $e_{h}(T,d)\leq 2e_{h}(T,\left\lVert\cdot\right\rVert_{N})$, and (19.2) implies

\mathbb{E}\max_{x\in T}\sum_{k=1}^{m}\epsilon_{k}N_{k}(x)^{2}\lesssim\sum_{h\geq 0}2^{h/2}e_{h}(T,\left\lVert\cdot\right\rVert_{N}).

(19.4)
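The comparison $d(x,y)\leq 2\lVert x-y\rVert_{N}$ can be checked numerically on a small instance. The sketch below assumes $N_{k}(x)=\lVert A_{k}x\rVert_{2}$ with random matrices $A_{k}$ and $T=B_{2}^{n}$, rescaled so that (19.3) holds; these choices, the seed, and the helper names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 4, 6, 3
A_list = [rng.standard_normal((r, n)) for _ in range(m)]

# Rescale so that max over the unit ball of sqrt(sum_k N_k(x)^2) equals 1, as in (19.3).
# Here sum_k N_k(x)^2 = x^T (sum_k A_k^T A_k) x, whose maximum is the top eigenvalue.
gram = sum(Ak.T @ Ak for Ak in A_list)
scale = np.sqrt(np.linalg.eigvalsh(gram).max())
A_list = [Ak / scale for Ak in A_list]

def N_vec(x):
    """Vector (N_1(x), ..., N_m(x)) with N_k(x) = ||A_k x||_2."""
    return np.array([np.linalg.norm(Ak @ x) for Ak in A_list])

def d(x, y):
    """The subgaussian metric d(x, y) from the notes."""
    return float(np.linalg.norm(N_vec(x) ** 2 - N_vec(y) ** 2))

def norm_N(x):
    """||x||_N = max_k N_k(x)."""
    return float(N_vec(x).max())

def random_point_in_ball():
    """A random point of B_2^n (random direction, random radius in [0, 1))."""
    g = rng.standard_normal(n)
    return rng.uniform() * g / np.linalg.norm(g)

pairs = [(random_point_in_ball(), random_point_in_ball()) for _ in range(1000)]
# The derivation predicts d(x, y) / (2 ||x - y||_N) <= 1 for every pair.
worst_ratio = max(d(x, y) / (2.0 * norm_N(x - y)) for x, y in pairs)
```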

We now split the right-hand side into the ranges $h\leq 4\log n$ and $h>4\log n$ .

19.3 The large-entropy tail

Let

B_{N}\coloneqq\{x\in\mathbb{R}^{n}:\left\lVert x\right\rVert_{N}\leq 1\}.

For $x\in T$ , the normalization (19.3) gives

\left\lVert x\right\rVert_{N}\leq\sqrt{\sum_{k=1}^{m}N_{k}(x)^{2}}\leq 1,

so $T\subseteq B_{N}$ , hence $e_{h}(T,\left\lVert\cdot\right\rVert_{N})\leq e_{h}(B_{N},\left\lVert\cdot\right\rVert_{N})$ .

Claim 19.3.

For any norm $\left\lVert\cdot\right\rVert_{N}$ on $\mathbb{R}^{n}$ with unit ball $B_{N}$, and any $h\geq 1$,

e_{h}(B_{N},\left\lVert\cdot\right\rVert_{N})\leq 4\cdot 2^{-2^{h}/n}.

Proof.

Fix $\delta\in(0,1)$ , and choose a maximal collection $x_{1},\dots,x_{s}\in B_{N}$ with pairwise distances at least $2\delta$ in $\left\lVert\cdot\right\rVert_{N}$ . Maximality gives the cover

B_{N}\subseteq\bigcup_{j=1}^{s}(x_{j}+2\delta B_{N}).

The sets $x_{j}+\delta B_{N}$ have pairwise disjoint interiors and, because $\delta<1$, are contained in $2B_{N}$, so

\operatorname{vol}_{n}(2B_{N})\geq s\,\operatorname{vol}_{n}(\delta B_{N})=s\,(\delta/2)^{n}\operatorname{vol}_{n}(2B_{N}).

Therefore $s\leq(2/\delta)^{n}$. If $2^{h}>n$, then $\delta=2\cdot 2^{-2^{h}/n}$ lies in $(0,1)$; this choice yields $s\leq 2^{2^{h}}$ and gives a cover of $B_{N}$ by at most $2^{2^{h}}$ balls of radius $2\delta=4\cdot 2^{-2^{h}/n}$. If $2^{h}\leq n$, the claimed bound is at least $2$ and holds trivially, since $B_{N}$ is itself a single ball of radius $1$. ∎
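One case where the entropy numbers can be written down exactly is the sup norm, whose unit ball is the cube $[-1,1]^{n}$. The sketch below relies on the standard fact (an assumption of this example, not proved in the notes) that covering the cube by sup-norm balls of radius $r$ takes exactly $\lceil 1/r\rceil^{n}$ balls, so $e_{h}=1/k$ where $k$ is the largest integer with $k^{n}\leq 2^{2^{h}}$. It then compares this value with the bound of Claim 19.3. The function names are hypothetical.

```python
def cube_entropy_number(n, h):
    """e_h of the cube [-1, 1]^n in the sup norm, assuming N(r) = ceil(1/r)^n."""
    budget = 2 ** (2 ** h)               # number of balls allowed
    # Largest integer k with k**n <= budget, by binary search on exact integers.
    lo, hi = 1, budget
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** n <= budget:
            lo = mid
        else:
            hi = mid - 1
    return 1.0 / lo

def volume_bound(n, h):
    """Right-hand side of Claim 19.3: 4 * 2^(-2^h / n)."""
    return 4.0 * 2.0 ** (-(2 ** h) / n)

# Compare the exact value with the volumetric bound on a small grid of (n, h).
table = {(n, h): (cube_entropy_number(n, h), volume_bound(n, h))
         for n in range(1, 7) for h in range(1, 6)}
```

On this grid the exact value stays below the volumetric bound, and both decay doubly exponentially in $h$ once $2^{h}$ exceeds $n$.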

Using Claim 19.3, the large-$h$ part of (19.4) obeys

\sum_{h>4\log n}2^{h/2}e_{h}(T,\left\lVert\cdot\right\rVert_{N})\leq 4\sum_{h>4\log n}2^{h/2}2^{-2^{h}/n}\leq O(1).

Thus

\mathbb{E}\max_{x\in T}\sum_{k=1}^{m}\epsilon_{k}N_{k}(x)^{2}\lesssim O(1)+\sum_{0\leq h\leq 4\log n}2^{h/2}e_{h}(T,\left\lVert\cdot\right\rVert_{N}).
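To see how small the tail $\sum_{h>4\log n}2^{h/2}2^{-2^{h}/n}$ is, one can simply evaluate it. The sketch below assumes the logarithm is base 2 (the notes do not specify the base) and sums terms until they underflow double precision; `tail_sum` is a hypothetical helper name.

```python
import math

def tail_sum(n, max_terms=200):
    """Evaluate sum over integers h > 4*log2(n) of 2^(h/2) * 2^(-2^h / n)."""
    h = math.floor(4 * math.log2(n)) + 1     # first integer h with h > 4 log2(n)
    total = 0.0
    for _ in range(max_terms):
        exponent = h / 2 - (2.0 ** h) / n    # log2 of the h-th term
        if exponent < -1074:                 # below the smallest positive double
            break                            # later terms are even smaller
        total += 2.0 ** exponent
        h += 1
    return total

tails = {n: tail_sum(n) for n in (1, 2, 10, 1000)}
```

In this range the terms are decreasing in $h$, which is why stopping at the first underflow is safe. For $n\geq 10$ the very first term already underflows, so the computed tail is zero in floating point.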

19.4 The relevant entropy range and dual Sudakov

Since $T\subseteq B_{2}^{n}$ ,

e_{h}(T,\left\lVert\cdot\right\rVert_{N})\leq e_{h}(B_{2}^{n},\left\lVert\cdot\right\rVert_{N}).

The required ingredient is the following dual Sudakov bound.

Lemma 19.4 (Dual Sudakov).

For any norm $\left\lVert\cdot\right\rVert$ on $\mathbb{R}^{n}$ and every $h\geq 0$ ,

e_{h}(B_{2}^{n},\left\lVert\cdot\right\rVert)\lesssim 2^{-h/2}\mathbb{E}\left\lVert g\right\rVert,

where $g\sim N(0,I_{n})$ .

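Apply Lemma 19.4 with $\left\lVert\cdot\right\rVert=\left\lVert\cdot\right\rVert_{N}$. Since

\mathbb{E}\left\lVert g\right\rVert_{N}=\mathbb{E}\max_{k=1,\dots,m}N_{k}(g)=\kappa,

this gives $e_{h}(B_{2}^{n},\left\lVert\cdot\right\rVert_{N})\lesssim 2^{-h/2}\kappa$ for every $h\geq 0$. Each of the $O(\log n)$ terms with $0\leq h\leq 4\log n$ therefore contributes $O(\kappa)$, and

\sum_{0\leq h\leq 4\log n}2^{h/2}e_{h}(T,\left\lVert\cdot\right\rVert_{N})\lesssim\kappa\log n.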

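Under the normalization (19.3), this gives (19.1) up to the additive $O(1)$ from the tail. That term can be replaced by $O(\kappa)$: conditioning on $\langle g,x\rangle$ and applying Jensen’s inequality shows $\left\lVert x\right\rVert_{N}\leq\sqrt{\pi/2}\,\kappa\left\lVert x\right\rVert_{2}$ for every $x$, so $T\subseteq B_{2}^{n}\subseteq\sqrt{\pi/2}\,\kappa B_{N}$, and Claim 19.3 applied to this scaled ball bounds the tail by $O(\kappa)$. Undoing the rescaling gives the stated form.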

19.5 Gaussian shift lemma

Lemma 19.5 (Gaussian shift).

Let $K\subseteq\mathbb{R}^{n}$ be symmetric and convex, and let $\gamma_{n}$ denote standard Gaussian measure on $\mathbb{R}^{n}$ . For every $x\in\mathbb{R}^{n}$ ,

\gamma_{n}(K+x)\geq\exp\left(-\frac{\left\lVert x\right\rVert_{2}^{2}}{2}\right)\gamma_{n}(K).

Proof.

Using symmetry of $K$ and writing $\sigma$ for a uniform random sign,

\begin{aligned}
\gamma_{n}(K+x) &= (2\pi)^{-n/2}\int_{K}\exp\left(-\frac{\left\lVert x+z\right\rVert_{2}^{2}}{2}\right)\,dz \\
&= (2\pi)^{-n/2}\int_{K}\mathbb{E}_{\sigma\in\{-1,1\}}\exp\left(-\frac{\left\lVert\sigma x+z\right\rVert_{2}^{2}}{2}\right)\,dz.
\end{aligned}

Since

\mathbb{E}_{\sigma}\left\lVert\sigma x+z\right\rVert_{2}^{2}=\left\lVert x\right\rVert_{2}^{2}+\left\lVert z\right\rVert_{2}^{2},

Jensen’s inequality yields

\begin{aligned}
\gamma_{n}(K+x) &\geq (2\pi)^{-n/2}\int_{K}\exp\left(-\frac{\mathbb{E}_{\sigma}\left\lVert\sigma x+z\right\rVert_{2}^{2}}{2}\right)\,dz \\
&= (2\pi)^{-n/2}\int_{K}\exp\left(-\frac{\left\lVert x\right\rVert_{2}^{2}+\left\lVert z\right\rVert_{2}^{2}}{2}\right)\,dz \\
&= \exp\left(-\frac{\left\lVert x\right\rVert_{2}^{2}}{2}\right)\gamma_{n}(K).
\end{aligned}

∎
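In dimension one the lemma can be verified directly from the Gaussian distribution function. The sketch below takes $K=[-a,a]$, for which $\gamma_{1}(K+x)=\Phi(x+a)-\Phi(x-a)$, and compares it with $e^{-x^{2}/2}\gamma_{1}(K)$ on a small grid. The grid of values and the helper names are arbitrary choices for illustration.

```python
import math

def Phi(t):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def shifted_mass(a, x):
    """gamma_1([-a, a] + x) = P(x - a <= g <= x + a) for g ~ N(0, 1)."""
    return Phi(x + a) - Phi(x - a)

def shift_lower_bound(a, x):
    """Right-hand side of the Gaussian shift lemma: exp(-x^2 / 2) * gamma_1([-a, a])."""
    return math.exp(-x * x / 2.0) * shifted_mass(a, 0.0)

half_widths = [0.1, 0.5, 1.0, 2.0, 5.0]
shifts = [i / 4.0 for i in range(-12, 13)]           # x in [-3, 3], step 0.25
# Slack of the inequality at each grid point; the lemma predicts it is nonnegative.
slack = {(a, x): shifted_mass(a, x) - shift_lower_bound(a, x)
         for a in half_widths for x in shifts}
```

At $x=0$ the two sides coincide, and the slack grows with $\lvert x\rvert$ for wide intervals, which reflects how much Jensen’s inequality gives away.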

Translator note.

The displayed conclusion above matches the Gaussian-shift bound used in the proof of dual Sudakov; constants are immaterial for the subsequent $\lesssim$ estimate.

19.6 Proof of the dual Sudakov lemma

Let

\mathcal{B}\coloneqq\{x\in\mathbb{R}^{n}:\left\lVert x\right\rVert\leq 1\}

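be the unit ball of the norm $\left\lVert\cdot\right\rVert$. Fix $\delta>0$ and choose $x_{1},\dots,x_{s}\in B_{2}^{n}$ so that the translates $x_{j}+\delta\mathcal{B}$ are pairwise disjoint and $s$ is as large as possible. By maximality, for every $x\in B_{2}^{n}$ the set $x+\delta\mathcal{B}$ meets some $x_{j}+\delta\mathcal{B}$, and since $\mathcal{B}$ is symmetric this gives $\left\lVert x-x_{j}\right\rVert\leq 2\delta$. Then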

B_{2}^{n}\subseteq\bigcup_{j=1}^{s}(x_{j}+2\delta\mathcal{B}),

(19.5)

so $B_{2}^{n}$ is covered by $s$ balls of radius $2\delta$ in the norm $\left\lVert\cdot\right\rVert$ .

For any $\lambda>0$ , the scaled sets $\lambda(x_{j}+\delta\mathcal{B})$ are pairwise disjoint. Therefore

\begin{aligned}
1 &\geq \gamma_{n}\left(\bigcup_{j=1}^{s}\lambda(x_{j}+\delta\mathcal{B})\right) \\
&= \sum_{j=1}^{s}\gamma_{n}\bigl(\lambda x_{j}+\lambda\delta\mathcal{B}\bigr) \\
&\geq \sum_{j=1}^{s}\exp\left(-\frac{\lambda^{2}\left\lVert x_{j}\right\rVert_{2}^{2}}{2}\right)\gamma_{n}(\lambda\delta\mathcal{B}) \\
&\geq s\exp\left(-\frac{\lambda^{2}}{2}\right)\gamma_{n}(\lambda\delta\mathcal{B}),
\end{aligned}

where Lemma 19.5 (with $K=\lambda\delta\mathcal{B}$, which is symmetric and convex) is used in the third line and $x_{j}\in B_{2}^{n}$ in the final line.

Choose

\lambda\coloneqq\frac{2}{\delta}\mathbb{E}\left\lVert g\right\rVert.

Then, by Markov’s inequality,

\gamma_{n}(\lambda\delta\mathcal{B})=\mathbb{P}\bigl(\left\lVert g\right\rVert\leq\lambda\delta\bigr)=\mathbb{P}\bigl(\left\lVert g\right\rVert\leq 2\mathbb{E}\left\lVert g\right\rVert\bigr)\geq\frac{1}{2}.

Combining the previous inequalities gives

1\geq\frac{s}{2}\exp\left(-\frac{1}{2}\left(\frac{2\mathbb{E}\left\lVert g\right\rVert}{\delta}\right)^{2}\right).

Equivalently, up to universal constants,

\delta\lesssim\frac{\mathbb{E}\left\lVert g\right\rVert}{\sqrt{\log(s/2)}}.

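Now fix $h\geq 1$ and take $\delta=C\,2^{-h/2}\mathbb{E}\left\lVert g\right\rVert$ for a sufficiently large universal constant $C$. The inequality above then forces $s\leq 2^{2^{h}}$, so the cover in (19.5) uses at most $2^{2^{h}}$ balls of radius $2\delta\lesssim 2^{-h/2}\mathbb{E}\left\lVert g\right\rVert$. (For $h=0$ a single ball centered at the origin suffices, since one can check that $\left\lVert x\right\rVert\lesssim\mathbb{E}\left\lVert g\right\rVert$ for every $x\in B_{2}^{n}$.) Hence

e_{h}(B_{2}^{n},\left\lVert\cdot\right\rVert)\lesssim 2^{-h/2}\mathbb{E}\left\lVert g\right\rVert,

which proves Lemma 19.4.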