CSE 525: Randomized Algorithms Spring 2026 Lecture 18: Hypergraph Sparsification Lecturer: Shayan Oveis Gharan 06/02/26

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

18.1 Max Eigenvalue of Random Matrices

We will now use generic chaining to show that the largest eigenvalue of a random symmetric $n\times n$ matrix with Rademacher entries is $O(\sqrt{n})$ . This is certainly not the simplest way of proving such a result, but it will give a sense of how these techniques can be applied.

Our “sub-Gaussian” random process is to pick Rademacher random variables $r_{i,j}$ , for each $1\leq i<j\leq n$ , define the matrix $M_{i,j}=M_{j,i}=r_{i,j}$ and let

F(x)=x^{T}Mx

for every unit vector $x\in\mathbb{R}^{n}$. (We assume the diagonal of $M$ is 0 for simplicity.) It thus follows that

x^{T}Mx=2\sum_{1\leq i<j\leq n}x_{i}x_{j}r_{i,j}=\langle r,z\rangle

So, we let $T$ be the set of vectors

T=\left\{z\in\mathbb{R}^{{n\choose 2}}:\ z_{\{i,j\}}=2x_{i}x_{j}\ \ \forall\, i<j,\ \|x\|=1\right\}.

Let $g$ be a vector of independent Rademacher random variables. By Hoeffding’s inequality, for any vector $v$ we can write

\mathbb{P}\left[\langle g,v\rangle\geq\ell\right]\leq\exp\left(-\frac{\ell^{2}}{\sum_{i}4v_{i}^{2}}\right)=\exp\left(-\frac{\ell^{2}}{4\|v\|^{2}}\right),

so, this distribution is $1/4$ -subgaussian.
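As a quick sanity check of this tail bound, the following sketch compares a Monte Carlo estimate of $\mathbb{P}[\langle g,v\rangle\geq\ell]$ with the $1/4$-subgaussian bound $\exp(-\ell^{2}/(4\|v\|^{2}))$. The test vector `v`, the thresholds, and the helper name `rademacher_tail` are arbitrary choices made for illustration; they are not part of the lecture.

```python
import numpy as np

def rademacher_tail(v, ell, trials=20000, seed=0):
    """Monte Carlo estimate of P[<g, v> >= ell] for a Rademacher vector g."""
    rng = np.random.default_rng(seed)
    g = rng.choice([-1.0, 1.0], size=(trials, len(v)))  # i.i.d. random signs
    return float(np.mean(g @ v >= ell))

v = np.array([3.0, 1.0, 2.0, 0.5, 1.5])  # arbitrary test vector (assumption)
norm_sq = float(v @ v)
results = []
for ell in [1.0, 3.0, 5.0]:
    empirical = rademacher_tail(v, ell)
    bound = float(np.exp(-ell**2 / (4.0 * norm_sq)))  # 1/4-subgaussian tail
    results.append((ell, empirical, bound))
    print(f"ell={ell}: empirical tail {empirical:.4f}, bound {bound:.4f}")
```

The bound is far from tight for small $\ell$; the point is only that the empirical tail stays below it.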

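Writing $z^{x}\in T$ for the vector with entries $z^{x}_{\{i,j\}}=2x_{i}x_{j}$, it follows that for $x,y\in\mathbb{R}^{n}$ with $\|x\|=\|y\|=1$

\begin{aligned}
\frac{1}{4}\|z^{x}-z^{y}\|^{2}\leq\|xx^{T}-yy^{T}\|_{F}^{2}&=\|(x-y)x^{T}+y(x-y)^{T}\|_{F}^{2}\\
&\leq\|(x-y)x^{T}\|_{F}^{2}+\|y(x-y)^{T}\|_{F}^{2}\\
&=\|x-y\|^{2}\|x\|^{2}+\|x-y\|^{2}\|y\|^{2}\\
&\leq 2\|x-y\|^{2}.
\end{aligned}

The inequality in the second line holds because the cross term $2\langle(x-y)x^{T},y(x-y)^{T}\rangle=-2(1-\langle x,y\rangle)^{2}$ is nonpositive for unit vectors.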

Having this we can write

d(z^{x},z^{y})^{2}=\mathbb{E}\left[\langle g,z^{x}-z^{y}\rangle^{2}\right]=\|z^{x}-z^{y}\|^{2}\leq 8\|x-y\|^{2}.
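The bound $\|z^{x}-z^{y}\|^{2}\leq 8\|x-y\|^{2}$ can be checked numerically on random pairs of unit vectors. This is only an illustrative sketch: the dimension, the number of trials, and the helper names `z_vector` and `random_unit` are assumptions of the example.

```python
import numpy as np

def z_vector(x):
    """Map x to z^x, the vector with entries 2 * x_i * x_j for i < j."""
    i, j = np.triu_indices(len(x), k=1)
    return 2.0 * x[i] * x[j]

def random_unit(n, rng):
    """Uniformly random point on the unit sphere in R^n."""
    x = rng.standard_normal(n)
    return x / np.linalg.norm(x)

rng = np.random.default_rng(1)
worst_ratio = 0.0
for _ in range(1000):
    x, y = random_unit(6, rng), random_unit(6, rng)
    lhs = float(np.sum((z_vector(x) - z_vector(y)) ** 2))  # ||z^x - z^y||^2
    rhs = float(np.sum((x - y) ** 2))                      # ||x - y||^2
    worst_ratio = max(worst_ratio, lhs / rhs)
print("largest observed ratio:", worst_ratio)
```

The largest observed ratio should never exceed 8, which is what the chaining argument needs.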

Now we apply generic chaining to $F$. By the bound above, the image under $x\mapsto z^{x}$ of an $\epsilon$-net of the unit Euclidean sphere is a $\sqrt{8}\cdot\epsilon$-net for the metric space $(T,d)$. The unit Euclidean sphere has an $\epsilon$-net of size at most $(3/\epsilon)^{n}$. So let $T_{k}$ be an arbitrary subset of $T$ of cardinality $2^{2^{k}}$ if $2^{k}<n$, and otherwise the image of an $\epsilon$-net with $\epsilon=3\cdot 2^{-2^{k}/n}$, chosen so that $(3/\epsilon)^{n}=2^{2^{k}}$. Applying the generic chaining inequality,

\mathbb{E}\left[\sup_{z\in T}F(z)\right]\leq O(1)\cdot\sum_{k}2^{k/2}\cdot\sqrt{8}\cdot\min\left\{2,3\cdot 2^{-2^{k}/n}\right\}=O(\sqrt{n}).
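The $O(\sqrt{n})$ scaling is easy to observe empirically. The sketch below draws one symmetric Rademacher matrix with zero diagonal for a few values of $n$ and reports $\lambda_{\max}/\sqrt{n}$. The sizes and the seed are arbitrary assumptions, and a single sample per size is only an illustration, not an estimate of the expectation.

```python
import numpy as np

def max_eigenvalue_rademacher(n, rng):
    """Largest eigenvalue of a symmetric Rademacher matrix with zero diagonal."""
    upper = np.triu(rng.choice([-1.0, 1.0], size=(n, n)), k=1)  # entries r_ij, i < j
    M = upper + upper.T
    return float(np.linalg.eigvalsh(M)[-1])  # eigvalsh returns ascending order

rng = np.random.default_rng(0)
ratios = {}
for n in [50, 100, 200, 400]:
    ratios[n] = max_eigenvalue_rademacher(n, rng) / np.sqrt(n)
    print(f"n={n}: lambda_max / sqrt(n) = {ratios[n]:.3f}")
```

The ratio should stay bounded as $n$ grows, consistent with the $O(\sqrt{n})$ bound.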

Consider a weighted hypergraph $H=(V,E,w)$ where $\{w_{e}:e\in E\}$ are nonnegative edge weights. We associate with $H$ the quadratic expression

Q_{H}(x)=\sum_{e\in E}w_{e}\max_{u,v\in e}(x_{u}-x_{v})^{2}.

The main observation is that if $H$ were a graph, i.e., if $|e|=2$ for every edge $e$, this would be exactly the quadratic form of the graph Laplacian.
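To make the definition concrete, here is a minimal sketch that evaluates $Q_{H}(x)$ and checks that it coincides with $x^{T}L_{G}x$ when every hyperedge has size 2. The function names and the small example instances are hypothetical; vertices are assumed to be labeled $0,\dots,n-1$.

```python
import numpy as np

def hypergraph_energy(edges, weights, x):
    """Q_H(x) = sum_e w_e * max_{u,v in e} (x_u - x_v)^2."""
    total = 0.0
    for e, w in zip(edges, weights):
        vals = x[list(e)]
        # The largest squared difference inside e is (max - min)^2.
        total += w * (vals.max() - vals.min()) ** 2
    return float(total)

def graph_laplacian(n, edges, weights):
    """Weighted Laplacian of a graph given as a list of pairs."""
    L = np.zeros((n, n))
    for (u, v), w in zip(edges, weights):
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

x = np.array([0.3, -1.2, 2.0, 0.7])

# Graph case: Q_H agrees with the Laplacian quadratic form.
graph_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
graph_weights = [1.0, 2.0, 0.5, 3.0]
q_graph = hypergraph_energy(graph_edges, graph_weights, x)
q_laplacian = float(x @ graph_laplacian(4, graph_edges, graph_weights) @ x)

# Genuine hypergraph: only the extreme pair of each hyperedge contributes.
q_hyper = hypergraph_energy([(0, 1, 2), (1, 2, 3)], [1.0, 2.0], x)
print(q_graph, q_laplacian, q_hyper)
```

For hyperedges of size larger than 2 the map $x\mapsto Q_{H}(x)$ is no longer a quadratic form, which is why the matrix tools from graph sparsification do not apply directly.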

As our main application of chaining, we will explain algorithms to sparsify hypergraphs. That is, we want to construct another hypergraph $\tilde{H}=(V,\tilde{E},\tilde{w})$ with $\tilde{E}\subseteq E$ such that

|Q_{H}(x)-Q_{\tilde{H}}(x)|\leq\epsilon\cdot Q_{H}(x),\quad\quad\forall x\in\mathbb{R}^{n}

(18.1)

where as usual $\epsilon>0$ is the accuracy parameter of our sparsifier. Furthermore, similar to the graph case, we would like to make $|\tilde{E}|$ as small as possible, ideally near-linear in $|V|$ (while $|E|$ could be as large as $2^{|V|}$ in this case).

The following theorem is proved in a paper by James Lee.

Theorem 18.1.

For any $n$ -vertex weighted hypergraph $H=(V,E,w)$ and $\epsilon>0$ , there is a spectral $\epsilon$ -sparsifier $\tilde{H}=(V,\tilde{E},\tilde{w})$ for $H$ such that

|\tilde{E}|\leq O(\frac{\log D}{\epsilon^{2}}n\log n),

where $D:=\max_{e\in E}|e|$ .

18.2 Independent random sampling

For any edge $e$, let $Q_{e}$ be defined as

Q_{e}(x)=\max_{u,v\in e}(x_{u}-x_{v})^{2}.

Therefore, $Q_{H}(x)=\sum_{e}w_{e}Q_{e}(x)$ .

Suppose we have a probability distribution $\{p_{e}:e\in E\}$, i.e., $p_{e}\geq 0$ and $\sum_{e}p_{e}=1$. Similar to the graph case, we let $X$ be an unbiased estimator of $Q_{H}$: namely, we let $X=\frac{w_{e}}{p_{e}}Q_{e}$ with probability $p_{e}$. Then, it follows that

\mathbb{E}\left[X\right]=\sum_{e}p_{e}\cdot\frac{w_{e}}{p_{e}}Q_{e}=Q_{H}

As usual the difficulty would be in choosing the probabilities $p_{e}$ .

Then, we form $\tilde{H}$ by drawing $m$ independent samples of $X$ and taking the empirical mean of the samples. In particular, writing $e_{1},\dots,e_{m}$ for the sampled edges, we have

Q_{\tilde{H}}(x)=\frac{1}{m}\sum_{k=1}^{m}\frac{w_{e_{k}}}{p_{e_{k}}}Q_{e_{k}}(x).

Observe that

\mathbb{E}\left[Q_{\tilde{H}}(x)\right]=Q_{H}(x)

for all $x\in\mathbb{R}^{n}$. So, as before, the main question is how to choose the probabilities $p_{e}$.
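The sampling scheme can be sketched as follows. Each draw of an edge $e$ adds $w_{e}/(m\,p_{e})$ to the new weight of $e$, so the energy of the reweighted hypergraph is the empirical mean of $m$ copies of $X$. The instance, the function names, and in particular the choice $p_{e}\propto w_{e}$ are placeholders for illustration; they are not the probabilities chosen in the lecture.

```python
import numpy as np
from collections import Counter

def sample_sparsifier(weights, probs, m, rng):
    """Draw m edge indices i.i.d. from probs; return {edge index: new weight}."""
    draws = rng.choice(len(weights), size=m, p=probs)
    counts = Counter(int(k) for k in draws)
    # An edge drawn c times receives weight c * w_e / (m * p_e).
    return {e: c * weights[e] / (m * probs[e]) for e, c in counts.items()}

def energy(edges, weight_of, x):
    """Energy of the hypergraph whose edge e has weight weight_of[e]."""
    total = 0.0
    for e, w in weight_of.items():
        vals = x[list(edges[e])]
        total += w * (vals.max() - vals.min()) ** 2
    return float(total)

edges = [(0, 1, 2), (1, 3), (0, 2, 3, 4), (2, 4), (1, 2, 4)]
weights = np.array([1.0, 2.0, 0.5, 1.5, 1.0])
probs = weights / weights.sum()  # placeholder distribution (assumption)
rng = np.random.default_rng(0)
x = rng.standard_normal(5)

q_full = energy(edges, dict(enumerate(weights)), x)
estimates = [energy(edges, sample_sparsifier(weights, probs, 3, rng), x)
             for _ in range(20000)]
q_mean = float(np.mean(estimates))
print(q_full, q_mean)
```

Averaged over many independent runs the estimate matches $Q_{H}(x)$, but unbiasedness alone says nothing about how large $m$ must be for a single run to be accurate for all $x$ simultaneously; that is what the choice of $p_{e}$ controls.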

18.3 Auxiliary Graph

Define the edge set

F:=\bigcup_{e\in E}{e\choose 2},

and let $G=(V,F,c)$ be a weighted graph, where we will choose the edge conductances $c\in\mathbb{R}_{+}^{F}$ later. Let

L_{G}:=\sum_{\{u,v\}\in F}c_{uv}\,b_{u,v}b_{u,v}^{T}

be its Laplacian, where $b_{u,v}:=\mathbf{1}_{u}-\mathbf{1}_{v}$ is the signed indicator vector of the pair $\{u,v\}$. Let $R(u,v)$ denote the effective resistance between $u$ and $v$ in $G$. For a hyperedge $e\in E$, we let

R_{\max}(e):=\max\left\{R(u,v):\{u,v\}\in{e\choose 2}\right\}.

Having this we define

p_{e}:=\frac{w_{e}R_{\max}(e)}{Z},\quad\text{for }e\in E,

where $Z:=\sum_{e\in E}w_{e}R_{\max}(e)$ is the normalizing constant. Note that in the special case that $H$ is a graph, $R_{\max}(e)=R(e)$. Now, the question is how to choose the conductances of the edges of $G$.
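A minimal sketch of how $R_{\max}(e)$, $Z$, and $p_{e}$ could be computed for a given set of conductances, using the pseudoinverse of $L_{G}$ and the formula $R(u,v)=b_{u,v}^{T}L_{G}^{+}b_{u,v}$. The unit conductances below are a hypothetical choice made only so the numbers are easy to check by hand; the lecture chooses the conductances differently.

```python
import numpy as np
from itertools import combinations

def laplacian(n, conductance):
    """Graph Laplacian from a dict {(u, v): c_uv}."""
    L = np.zeros((n, n))
    for (u, v), c in conductance.items():
        L[u, u] += c; L[v, v] += c
        L[u, v] -= c; L[v, u] -= c
    return L

def effective_resistance(L_pinv, u, v):
    """R(u, v) = b_{u,v}^T L^+ b_{u,v}."""
    return L_pinv[u, u] + L_pinv[v, v] - 2.0 * L_pinv[u, v]

def sampling_probabilities(n, hyperedges, weights, conductance):
    """Return (p, R_max, Z) with p_e = w_e * R_max(e) / Z."""
    # rcond is set explicitly so the numerically-zero eigenvalue is discarded.
    L_pinv = np.linalg.pinv(laplacian(n, conductance), rcond=1e-10)
    r_max = np.array([max(effective_resistance(L_pinv, u, v)
                          for u, v in combinations(sorted(e), 2))
                      for e in hyperedges])
    scores = np.asarray(weights) * r_max
    Z = float(scores.sum())
    return scores / Z, r_max, Z

hyperedges = [(0, 1, 2), (2, 3)]
weights = [1.0, 1.0]
# Hypothetical conductances: 1 on every pair contained in some hyperedge.
conductance = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0, (2, 3): 1.0}
p, r_max, Z = sampling_probabilities(4, hyperedges, weights, conductance)
print(p, r_max, Z)
```

In this example the triangle on $\{0,1,2\}$ has effective resistance $2/3$ between any two of its vertices and the pendant edge $\{2,3\}$ has resistance 1.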

The following is the main lemma:

Lemma 18.2.

Suppose it holds that

\|x\|^{2}\leq Q_{H}(L_{G}^{-1/2}x),\forall x\in\mathbb{R}^{n},

(18.2)

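then for any $0<\epsilon<1$ there is some $m=O(\frac{\log D}{\epsilon^{2}}Z\log n)$ such that, with constant probability, $\tilde{H}$ is a spectral $\epsilon$-sparsifier of $H$. Here $L_{G}^{-1/2}$ denotes the square root of the pseudoinverse of $L_{G}$, since $L_{G}$ itself is singular.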

The proof of this lemma uses the chaining machinery. But let us first discuss how to satisfy its assumption.

Roughly speaking, this auxiliary graph $G$ puts $Q_{H}$ into isotropic position. This step is straightforward for matrices, but as you will see it is considerably more complicated for these non-linear operators.

18.4 Choosing Conductances

We are therefore left to find edge conductances in the graph $G=(V,F,c)$ so that (18.2) holds and $Z$ is small. To this end, let us choose nonnegative numbers

\{c^{e}_{uv}\geq 0:\{u,v\}\in\tbinom{e}{2},e\in E\}

such that

\sum_{\{u,v\}\in\tbinom{e}{2}}c^{e}_{uv}=w_{e},\quad\forall e\in E.

(3.11)

For $\{u,v\}\in F$ , we then define our edge conductance

c_{uv}:=\sum_{e\in E:\{u,v\}\in\tbinom{e}{2}}c^{e}_{uv}.

(18.3)

In this case,

\begin{aligned}
\|L_{G}^{1/2}y\|^{2}=\langle y,L_{G}y\rangle&=\sum_{\{u,v\}\in F}c_{uv}(y_{u}-y_{v})^{2}=\sum_{e\in E}\sum_{\{u,v\}\in\tbinom{e}{2}}c^{e}_{uv}(y_{u}-y_{v})^{2}\\
&\leq\sum_{e\in E}\Big(\sum_{\{u,v\}\in\tbinom{e}{2}}c^{e}_{uv}\Big)\max_{\{u',v'\}\in\tbinom{e}{2}}(y_{u'}-y_{v'})^{2}\\
&=\sum_{e\in E}w_{e}\max_{\{u,v\}\in\tbinom{e}{2}}(y_{u}-y_{v})^{2}=Q_{H}(y).
\end{aligned}

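Taking $y=L_{G}^{-1/2}x$ (the square root of the pseudoinverse, so that $L_{G}^{1/2}y=x$ for $x$ orthogonal to the all-ones vector when $G$ is connected) gives

\|x\|^{2}\leq Q_{H}(L_{G}^{-1/2}x),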

verifying (18.2).
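The inequality $\langle y,L_{G}y\rangle\leq Q_{H}(y)$ holds for any way of splitting each $w_{e}$ among the pairs of $e$. The sketch below checks it numerically for one simple split, where $w_{e}$ is spread evenly over the pairs of $e$. This uniform split is an assumption of the example (it is valid but not the optimized choice discussed later), and the instance and function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def split_uniform(hyperedges, weights):
    """One valid split: c^e_uv = w_e / (number of pairs in e); returns c_uv."""
    conductance = {}
    for e, w in zip(hyperedges, weights):
        pairs = list(combinations(sorted(e), 2))
        for pair in pairs:
            conductance[pair] = conductance.get(pair, 0.0) + w / len(pairs)
    return conductance

def laplacian_energy(conductance, y):
    """<y, L_G y> = sum over pairs of c_uv * (y_u - y_v)^2."""
    return sum(c * (y[u] - y[v]) ** 2 for (u, v), c in conductance.items())

def hypergraph_energy(hyperedges, weights, y):
    """Q_H(y) = sum_e w_e * max_{u,v in e} (y_u - y_v)^2."""
    return sum(w * (y[list(e)].max() - y[list(e)].min()) ** 2
               for e, w in zip(hyperedges, weights))

hyperedges = [(0, 1, 2), (1, 2, 3, 4), (0, 4)]
weights = [1.0, 2.0, 0.5]
conductance = split_uniform(hyperedges, weights)

rng = np.random.default_rng(0)
gaps = []
for _ in range(1000):
    y = rng.standard_normal(5)
    gaps.append(hypergraph_energy(hyperedges, weights, y)
                - laplacian_energy(conductance, y))
print("smallest gap Q_H(y) - <y, L_G y>:", min(gaps))
```

The gap is never negative, so the freedom in choosing the split can be spent entirely on making $Z$ small.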

Lemma 18.3 (Foster’s Network Theorem).

It holds that

\sum_{\{u,v\}\in F}c_{uv}R(u,v)=n-1.

Proof.

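The observation is that

\sum_{\{u,v\}\in F}c_{uv}R(u,v)=\sum_{\{u,v\}\in F}c_{uv}\operatorname{Tr}(b_{u,v}^{T}L_{G}^{-1}b_{u,v})=\sum_{\{u,v\}\in F}c_{uv}\operatorname{Tr}(b_{u,v}b_{u,v}^{T}L_{G}^{-1})=\operatorname{Tr}(L_{G}^{-1}L_{G})=n-1.

Here $L_{G}^{-1}$ is the pseudoinverse, so for connected $G$ the matrix $L_{G}^{-1}L_{G}$ is the projection onto the $(n-1)$-dimensional subspace orthogonal to the all-ones vector.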

∎

Now, define

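K:=\max_{e\in E}\max_{\{u,v\}\in{e\choose 2}}\frac{R_{\max}(e)}{R(u,v)}\mathbb{I}\left[c^{e}_{uv}>0\right].

Then, using $w_{e}=\sum_{\{u,v\}\in{e\choose 2}}c^{e}_{uv}$, the definition of $K$, (18.3), and Lemma 18.3,

Z=\sum_{e\in E}w_{e}R_{\max}(e)=\sum_{e\in E}\sum_{\{u,v\}\in{e\choose 2}}c^{e}_{uv}R_{\max}(e)\leq K\sum_{e\in E}\sum_{\{u,v\}\in{e\choose 2}}c^{e}_{uv}R(u,v)=K\sum_{\{u,v\}\in F}c_{uv}R(u,v)=K(n-1).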
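Foster's theorem and the bound $Z\leq K(n-1)$ can both be checked numerically. The sketch below does so for a small connected instance with a uniform split of each $w_{e}$ over the pairs of $e$. The instance and the split are hypothetical; in particular, this split is not the optimized one, so $K$ will generally be larger than 1 here.

```python
import numpy as np
from itertools import combinations

n = 4
hyperedges = [(0, 1, 2), (1, 2, 3), (0, 3)]
weights = [1.0, 2.0, 1.0]

# Hypothetical split c^e_uv: spread w_e evenly over the pairs of e.
split = {}
for idx, (e, w) in enumerate(zip(hyperedges, weights)):
    pairs = list(combinations(sorted(e), 2))
    for pair in pairs:
        split[(idx, pair)] = w / len(pairs)

# c_uv sums c^e_uv over the hyperedges containing {u, v}, as in (18.3).
c = {}
for (idx, pair), val in split.items():
    c[pair] = c.get(pair, 0.0) + val

L = np.zeros((n, n))
for (u, v), cuv in c.items():
    L[u, u] += cuv; L[v, v] += cuv
    L[u, v] -= cuv; L[v, u] -= cuv
L_pinv = np.linalg.pinv(L, rcond=1e-10)

def R(u, v):
    """Effective resistance between u and v."""
    return L_pinv[u, u] + L_pinv[v, v] - 2.0 * L_pinv[u, v]

foster = sum(cuv * R(u, v) for (u, v), cuv in c.items())
r_max = [max(R(u, v) for u, v in combinations(sorted(e), 2)) for e in hyperedges]
Z = sum(w * r for w, r in zip(weights, r_max))
K = max(r_max[idx] / R(u, v) for (idx, (u, v)), val in split.items() if val > 0)
print("Foster sum:", foster, " Z:", Z, " K(n-1):", K * (n - 1))
```

The Foster sum equals $n-1$ regardless of the conductances, so all of the slack in $Z$ comes from $K$.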

Lemma 18.4.

We can choose conductances $c^{e}_{u,v}$ such that (18.3) is satisfied and $K=1$ .

Proof.

The conductances can be computed by solving the following convex program:

\begin{aligned}
\max\quad&\log\det(L_{G}(c_{u,v})+J)\\
\text{s.t.}\quad&\sum_{\{u,v\}\in{e\choose 2}}c_{u,v}^{e}=w_{e},\quad\forall e\in E,\\
&c_{u,v}^{e}\geq 0.
\end{aligned}

(18.4)

Note that this program is convex, since the log-determinant is a concave function and the constraints are linear. Equivalently (up to an additive constant), the objective can be written as the log of the generating polynomial of all spanning trees of $G$: $\log\sum_{T}\prod_{\{u,v\}\in T}c_{u,v}$. Concavity then also follows from the fact that any real stable polynomial is log-concave on the positive orthant.

We don’t go into the details here. The proof writes the KKT conditions for the optimality of the convex program and deduces the bound on $K$ from them. ∎

18.5 Notes on the Proof of Lemma 18.2

The proof of Lemma 18.2 is technical, but at a high level it uses the generic chaining machinery. The index set is $T=\{x:Q_{H}(L_{G}^{-1/2}x)\leq 1\}$, and the proof uses (18.2), which says that $T$ is a subset of the unit ball.

We just explain the first few steps. Let $\hat{H}$ be an independent copy of $\tilde{H}$. Then

\begin{aligned}
\mathbb{E}_{\tilde{H}}\max_{Q_{H}(y)\leq 1}\left|Q_{H}(y)-Q_{\tilde{H}}(y)\right|&=\mathbb{E}_{\tilde{H}}\max_{Q_{H}(y)\leq 1}\left|\mathbb{E}_{\hat{H}}[Q_{\hat{H}}(y)]-Q_{\tilde{H}}(y)\right|\\
&=\mathbb{E}_{\tilde{H}}\max_{Q_{H}(y)\leq 1}\left|\mathbb{E}_{\hat{H}}[Q_{\hat{H}}(y)-Q_{\tilde{H}}(y)]\right|\\
&\leq\mathbb{E}_{\hat{H},\tilde{H}}\max_{Q_{H}(y)\leq 1}\left|Q_{\hat{H}}(y)-Q_{\tilde{H}}(y)\right|,
\end{aligned}

where we have used that $|\mathbb{E}X|\leq\mathbb{E}|X|$ and $\max(\mathbb{E}X_{1},\dots,\mathbb{E}X_{k})\leq\mathbb{E}\max(X_{1},\dots,X_{k})$ .

The point of these first few steps is to make the process “centered”. The second step is to avoid binary 0/1 random variables, which are very awkward to run a chaining argument on.

The idea is that, since $\hat{e}_{k}$ and $\tilde{e}_{k}$ are independent and identically distributed, the distribution of $Q_{\hat{e}_{k}}(y)-Q_{\tilde{e}_{k}}(y)$ is symmetric around the origin, i.e., centered. So, $Q_{\hat{e}_{k}}(y)-Q_{\tilde{e}_{k}}(y)$ and $-(Q_{\hat{e}_{k}}(y)-Q_{\tilde{e}_{k}}(y))$ have the same distribution, and the same holds for the reweighted terms. We may therefore multiply the $k$-th term by an independent random sign $s_{k}$:

\begin{aligned}
\mathbb{E}_{\hat{H},\tilde{H}}\max_{Q_{H}(y)\leq 1}\left|Q_{\hat{H}}(y)-Q_{\tilde{H}}(y)\right|&\leq\mathbb{E}_{s\in\{-1,+1\}^{m}}\mathbb{E}_{\hat{H},\tilde{H}}\max_{Q_{H}(y)\leq 1}\left|\frac{1}{m}\sum_{k=1}^{m}s_{k}\left(\frac{w_{\hat{e}_{k}}}{p_{\hat{e}_{k}}}Q_{\hat{e}_{k}}(y)-\frac{w_{\tilde{e}_{k}}}{p_{\tilde{e}_{k}}}Q_{\tilde{e}_{k}}(y)\right)\right|\\
&\leq 2\,\mathbb{E}_{s\in\{-1,+1\}^{m}}\mathbb{E}_{\hat{H}}\max_{Q_{H}(y)\leq 1}\left|\frac{1}{m}\sum_{k=1}^{m}s_{k}\frac{w_{\hat{e}_{k}}}{p_{\hat{e}_{k}}}Q_{\hat{e}_{k}}(y)\right|.
\end{aligned}

First, notice

\frac{w_{e}}{p_{e}}Q_{e}(L_{G}^{-1/2}x)=\frac{w_{e}}{p_{e}}\max_{\{u,v\}\in{e\choose 2}}\langle L_{G}^{-1/2}x,b_{u,v}\rangle^{2}=\max_{\{u,v\}\in{e\choose 2}}\langle x,z^{e}_{u,v}\rangle^{2}

where $z_{u,v}=L_{G}^{-1/2}b_{u,v}$ and $z^{e}_{u,v}=\sqrt{w_{e}/p_{e}}z_{u,v}$ . If we let

N_{k}(x)=\max_{u,v\in e_{k}}|\langle x,z_{u,v}^{e_{k}}\rangle|

Then,

Q_{\tilde{H}}(L_{G}^{-1/2}x)=\frac{1}{m}\sum_{i=1}^{m}N_{i}(x)^{2}.

Similarly, the object we need to study is

\frac{1}{m}\sum_{i=1}^{m}s_{i}N_{i}(x)^{2}

for Rademacher random variables $\{s_{i}\}_{1\leq i\leq m}$. So, we just need to bound the expected supremum of this quantity over the set $T=\{x:Q_{H}(L_{G}^{-1/2}x)\leq 1\}$. This is where chaining is used, but the details are beyond the scope of this course.
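The rewriting in terms of $N_{k}$ can be checked numerically. The sketch below builds $L_{G}^{-1/2}$ as the square root of the pseudoinverse, verifies that $N(x)^{2}$ equals $\frac{w_{e}}{p_{e}}\max_{u,v\in e}(y_{u}-y_{v})^{2}$ with $y=L_{G}^{-1/2}x$ for each hyperedge, and evaluates one realization of the symmetrized sum. The instance, the unit conductances, and the sampling distribution `probs` are all hypothetical choices for illustration.

```python
import numpy as np
from itertools import combinations

def pinv_sqrt(L, tol=1e-10):
    """Square root of the pseudoinverse of a symmetric PSD matrix."""
    vals, vecs = np.linalg.eigh(L)
    scale = np.array([1.0 / np.sqrt(val) if val > tol else 0.0 for val in vals])
    return (vecs * scale) @ vecs.T

n = 4
hyperedges = [(0, 1, 2), (1, 2, 3), (0, 3)]
weights = np.array([1.0, 2.0, 1.0])
probs = np.array([0.3, 0.5, 0.2])  # hypothetical sampling distribution

# Hypothetical auxiliary graph: unit conductance on every pair in F.
L = np.zeros((n, n))
for u, v in {p for e in hyperedges for p in combinations(sorted(e), 2)}:
    L[u, u] += 1.0; L[v, v] += 1.0
    L[u, v] -= 1.0; L[v, u] -= 1.0
S = pinv_sqrt(L)  # plays the role of L_G^{-1/2}

def N(e_idx, x):
    """max_{u,v in e} |<x, z^e_uv>| with z^e_uv = sqrt(w_e / p_e) * S b_uv."""
    scale = np.sqrt(weights[e_idx] / probs[e_idx])
    return max(abs(scale * (x @ (S[:, u] - S[:, v])))
               for u, v in combinations(hyperedges[e_idx], 2))

rng = np.random.default_rng(0)
x = rng.standard_normal(n)
y = S @ x
identity_holds = []
for k, e in enumerate(hyperedges):
    direct = weights[k] / probs[k] * (y[list(e)].max() - y[list(e)].min()) ** 2
    identity_holds.append(bool(np.isclose(N(k, x) ** 2, direct)))

# One realization of (1/m) * sum_i s_i * N_i(x)^2 for m sampled edges.
m = 6
sample = rng.choice(len(hyperedges), size=m, p=probs)
signs = rng.choice([-1.0, 1.0], size=m)
symmetrized = sum(s * N(int(k), x) ** 2 for s, k in zip(signs, sample)) / m
print(identity_holds, symmetrized)
```

Each $N_{k}$ is a maximum of absolute values of linear functionals, which is the structure the chaining argument exploits.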
