CSE 525: Randomized Algorithms Spring 2026 Lecture 18: Hypergraph Sparsification Lecturer: Shayan Oveis Gharan 06/02/26

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

18.1 Max Eigenvalue of Random Matrices

We will now use generic chaining to show that the largest eigenvalue of a random symmetric $n \times n$ matrix with Rademacher entries is $O(\sqrt{n})$. This is certainly not the simplest way of proving such a result, but it will give a sense of how these techniques can be applied.

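Our “sub-Gaussian” random process is the following: pick independent Rademacher random variables $r_{i,j}$ for each $1 \le i < j \le n$, define the matrix $M$ by $M_{i,j} = M_{j,i} = r_{i,j}$, and let

$$F(x) = x^\top M x$$

for every unit vector $x \in \mathbb{R}^n$ (we assume the diagonal of $M$ is 0 for simplicity). It thus follows that

$$x^\top M x = 2 \sum_{1 \le i < j \le n} x_i x_j r_{i,j} = \langle r, z \rangle,$$

where $z \in \mathbb{R}^{\binom{n}{2}}$ has entries $z_{\{i,j\}} = 2 x_i x_j$.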

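So, we let $T$ be the set of vectors

$$T = \left\{ z_x \in \mathbb{R}^{\binom{n}{2}} : x \in \mathbb{R}^n,\ \|x\| = 1,\ (z_x)_{\{i,j\}} = 2 x_i x_j \ \ \forall\, i < j \right\}.$$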
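As a sanity check of the identity $x^\top M x = \langle r, z_x \rangle$, here is a minimal Python sketch. The function name `quadratic_form_two_ways` and the seeded random generator are illustrative assumptions, not part of the lecture.

```python
import itertools
import math
import random


def quadratic_form_two_ways(n, seed=0):
    """Return (x^T M x, <r, z_x>) for a random Rademacher M with zero diagonal."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n), 2))
    # One independent sign r_{i,j} per unordered pair i < j.
    r = {p: rng.choice((-1, 1)) for p in pairs}

    # A random unit vector x.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in x))
    x = [t / norm for t in x]

    # Dense symmetric matrix M with M[i][j] = M[j][i] = r_{i,j}.
    M = [[0] * n for _ in range(n)]
    for (i, j), sign in r.items():
        M[i][j] = M[j][i] = sign

    direct = sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))
    # z_x has one coordinate 2 x_i x_j per pair i < j.
    z = {(i, j): 2.0 * x[i] * x[j] for (i, j) in pairs}
    inner = sum(r[p] * z[p] for p in pairs)
    return direct, inner
```

The two returned numbers should agree up to floating-point rounding for any `n` and any seed.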

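Let $g$ be a vector of independent Rademacher random variables. By Hoeffding’s inequality, for any vector $v$ and any $\lambda > 0$ we can write

$$\mathbb{P}\left[\langle g, v \rangle \ge \lambda\right] \le \exp\left(-\frac{2\lambda^2}{\sum_i 4 v_i^2}\right) = \exp\left(-\frac{\lambda^2}{2\|v\|^2}\right),$$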

so, this distribution is 1/4-subgaussian.

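It follows that for $x, y \in \mathbb{R}^n$ with $\|x\| = \|y\| = 1$,

$$\begin{aligned}
\frac{1}{4}\|z_x - z_y\|^2 \le \frac{1}{2}\|xx^\top - yy^\top\|_F^2 &= \frac{1}{2}\|(x-y)x^\top + y(x-y)^\top\|_F^2 \\
&\le \|(x-y)x^\top\|_F^2 + \|y(x-y)^\top\|_F^2 \\
&= \|x-y\|^2\|x\|^2 + \|x-y\|^2\|y\|^2 \\
&= 2\|x-y\|^2.
\end{aligned}$$

The first inequality holds because each off-diagonal term $(x_i x_j - y_i y_j)^2$ with $i < j$ appears twice in the Frobenius norm, and the second uses $\|A + B\|_F^2 \le 2\|A\|_F^2 + 2\|B\|_F^2$.

Having this we can write

$$d(z_x, z_y)^2 = \mathbb{E}\left[\langle g, z_x - z_y \rangle^2\right] = \|z_x - z_y\|^2 \le 8\|x-y\|^2.$$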

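Now we apply generic chaining to $F$. From the bound above, an $\epsilon$-net of the unit Euclidean sphere is also a $\sqrt{8}\,\epsilon$-net for the metric space $(T, d)$. For the unit Euclidean sphere there is an $\epsilon$-net of size at most $(3/\epsilon)^n$. To apply generic chaining, let $T_k$ be an arbitrary subset of $T$ of cardinality $2^{2^k}$ if $2^k < n$, and an $\epsilon$-net with $\epsilon = 3 \cdot 2^{-2^k/n}$ otherwise (so that $(3/\epsilon)^n = 2^{2^k}$). Applying the generic chaining inequality,

$$\mathbb{E}\left[\sup_{z \in T} F(z)\right] \le O(1) \sum_{k \ge 0} 2^{k/2} \cdot \sqrt{8}\, \min\left\{2,\ 3 \cdot 2^{-2^k/n}\right\} = O(\sqrt{n}).$$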
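To see numerically that the chaining sum grows like $\sqrt{n}$, the following sketch evaluates $\sum_k 2^{k/2} \sqrt{8} \min\{2, 3 \cdot 2^{-2^k/n}\}$ directly. The function name `chaining_sum` and the truncation point are assumptions made for illustration; the $O(1)$ constant from the chaining inequality is left out.

```python
import math


def chaining_sum(n):
    """Evaluate sum_k 2^(k/2) * sqrt(8) * min(2, 3 * 2^(-2^k / n))."""
    total = 0.0
    k = 0
    # Terms with 2^k > 64 n carry a factor below 2^(-64) and decay doubly
    # exponentially, so truncating there changes the sum negligibly.
    while 2 ** k <= 64 * n:
        radius = min(2.0, 3.0 * 2.0 ** (-(2 ** k) / n))
        total += 2 ** (k / 2) * math.sqrt(8) * radius
        k += 1
    return total


def normalized_chaining_sum(n):
    """The chaining sum divided by sqrt(n); this should stay bounded in n."""
    return chaining_sum(n) / math.sqrt(n)
```

Evaluating `normalized_chaining_sum` for, say, `n = 10, 100, 1000, 10000` should give values that stay within a fixed range rather than growing with `n`.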

Consider a weighted hypergraph $H = (V, E, w)$, where $\{w_e : e \in E\}$ are nonnegative edge weights. We associate to $H$ the quadratic expression

$$Q_H(x) = \sum_{e \in E} w_e \max_{u,v \in e} (x_u - x_v)^2.$$

The main observation is that if $H$ were a graph, i.e., $|e| = 2$ for every edge $e$, this would be exactly the quadratic form of the graph Laplacian.
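The definition of $Q_H$ and the graph special case can be sketched in a few lines of Python. The function names and the representation of hyperedges as tuples of vertex indices are assumptions for illustration.

```python
def hypergraph_energy(edges, weights, x):
    """Q_H(x) = sum_e w_e * max_{u,v in e} (x_u - x_v)^2."""
    total = 0.0
    for e, w in zip(edges, weights):
        values = [x[v] for v in e]
        # The largest squared difference inside e is (max - min)^2.
        total += w * (max(values) - min(values)) ** 2
    return total


def laplacian_quadratic_form(edges, weights, x):
    """x^T L x for an ordinary graph whose edges all have exactly 2 endpoints."""
    return sum(w * (x[u] - x[v]) ** 2 for (u, v), w in zip(edges, weights))
```

When every hyperedge has size 2 the two functions return the same value; for larger hyperedges only the most stretched pair inside each hyperedge contributes.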

As our main application of chaining, we will explain algorithms to sparsify hypergraphs. That is, we want to construct another hypergraph $\tilde{H} = (V, \tilde{E}, \tilde{w})$ such that $\tilde{E} \subseteq E$ and

$$|Q_H(x) - Q_{\tilde{H}}(x)| \le \epsilon\, Q_H(x), \quad \forall x \in \mathbb{R}^n, \qquad (18.1)$$

where, as usual, $\epsilon > 0$ is the accuracy parameter of our sparsifier. Furthermore, similar to the graph case, we would like to make $|\tilde{E}|$ as small as possible, ideally near-linear in $|V|$ (while $|E|$ could be as large as $2^{|V|}$ in this case).

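The following theorem is proved in a paper by James Lee.

Theorem 18.1.

For any $n$-vertex weighted hypergraph $H = (V, E, w)$ and $\epsilon > 0$, there is a spectral $\epsilon$-sparsifier $\tilde{H} = (V, \tilde{E}, \tilde{w})$ for $H$ such that

$$|\tilde{E}| \le O\left(\frac{\log D}{\epsilon^2}\, n \log n\right),$$

where $D := \max_{e \in E} |e|$.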

18.2 Independent random sampling

For any edge $e$, let $Q_e$ be defined as

$$Q_e(x) = \max_{u,v \in e} (x_u - x_v)^2.$$

Therefore, $Q_H(x) = \sum_{e \in E} w_e Q_e(x)$.

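Suppose we have a probability distribution $\{p_e : e \in E\}$, i.e., $p_e \ge 0$ and $\sum_e p_e = 1$. Similar to the graph case, we let $X$ be an unbiased estimator of $Q_H$: namely, we let $X = \frac{w_e}{p_e} Q_e$ with probability $p_e$. Then, it follows that

$$\mathbb{E}[X] = \sum_e p_e \cdot \frac{w_e}{p_e} Q_e = Q_H.$$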

As usual the difficulty would be in choosing the probabilities pe.

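Then, we form $\tilde{H}$ by sampling $X$ independently $m$ times and taking the empirical mean of the samples. In particular, if $e_1, \dots, e_m$ are the sampled edges, we can write

$$Q_{\tilde{H}}(x) = \frac{1}{m} \sum_{k=1}^m \frac{w_{e_k}}{p_{e_k}} Q_{e_k}(x).$$

Observe that

$$\mathbb{E}[Q_{\tilde{H}}(x)] = Q_H(x)$$

for all $x \in \mathbb{R}^n$. So, as before, the main question is how to choose the probabilities $p_e$.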
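The sampling scheme can be sketched as follows, assuming the probabilities are given and strictly positive. The names `sample_sparsifier`, `edge_energy`, and `energy` are hypothetical; each sampled edge $e$ ends up with weight $\tilde{w}_e = \frac{\#\{k : e_k = e\}}{m} \cdot \frac{w_e}{p_e}$, which reproduces the empirical mean.

```python
import random
from collections import Counter


def edge_energy(e, x):
    """Q_e(x) = max_{u,v in e} (x_u - x_v)^2, without the weight w_e."""
    values = [x[v] for v in e]
    return (max(values) - min(values)) ** 2


def sample_sparsifier(weights, probs, m, seed=0):
    """Draw m edge indices i.i.d. from probs; return {edge index: new weight}."""
    rng = random.Random(seed)
    counts = Counter(rng.choices(range(len(weights)), weights=probs, k=m))
    # An edge sampled counts[e] times contributes counts[e] copies of
    # (1/m) * (w_e / p_e) to the empirical mean.
    return {e: (counts[e] / m) * (weights[e] / probs[e]) for e in counts}


def energy(edges, edge_weights, x):
    """Sum of weight * Q_e(x) over a dict {edge index: weight}."""
    return sum(w * edge_energy(edges[e], x) for e, w in edge_weights.items())
```

Calling `energy(edges, sample_sparsifier(weights, probs, m), x)` gives one realization of $Q_{\tilde{H}}(x)$; averaging over many seeds should approach $Q_H(x)$, although how fast depends entirely on the choice of `probs`.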

18.3 Auxiliary Graph

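Define the edge set

$$F := \bigcup_{e \in E} \binom{e}{2},$$

and let $G = (V, F, c)$ be a weighted graph, where we will choose the edge conductances $c \in \mathbb{R}_+^F$ later. Let

$$L_G := \sum_{\{u,v\} \in F} c_{uv}\, b_{u,v} b_{u,v}^\top$$

be the Laplacian of $G$, where $b_{u,v} := \mathbf{1}_u - \mathbf{1}_v$. Let $R(u,v)$ denote the effective resistance between $u, v$ in $G$. For a hyperedge $e \in E$, we let

$$R_{\max}(e) := \max\left\{R(u,v) : \{u,v\} \in \binom{e}{2}\right\}.$$

Having this we define

$$p_e := \frac{w_e R_{\max}(e)}{Z}, \quad \text{for } e \in E,$$

where $Z := \sum_{e \in E} w_e R_{\max}(e)$ is the normalizing constant. Note that in the special case that $H$ is a graph, $R_{\max}(e) = R(e)$. Now, the question is how to choose the conductances of the edges of $G$.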

The following is the main lemma:

Lemma 18.2.

Suppose it holds that

$$\|x\|^2 \le Q_H(L_G^{-1/2} x), \quad \forall x \in \mathbb{R}^n, \qquad (18.2)$$

then for any $1 > \epsilon > 0$ and $m \ge O\left(\frac{\log D}{\epsilon^2}\, Z \log n\right)$, with constant probability $\tilde{H}$ is a spectral $\epsilon$-sparsifier of $H$.

The proof of this lemma uses the chaining machinery. But let us first discuss how to satisfy the assumption of this lemma.

Roughly speaking, the auxiliary graph $G$ puts $Q_H$ into isotropic position. This step is straightforward for matrices, but as you will see it is considerably more complicated for these non-linear operators.

18.4 Choosing Conductances

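We are therefore left to find edge conductances in the graph $G = (V, F, c)$ so that (18.2) holds and $Z$ is small. To this end, let us choose nonnegative numbers

$$\left\{c^e_{uv} \ge 0 : \{u,v\} \in \binom{e}{2},\ e \in E\right\}$$

such that

$$\sum_{\{u,v\} \in \binom{e}{2}} c^e_{uv} = w_e, \quad \forall e \in E. \qquad (3.11)$$

For $\{u,v\} \in F$, we then define our edge conductance

$$c_{uv} := \sum_{e \in E : \{u,v\} \in \binom{e}{2}} c^e_{uv}. \qquad (18.3)$$

In this case,

$$\begin{aligned}
\|L_G^{1/2} y\|^2 = \langle y, L_G y \rangle &= \sum_{\{u,v\} \in F} c_{uv} (y_u - y_v)^2 = \sum_{e \in E} \sum_{\{u,v\} \in \binom{e}{2}} c^e_{uv} (y_u - y_v)^2 \\
&\le \sum_{e \in E} \Big(\sum_{\{u,v\} \in \binom{e}{2}} c^e_{uv}\Big) \max_{\{u,v\} \in \binom{e}{2}} (y_u - y_v)^2 \\
&= \sum_{e \in E} w_e \max_{\{u,v\} \in \binom{e}{2}} (y_u - y_v)^2 = Q_H(y).
\end{aligned}$$

Taking $y = L_G^{-1/2} x$, where $L_G^{-1/2}$ denotes the square root of the pseudoinverse and $x$ is in the range of $L_G$, gives

$$\|x\|^2 \le Q_H(L_G^{-1/2} x),$$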

verifying (18.2).
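The inequality $\langle y, L_G y \rangle \le Q_H(y)$ holds for any split of $w_e$ among the pairs of $e$. A minimal sketch, assuming the simplest (uniform) split $c^e_{uv} = w_e / \binom{|e|}{2}$, which is an illustrative choice and not the split used to bound $Z$:

```python
import itertools


def split_uniformly(edges, weights):
    """Return graph conductances c_uv built from c^e_uv = w_e / C(|e|, 2)."""
    c = {}
    for e, w in zip(edges, weights):
        pairs = list(itertools.combinations(sorted(e), 2))
        for p in pairs:
            # Pairs shared by several hyperedges accumulate their shares.
            c[p] = c.get(p, 0.0) + w / len(pairs)
    return c


def laplacian_form(c, y):
    """<y, L_G y> = sum over graph edges of c_uv * (y_u - y_v)^2."""
    return sum(cuv * (y[u] - y[v]) ** 2 for (u, v), cuv in c.items())


def hypergraph_energy(edges, weights, y):
    """Q_H(y) = sum_e w_e * max_{u,v in e} (y_u - y_v)^2."""
    total = 0.0
    for e, w in zip(edges, weights):
        values = [y[v] for v in e]
        total += w * (max(values) - min(values)) ** 2
    return total
```

For any vector `y`, `laplacian_form(split_uniformly(edges, weights), y)` should not exceed `hypergraph_energy(edges, weights, y)`, and the conductances sum to the total hyperedge weight.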

Lemma 18.3 (Foster’s Network Theorem).

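It holds that

$$\sum_{\{u,v\} \in F} c_{uv} R(u,v) = n - 1.$$

Proof.

The observation is that

$$\sum_{\{u,v\} \in F} c_{uv} R(u,v) = \sum_{\{u,v\} \in F} c_{uv} \operatorname{Tr}\left(b_{u,v}^\top L_G^{-1} b_{u,v}\right) = \sum_{\{u,v\} \in F} c_{uv} \operatorname{Tr}\left(b_{u,v} b_{u,v}^\top L_G^{-1}\right) = \operatorname{Tr}\left(L_G^{-1} L_G\right) = n - 1,$$

where $L_G^{-1}$ is understood as the pseudoinverse, so that $L_G^{-1} L_G$ is the projection onto the range of $L_G$, which has dimension $n - 1$ when $G$ is connected.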
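Foster's theorem can be checked exactly on small connected graphs. The sketch below uses exact rational arithmetic and computes effective resistances by grounding the last vertex, which is one standard way to obtain them; the helper names `invert` and `foster_sum` are assumptions for illustration.

```python
from fractions import Fraction


def invert(matrix):
    """Gauss-Jordan inverse over exact rationals (matrix assumed invertible)."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def foster_sum(n, conductances):
    """Sum of c_uv * R(u, v) over the edges of a connected graph on 0..n-1."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for (u, v), c in conductances.items():
        c = Fraction(c)
        L[u][u] += c
        L[v][v] += c
        L[u][v] -= c
        L[v][u] -= c
    # Ground the last vertex: the reduced Laplacian is invertible when the
    # graph is connected, and padding its inverse with zeros gives a
    # generalized inverse of L that is valid on vectors b_u - b_v.
    inv = invert([row[:n - 1] for row in L[:n - 1]])

    def g(a, b):
        return inv[a][b] if a < n - 1 and b < n - 1 else Fraction(0)

    return sum(Fraction(c) * (g(u, u) + g(v, v) - 2 * g(u, v))
               for (u, v), c in conductances.items())
```

For a connected graph on `n` vertices, `foster_sum(n, conductances)` should equal `n - 1` regardless of the conductance values.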

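Now, define

$$K := \max_{e \in E} \max_{\{u,v\} \in \binom{e}{2}} \frac{R_{\max}(e)}{R(u,v)}\, \mathbb{I}\left[c^e_{u,v} > 0\right].$$

Then, using (3.11), (18.3), and Lemma 18.3,

$$Z = \sum_{e \in E} w_e R_{\max}(e) = \sum_{e \in E} \sum_{\{u,v\} \in \binom{e}{2}} c^e_{u,v} R_{\max}(e) \le K \sum_{\{u,v\} \in F} c_{uv} R(u,v) = K(n-1).$$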

Lemma 18.4.

We can choose conductances $c^e_{u,v}$ such that (18.3) is satisfied and $K = 1$.

Proof.

The conductances can be computed by solving the following convex program:

$$\begin{aligned}
\max \quad & \log\det\left(L_G(c) + J\right) \qquad (18.4) \\
\text{s.t.} \quad & \sum_{\{u,v\} \in \binom{e}{2}} c^e_{u,v} = w_e, \quad \forall e \in E, \\
& c^e_{u,v} \ge 0.
\end{aligned}$$

Note that this program is convex, as the log of the determinant is a concave function and the constraints are linear. Equivalently, the objective can be written as the log of the generating polynomial of all spanning trees of $G$: $\log \sum_{T} \prod_{\{u,v\} \in T} c_{u,v}$. The concavity of this objective follows from the fact that any real stable polynomial is log-concave.

We don’t go into the details here. The proof writes the KKT conditions for the optimality of the convex program and deduces the bound on $K$ from them. We remark that ∎

18.5 Notes on the Proof of Lemma 18.2

The proof of Lemma 18.2 is technical, but at a high level it uses the generic chaining machinery. The index set is $T = \{x : Q_H(L_G^{-1/2} x) \le 1\}$, and the proof uses (18.2), which says that $T$ is a subset of the unit ball.

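We just explain the first few steps. We let $\hat{H}$ be an independent copy of $\tilde{H}$. Then,

$$\begin{aligned}
\mathbb{E}_{\tilde{H}} \max_{Q_H(y) \le 1} \left|Q_H(y) - Q_{\tilde{H}}(y)\right| &= \mathbb{E}_{\tilde{H}} \max_{Q_H(y) \le 1} \left|\mathbb{E}_{\hat{H}}[Q_{\hat{H}}(y)] - Q_{\tilde{H}}(y)\right| \\
&= \mathbb{E}_{\tilde{H}} \max_{Q_H(y) \le 1} \left|\mathbb{E}_{\hat{H}}\left[Q_{\hat{H}}(y) - Q_{\tilde{H}}(y)\right]\right| \\
&\le \mathbb{E}_{\hat{H}, \tilde{H}} \max_{Q_H(y) \le 1} \left|Q_{\hat{H}}(y) - Q_{\tilde{H}}(y)\right|,
\end{aligned}$$

where we have used that $|\mathbb{E} X| \le \mathbb{E}|X|$ and $\max(\mathbb{E} X_1, \dots, \mathbb{E} X_k) \le \mathbb{E} \max(X_1, \dots, X_k)$.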

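The point of these first few steps is to make the process “centered”. The second step is to avoid binary 0/1 random variables, which are very annoying to run a chaining argument for.

The idea is that the distribution of $Q_{\hat{e}_k}(y) - Q_{\tilde{e}_k}(y)$ is symmetric around the origin, i.e., centered. So, $Q_{\hat{e}_k}(y) - Q_{\tilde{e}_k}(y)$ and $-(Q_{\hat{e}_k}(y) - Q_{\tilde{e}_k}(y))$ have the same distribution, and the same holds after rescaling each term by $w_e/p_e$. Hence, we can multiply the $k$-th term by an independent uniformly random sign $s_k$:

$$\begin{aligned}
\mathbb{E}_{\hat{H}, \tilde{H}} \max_{Q_H(y) \le 1} \left|Q_{\hat{H}}(y) - Q_{\tilde{H}}(y)\right| &= \mathbb{E}_{s \in \{-1,+1\}^m} \mathbb{E}_{\hat{H}, \tilde{H}} \max_{Q_H(y) \le 1} \left|\frac{1}{m} \sum_{k=1}^m s_k \left(\frac{w_{\hat{e}_k}}{p_{\hat{e}_k}} Q_{\hat{e}_k}(y) - \frac{w_{\tilde{e}_k}}{p_{\tilde{e}_k}} Q_{\tilde{e}_k}(y)\right)\right| \\
&\le 2\, \mathbb{E}_{s \in \{-1,+1\}^m} \mathbb{E}_{\hat{H}} \max_{Q_H(y) \le 1} \left|\frac{1}{m} \sum_{k=1}^m s_k \frac{w_{\hat{e}_k}}{p_{\hat{e}_k}} Q_{\hat{e}_k}(y)\right|.
\end{aligned}$$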

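First, notice

$$\frac{w_e}{p_e} Q_e(L_G^{-1/2} x) = \frac{w_e}{p_e} \max_{\{u,v\} \in \binom{e}{2}} \langle L_G^{-1/2} x, b_{u,v} \rangle^2 = \max_{\{u,v\} \in \binom{e}{2}} \langle x, z^e_{u,v} \rangle^2,$$

where $z_{u,v} = L_G^{-1/2} b_{u,v}$ and $z^e_{u,v} = \sqrt{w_e/p_e}\; z_{u,v}$. If we let

$$N_k(x) = \max_{\{u,v\} \in \binom{e_k}{2}} \left|\langle x, z^{e_k}_{u,v} \rangle\right|,$$

then

$$Q_{\tilde{H}}(L_G^{-1/2} x) = \frac{1}{m} \sum_{k=1}^m N_k(x)^2.$$

Similarly, the object we need to study is

$$\frac{1}{m} \sum_{k=1}^m s_k N_k(x)^2$$

for Rademacher random variables $\{s_k\}_{1 \le k \le m}$. So, we just need to bound the expected supremum of the absolute value of this quantity over the set $T = \{x : Q_H(L_G^{-1/2} x) \le 1\}$. This is where the chaining is used, but the details are beyond the scope of this course.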