CSE 525: Randomized Algorithms Spring 2025 Lecture 3: Strong Concentration Bounds Lecturer: Shayan Oveis Gharan 04/02/2026

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

We have seen how knowledge of the variance of a random variable $X$ can be used to control the deviation of $X$ from its mean. This is the heart of the second moment method. Often, however, we can control even higher moments, and this yields much stronger concentration. A prototypical example is when $X_1, X_2, \ldots, X_n$ are independent (but not necessarily identically distributed) $\{0,1\}$-valued random variables and $X = X_1 + X_2 + \cdots + X_n$. Let $p_i = \mathbb{E}[X_i]$ and define $\mu = \mathbb{E}[X] = \sum_{i=1}^n p_i = p_1 + p_2 + \cdots + p_n$. In this case, we have the following multiplicative form of the "Chernoff bound".

Theorem 3.1 (Multiplicative Chernoff bound).

For every $\delta \ge 0$, it holds that

$$\Pr[X \ge (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu},$$

and, for $0 \le \delta < 1$,

$$\Pr[X \le (1-\delta)\mu] \le \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.$$

Consequently,

$$\Pr[X \ge (1+\delta)\mu] \le e^{-\delta^2\mu/(2+\delta)}, \qquad \Pr[X \le (1-\delta)\mu] \le e^{-\delta^2\mu/2}.$$
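To get a feel for Theorem 3.1, the following Python sketch compares, for a binomial variable (the special case $p_i = p$ for all $i$), the exact tail probabilities with the bound of Theorem 3.1 and with the simplified exponential forms. The parameters ($n = 200$, $p = 0.05$, so $\mu = 10$) and the helper names are illustrative choices, not part of the notes.

```python
import math

def binom_pmf(n, p, j):
    """Pr[X = j] for X ~ Binomial(n, p)."""
    return math.comb(n, j) * p**j * (1 - p) ** (n - j)

def upper_tail(n, p, a):
    """Exact Pr[X >= a] for X ~ Binomial(n, p)."""
    return sum(binom_pmf(n, p, j) for j in range(math.ceil(a), n + 1))

def lower_tail(n, p, a):
    """Exact Pr[X <= a] for X ~ Binomial(n, p)."""
    return sum(binom_pmf(n, p, j) for j in range(0, math.floor(a) + 1))

def chernoff_upper(mu, delta):
    """(e^delta / (1+delta)^(1+delta))^mu, computed in log space."""
    return math.exp(mu * (delta - (1 + delta) * math.log1p(delta)))

def chernoff_lower(mu, delta):
    """(e^-delta / (1-delta)^(1-delta))^mu for 0 <= delta < 1, in log space."""
    return math.exp(mu * (-delta - (1 - delta) * math.log1p(-delta)))

# Illustrative parameters (our choice): 200 coins with p_i = 0.05, so mu = 10.
n, p = 200, 0.05
mu = n * p
rows = []
for delta in (0.25, 0.5, 0.75):
    rows.append((
        delta,
        upper_tail(n, p, (1 + delta) * mu),      # exact Pr[X >= (1+delta)mu]
        chernoff_upper(mu, delta),               # Theorem 3.1, upper tail
        math.exp(-delta**2 * mu / (2 + delta)),  # simplified upper bound
        lower_tail(n, p, (1 - delta) * mu),      # exact Pr[X <= (1-delta)mu]
        chernoff_lower(mu, delta),               # Theorem 3.1, lower tail
        math.exp(-delta**2 * mu / 2),            # simplified lower bound
    ))
for row in rows:
    print("delta=%.2f  upper: %.2e <= %.2e <= %.2e   lower: %.2e <= %.2e <= %.2e" % row)
```

Each printed row should read exact tail $\le$ Theorem 3.1 bound $\le$ simplified bound, for both tails.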
Proof.

Let $t > 0$ be a parameter that we choose later.

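$$\Pr[X \ge (1+\delta)\mu] = \Pr\left[e^{tX} \ge e^{t(1+\delta)\mu}\right] \le \frac{\mathbb{E}[e^{tX}]}{e^{t(1+\delta)\mu}}, \qquad (3.1)$$

where the inequality is Markov's inequality.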

The first equality uses that $x \mapsto e^{tx}$ is strictly increasing for $t > 0$.

Now, we can write

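$$\mathbb{E}[e^{tX}] = \mathbb{E}\left[e^{t\sum_{i} X_i}\right] = \mathbb{E}\left[\prod_{i=1}^n e^{tX_i}\right] = \prod_{i=1}^n \mathbb{E}[e^{tX_i}],$$

where the last equality uses independence.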

Now, observe that

$$\mathbb{E}[e^{tX_i}] = p_i e^t + (1-p_i) = 1 + p_i(e^t - 1) \le e^{p_i(e^t-1)},$$

where the inequality uses $1 + x \le e^x$.

Plugging this back we obtain

$$\mathbb{E}[e^{tX}] \le \prod_{i=1}^n e^{p_i(e^t-1)} = e^{\mu(e^t-1)}.$$

Putting back in (3.1), we obtain

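$$\Pr[X \ge (1+\delta)\mu] \le \frac{e^{\mu(e^t-1)}}{e^{t(1+\delta)\mu}} = e^{\mu(e^t - 1 - (1+\delta)t)} = \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu},$$

where the last step sets $t = \ln(1+\delta)$.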

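The lower tail is proven similarly, by applying Markov's inequality to $e^{-tX}$ and setting $t = -\ln(1-\delta)$. The simplified bounds follow from the inequalities $\ln(1+\delta) \ge \frac{2\delta}{2+\delta}$ for $\delta \ge 0$ and $(1-\delta)\ln(1-\delta) \ge -\delta + \frac{\delta^2}{2}$ for $0 \le \delta < 1$. ∎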
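Two steps of the proof can be checked numerically with a short Python sketch: that the exact moment generating function $\prod_i (p_i e^t + 1 - p_i)$ is at most $e^{\mu(e^t-1)}$, and that a grid search over $t$ recovers $t = \ln(1+\delta)$ as the minimizer of the exponent $\mu(e^t - 1 - (1+\delta)t)$. The values of $p_i$ and $\delta$ below are hypothetical, chosen only for illustration.

```python
import math

def mgf_exact(ps, t):
    """E[e^{tX}] for X = sum of independent Bernoulli(p_i): prod_i (p_i e^t + 1 - p_i)."""
    return math.prod(p * math.exp(t) + 1 - p for p in ps)

def mgf_bound(ps, t):
    """The bound e^{mu (e^t - 1)} obtained from 1 + x <= e^x."""
    return math.exp(sum(ps) * (math.exp(t) - 1))

def exponent(t, mu, delta):
    """mu (e^t - 1 - (1+delta) t): the log of the tail bound before choosing t."""
    return mu * (math.exp(t) - 1 - (1 + delta) * t)

ps = [0.1, 0.3, 0.5, 0.2, 0.05, 0.4]   # hypothetical, non-identical p_i
mu, delta = sum(ps), 0.8

# Grid search over t in (0, 3]; the exponent is convex in t.
grid = [i / 10_000 for i in range(1, 30_001)]
t_best = min(grid, key=lambda t: exponent(t, mu, delta))
# Log of (e^delta / (1+delta)^(1+delta))^mu, the value at t = ln(1 + delta).
closed_form = mu * (delta - (1 + delta) * math.log1p(delta))
print(t_best, math.log1p(delta), exponent(t_best, mu, delta), closed_form)
```

The exponent is convex in $t$ with derivative $\mu(e^t - (1+\delta))$, which is why the grid minimizer lands next to $\ln(1+\delta)$.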

3.1 Giant Connected Components in Erdős-Rényi Graphs

In this section we prove the following theorem.

Theorem 3.2.

Let $\epsilon > 0$ be a small enough constant, and let $G \sim G(n,p)$ be an Erdős-Rényi random graph.

  1. Let $p = \frac{1-\epsilon}{n}$. Then whp all connected components of $G$ have at most $\frac{7}{\epsilon^2}\ln n$ vertices.

  2. Let $p = \frac{1+\epsilon}{n}$. Then whp $G$ contains a path of length at least $\frac{\epsilon^2 n}{5}$.

We use the DFS algorithm to prove the theorem. First, let us recall the algorithm. Fix the natural order $1 < 2 < \cdots < n$ on the vertices of $G$; the algorithm prioritizes vertices according to this order. DFS maintains three sets of vertices: $X$, the set of vertices whose exploration is complete (explored vertices); $U$, the set of unvisited vertices; and $T = [n] \setminus (X \cup U)$, the set of active vertices, which are kept in a stack.

The algorithm starts with $X = T = \emptyset$ and $U = V$, and runs until $T \cup U = \emptyset$. In each round, if $T$ is non-empty, the algorithm queries $U$ for neighbors of the last vertex $v$ added to $T$, scanning $U$ in the natural order. If $v$ has a neighbor $u \in U$, the algorithm deletes $u$ from $U$ and pushes it onto $T$. If $v$ has no neighbor in $U$, then $v$ is popped from $T$ and moved to $X$. If $T$ is empty, the algorithm takes the first vertex of $U$ in the natural order, deletes it from $U$, and pushes it onto $T$. To complete the exploration of the graph, once $T$ and $U$ have both become empty (at which point all connected components of $G$ have been revealed), we make the algorithm query all remaining pairs of vertices that were not queried before.

The following properties of DFS are immediate:

  • At each round of the algorithm, exactly one vertex moves, either from $U$ to $T$ or from $T$ to $X$;

  • At any time during the algorithm, it has already been revealed that $G$ has no edges between the current set $X$ and the current set $U$ of unvisited vertices;

  • The set $T$ always spans a path (indeed, when a vertex $u$ is added to $T$, this happens because $u$ is a neighbor of the last vertex $v$ in $T$; thus $u$ extends the path spanned by $T$, of which $v$ was the last vertex).

Let $N = \binom{n}{2}$. To prove the theorem we run DFS on a random input $G(n,p)$: we feed the DFS algorithm a sequence $Y_1, \ldots, Y_N$ of i.i.d. Bernoulli($p$) random variables, so that its $i$-th query is answered positively if $Y_i = 1$ and negatively otherwise. The graph obtained this way is distributed according to $G(n,p)$. Thus, studying the component structure of $G$ reduces to studying the properties of the random sequence $Y$. In particular, observe crucially that as long as $U \neq \emptyset$, every positive answer to a query results in a vertex being moved from $U$ to $T$; thus after $t$ queries, assuming $U \neq \emptyset$ still, we have $|X \cup T| \ge \sum_{i=1}^t Y_i$. (The inequality can be strict, since the first vertex of each connected component is moved from $U$ to $T$ "for free", i.e., without a positive answer to a query.) On the other hand, since the addition of every vertex to $T$, except the first one in a connected component, is caused by a positive answer to a query, at time $t$ we have $|T| \le 1 + \sum_{i=1}^t Y_i$.
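The DFS exploration and its coupling with the sequence $Y$ can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the helper `dfs_on_gnp` is hypothetical, each query draws a fresh Bernoulli($p$) bit, and the final phase that queries the leftover pairs is skipped because it does not affect the components. The trace records $(|X|, |T|, \sum_{i \le t} Y_i)$ after each query, so the two counting inequalities above can be checked directly.

```python
import random

def dfs_on_gnp(n, p, seed=0):
    """Hypothetical sketch of the DFS above on G(n, p): the i-th query is answered
    by a fresh Bernoulli(p) bit Y_i. Returns (Y, trace, components), where trace
    holds (|X|, |T|, Y_1 + ... + Y_t) after every query t. The final phase that
    queries the remaining pairs is omitted: it does not change the components."""
    rng = random.Random(seed)
    X, T, U = set(), [], list(range(n))   # explored, stack, unvisited (natural order)
    queried = set()
    Y, trace, components = [], [], []
    ones = 0
    while T or U:
        if not T:                          # empty stack: first vertex of U is free
            T.append(U.pop(0))
            components.append([T[-1]])
            continue
        v = T[-1]                          # last vertex added to T
        for u in U:                        # scan U in natural order
            pair = (min(u, v), max(u, v))
            if pair in queried:            # (v, u) was already answered negatively
                continue
            queried.add(pair)
            Y.append(1 if rng.random() < p else 0)
            ones += Y[-1]
            if Y[-1]:                      # positive answer: u moves from U to T
                U.remove(u)
                T.append(u)
                components[-1].append(u)
            trace.append((len(X), len(T), ones))
            if Y[-1]:
                break                      # continue the search from u
        else:
            # v has no neighbor in U: every pair (v, u) with u in U was answered 0,
            # so no edge between X and U is ever revealed.
            assert all((min(u, v), max(u, v)) in queried for u in U)
            X.add(T.pop())                 # v moves from T to X
    return Y, trace, components

Y, trace, components = dfs_on_gnp(60, 0.05, seed=1)
print(len(Y), [len(c) for c in components])
```

Since $X$ and $T$ are disjoint, $|X \cup T|$ is simply `len(X) + len(T)` in the trace.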

The following lemma gives us the tool that we need to prove the theorem.

Lemma 3.3.

Let $\epsilon > 0$ be a small enough constant, and consider a sequence $Y_1, \ldots, Y_N$ of i.i.d. Bernoulli($p$) random variables.

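  1. Let $p = \frac{1-\epsilon}{n}$ and $k = \frac{7}{\epsilon^2}\ln n$. Then, with probability at least $1 - 1/n$, there is no interval of length $kn$ in which at least $k$ of the $Y_i$ are equal to $1$.

  2. Let $p = \frac{1+\epsilon}{n}$ and $N_0 = \frac{\epsilon n^2}{2}$. Then

     $$\Pr\left[\left|\sum_{i=1}^{N_0} Y_i - \frac{\epsilon(1+\epsilon)n}{2}\right| < n^{2/3}\right] \ge 1 - o(1).$$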
Proof.

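Consider an interval $I$ of length $kn$ in $[N]$ and let $Y = \sum_{i \in I} Y_i$. Notice that $\mathbb{E}[Y] = knp = k(1-\epsilon)$. By the multiplicative Chernoff bound with $\delta = \frac{\epsilon}{1-\epsilon}$, so that $(1+\delta)\mathbb{E}[Y] = k$,

$$\Pr[Y \ge k] = \Pr\left[Y \ge \frac{\mathbb{E}[Y]}{1-\epsilon}\right] \le \exp\left(-\frac{\delta^2\,\mathbb{E}[Y]}{2+\delta}\right) = \exp\left(-\frac{\epsilon^2 k}{2-\epsilon}\right) = n^{-7/(2-\epsilon)} \le n^{-7/2},$$

where the last equality uses $k = \frac{7}{\epsilon^2}\ln n$. Since there are at most $N < n^2$ such intervals, a union bound shows that some interval of length $kn$ contains at least $k$ ones with probability at most $n^2 \cdot n^{-7/2} = n^{-3/2} \le 1/n$, which proves the claim.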
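Property 1 can be illustrated by simulation. The Python sketch below draws $Y_1, \ldots, Y_N$ for $n = 1000$ and $\epsilon = 0.5$ (illustrative values; $k$ is rounded up to an integer), computes the largest number of ones in any window of length $kn$ via prefix sums, and evaluates the per-interval Chernoff bound and the union bound from the proof. The helper `max_window_ones` is our own.

```python
import math
import random

def max_window_ones(Y, L):
    """Largest number of 1's among any L consecutive entries of Y (prefix sums)."""
    prefix = [0]
    for y in Y:
        prefix.append(prefix[-1] + y)
    return max(prefix[i + L] - prefix[i] for i in range(len(Y) - L + 1))

n, eps = 1000, 0.5                        # illustrative values, not from the notes
p = (1 - eps) / n
N = n * (n - 1) // 2
k = math.ceil(7 * math.log(n) / eps**2)   # rounded up to an integer
L = k * n                                  # interval length kn (must be <= N)

rng = random.Random(0)
Y = [1 if rng.random() < p else 0 for _ in range(N)]

delta = eps / (1 - eps)                    # (1 + delta) * E[Y] = k, as E[Y] = k(1 - eps)
per_interval = math.exp(-delta**2 * k * (1 - eps) / (2 + delta))
union_bound = N * per_interval             # at most N intervals of length kn
print(k, L, max_window_ones(Y, L), per_interval, union_bound)
```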

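To prove 2, let $Y = \sum_{i=1}^{N_0} Y_i$. Then

$$\mu = \mathbb{E}[Y] = N_0 p = \frac{\epsilon(1+\epsilon)n}{2}.$$

Now, again by the multiplicative Chernoff bound (both tails), with $\delta = \frac{n^{2/3}}{\mu} = \frac{2n^{-1/3}}{\epsilon(1+\epsilon)}$, which is at most $1$ for large $n$ (so that $2 + \delta \le 3$),

$$\Pr\left[\left|\sum_{i=1}^{N_0} Y_i - \frac{\epsilon(1+\epsilon)n}{2}\right| \ge n^{2/3}\right] \le 2\exp\left(-\frac{\delta^2\mu}{3}\right) = 2\exp\left(-\frac{2n^{1/3}}{3\epsilon(1+\epsilon)}\right) \le 2\exp(-n^{1/3}) = o(1),$$

where the last inequality holds for $\epsilon$ small enough.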
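Similarly, a quick Python simulation of Property 2, with the illustrative choice $n = 1000$, $\epsilon = 0.2$ (not from the notes), computes $\mu$, $\delta = n^{2/3}/\mu$, the two-sided bound $2e^{-\delta^2\mu/3}$, and one sample of $\sum_{i \le N_0} Y_i$.

```python
import math
import random

n, eps = 1000, 0.2                         # illustrative values, not from the notes
p = (1 + eps) / n
N0 = round(eps * n**2 / 2)                 # N_0 = eps n^2 / 2
mu = N0 * p                                # = eps (1 + eps) n / 2
dev = n ** (2 / 3)
delta = dev / mu                           # = 2 n^{-1/3} / (eps (1 + eps))
tail_bound = 2 * math.exp(-delta**2 * mu / 3)   # two-sided bound, valid for delta <= 1

rng = random.Random(0)
S = sum(1 for _ in range(N0) if rng.random() < p)   # S = Y_1 + ... + Y_{N_0}
print(mu, S, dev, tail_bound)
```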

We are now ready to prove the theorem.

Part 1.

Assume to the contrary that $G$ contains a connected component $C$ with more than $k = \frac{7}{\epsilon^2}\ln n$ vertices. Consider the epoch of the DFS in which $C$ was discovered (an epoch is the period between two consecutive times at which the stack $T$ is empty). Consider the moment inside this epoch when the algorithm has found the $(k+1)$-st vertex of $C$ and is about to move it to $T$. Let $X_C = X \cap C$ at that moment. Then $|X_C \cup T| = k$, and thus the algorithm got exactly $k$ positive answers to its queries during the epoch, each positive answer revealing a new vertex of $C$ after the first vertex of $C$ was put into $T$ at the beginning of the epoch. During the epoch only pairs of vertices touching $X_C \cup T$ have been queried, and the number of such pairs is at most $\binom{k}{2} + k(n-k) \le kn$. It follows that the sequence $Y$ contains an interval of length at most $kn$ with at least $k$ ones, which by Property 1 of Lemma 3.3 happens with probability at most $1/n$.
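Part 1 can also be checked empirically. The Python sketch below samples $G(n,p)$ with $p = (1-\epsilon)/n$ and compares the largest component with $\frac{7}{\epsilon^2}\ln n$. For simplicity it swaps in union-find instead of the DFS, since only component sizes matter here; the function name `component_sizes` and the parameters $n = 1000$, $\epsilon = 0.5$ are illustrative assumptions.

```python
import math
import random

def component_sizes(n, p, seed=0):
    """Sample G(n, p) and return its component sizes, largest first (union-find)."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:            # the pair {u, v} is an edge
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
    counts = {}
    for x in range(n):
        r = find(x)
        counts[r] = counts.get(r, 0) + 1
    return sorted(counts.values(), reverse=True)

n, eps = 1000, 0.5                          # illustrative values, not from the notes
sizes = component_sizes(n, (1 - eps) / n)
bound = 7 * math.log(n) / eps**2
print(sizes[:5], bound)
```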

Part 2.

Now assume that the sequence $Y$ satisfies Property 2 of Lemma 3.3. We claim that after the first $N_0 = \frac{\epsilon n^2}{2}$ queries of the DFS algorithm, the set $T$ contains at least $\frac{\epsilon^2 n}{5}$ vertices (so the contents of $T$ form a path of the desired length at that moment).

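First observe that $|X| < n/3$ at time $N_0$. Indeed, if $|X| \ge n/3$, let us look at the moment $t \le N_0$ at which $|X| = n/3$. At that moment, since $t \le N_0$, Property 2 of Lemma 3.3 gives $|T| \le 1 + \sum_{i=1}^t Y_i \le 1 + \sum_{i=1}^{N_0} Y_i < 1 + \frac{\epsilon(1+\epsilon)n}{2} + n^{2/3} < \frac{n}{3}$. Then $|U| = n - |X| - |T| \ge n/3$, and the algorithm has already examined all $|X||U| \ge n^2/9 > N_0$ pairs between $X$ and $U$ (and found them to be non-edges), contradicting $t \le N_0$.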

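Returning to time $N_0$, we now know that $|X| < n/3$. Assume for contradiction that $|T| < \frac{\epsilon^2 n}{5}$. Then $U \neq \emptyset$, so the algorithm is still revealing connected components of $G$, and each positive answer it got resulted in moving a vertex from $U$ to $T$ (some of these vertices may have moved further from $T$ to $X$). By Property 2 of Lemma 3.3, the number of positive answers at that point is at least $\frac{\epsilon(1+\epsilon)n}{2} - n^{2/3}$. Hence $|X \cup T| \ge \frac{\epsilon(1+\epsilon)n}{2} - n^{2/3}$, and since $|T| < \frac{\epsilon^2 n}{5}$,

$$|X| \ge \frac{\epsilon n}{2} + \frac{3\epsilon^2 n}{10} - n^{2/3}.$$

Moreover, all $|X||U|$ pairs between $X$ and $U$ have already been queried (and received negative answers), and $|U| = n - |X| - |T| > n - |X| - \frac{\epsilon^2 n}{5}$. Since $x \mapsto x\left(n - x - \frac{\epsilon^2 n}{5}\right)$ is increasing for $x < n/3$, it follows that

$$\frac{\epsilon n^2}{2} = N_0 \ge |X||U| \ge |X|\left(n - |X| - \frac{\epsilon^2 n}{5}\right)$$
$$\ge \left(\frac{\epsilon n}{2} + \frac{3\epsilon^2 n}{10} - n^{2/3}\right)\left(n - \frac{\epsilon n}{2} - \frac{\epsilon^2 n}{2} + n^{2/3}\right)$$
$$\ge \frac{\epsilon n^2}{2} + \frac{\epsilon^2 n^2}{20} - O(\epsilon^3)n^2 - O(n^{5/3}) > \frac{\epsilon n^2}{2}$$

for $\epsilon$ small enough and $n$ large enough, a contradiction. Hence $|T| \ge \frac{\epsilon^2 n}{5}$ at time $N_0$, and the contents of $T$ form a path of the desired length.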
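The last chain of inequalities hides the $n^{2/3}$ terms in $O(n^{5/3})$, so it only yields a contradiction for large $n$. A small Python sketch with exact rational arithmetic can check the arithmetic of Part 2. It assumes $n = m^3$, so that $n^{2/3} = m^2$ is an integer, and the helper `part2_margin` is hypothetical: it reports whether the two inequalities used in the first observation hold, and returns the difference between the lower bound on $|X||U|$ and $N_0$.

```python
from fractions import Fraction

def part2_margin(eps, m):
    """Hypothetical helper: exact Part 2 arithmetic for n = m**3 (so n^{2/3} = m**2).
    Returns (first_obs_ok, margin), where margin = (lower bound on |X||U|) - N_0;
    a positive margin is the contradiction used in the proof."""
    n, n23 = m**3, m**2
    N0 = eps * n**2 / 2
    # First observation: 1 + eps(1+eps)n/2 + n^{2/3} < n/3 and n^2/9 > N_0.
    first_obs_ok = (1 + eps * (1 + eps) * n / 2 + n23 < Fraction(n, 3)) and (
        Fraction(n**2, 9) > N0
    )
    # Lower bound on |X| when |T| < eps^2 n / 5; both forms in the notes agree.
    x_low = eps * (1 + eps) * n / 2 - n23 - eps**2 * n / 5
    assert x_low == eps * n / 2 + 3 * eps**2 * n / 10 - n23
    lhs = x_low * (n - x_low - eps**2 * n / 5)   # lower bound on |X||U|
    return first_obs_ok, lhs - N0

ok_big, margin_big = part2_margin(Fraction(1, 10), 10**6)       # n = 10^18
ok_small, margin_small = part2_margin(Fraction(1, 10), 100)      # n = 10^6
print(ok_big, float(margin_big), ok_small, float(margin_small))
```

With $\epsilon = 1/10$ the margin is positive for $n = 10^{18}$ but negative for $n = 10^6$: at the smaller size the $-O(n^{5/3})$ term still dominates the gain of order $(\epsilon^2/20 - O(\epsilon^3))n^2$.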