CSE 525: Randomized Algorithms Spring 2025 Lecture 3: Strong Concentration Bounds Lecturer: Shayan Oveis Gharan 04/02/2026
Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.
We have seen how knowledge of the variance of a random variable $X$ can be used to control the deviation of $X$ from its mean. This is the heart of the second moment method. But often we can control even higher moments, and this allows us to obtain much stronger concentration properties. A prototypical example is when $X_1, \dots, X_n$ is a family of independent (but not necessarily identically distributed) $\{0,1\}$-valued random variables and $X = X_1 + \dots + X_n$. Let $p_i = \mathbb{E}[X_i]$ and define $\mu = \mathbb{E}[X] = p_1 + \dots + p_n$. In that case, we have the following multiplicative form of the "Chernoff bound".
Theorem 3.1 (Multiplicative Chernoff bound).
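Let $X_1, \dots, X_n$ be independent $\{0,1\}$-valued random variables, let $X = \sum_{i=1}^{n} X_i$, and let $\mu = \mathbb{E}[X]$. For every $\delta > 0$, it holds that
\[ \mathbb{P}[X \ge (1+\delta)\mu] \le \left( \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right)^{\mu}, \]
and, for every $0 < \delta < 1$,
\[ \mathbb{P}[X \le (1-\delta)\mu] \le \left( \frac{e^{-\delta}}{(1-\delta)^{1-\delta}} \right)^{\mu}. \]
Consequently, for every $0 < \delta \le 1$,
\[ \mathbb{P}[X \ge (1+\delta)\mu] \le e^{-\delta^2 \mu / 3} \qquad \text{and} \qquad \mathbb{P}[X \le (1-\delta)\mu] \le e^{-\delta^2 \mu / 2}. \]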
Proof.
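Let $t > 0$ be a parameter that we choose later. Then
\[ \mathbb{P}[X \ge (1+\delta)\mu] = \mathbb{P}\big[ e^{tX} \ge e^{t(1+\delta)\mu} \big] \le \frac{\mathbb{E}[e^{tX}]}{e^{t(1+\delta)\mu}}. \tag{3.1} \]
The first step uses that the exponential function is monotone increasing (recall $t > 0$), and the second step is Markov's inequality.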
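Now, by independence of the $X_i$, we can write
\[ \mathbb{E}[e^{tX}] = \mathbb{E}\Big[ \prod_{i=1}^{n} e^{tX_i} \Big] = \prod_{i=1}^{n} \mathbb{E}[e^{tX_i}]. \]
Now, observe that for a $\{0,1\}$-valued $X_i$ with $p_i = \mathbb{E}[X_i]$, using $1 + x \le e^{x}$,
\[ \mathbb{E}[e^{tX_i}] = 1 + p_i (e^{t} - 1) \le \exp\big( p_i (e^{t} - 1) \big). \]
Plugging this back into (3.1) we obtain
\[ \mathbb{P}[X \ge (1+\delta)\mu] \le \frac{\exp\big( \mu (e^{t} - 1) \big)}{e^{t(1+\delta)\mu}} = \left( \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right)^{\mu}, \]
where the last equality holds for the choice $t = \ln(1+\delta)$.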
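As a sanity check on the upper-tail bound, the following minimal sketch compares the exact tail of a binomial random variable with the expression $\left( e^{\delta} / (1+\delta)^{1+\delta} \right)^{\mu}$. It assumes the standard form of the multiplicative Chernoff bound; the function names and the parameters $n = 200$, $p = 0.1$ are illustrative choices, not part of the lecture.

```python
import math

def chernoff_upper(mu, delta):
    """(e^delta / (1+delta)^(1+delta))^mu, evaluated in log space for stability."""
    return math.exp(mu * (delta - (1 + delta) * math.log1p(delta)))

def binomial_upper_tail(n, p, threshold):
    """Exact P[Bin(n, p) >= threshold]."""
    start = max(0, math.ceil(threshold))
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(start, n + 1))

# Illustrative parameters (assumed): X ~ Bin(200, 0.1), so mu = 20.
n, p = 200, 0.1
mu = n * p
for delta in (0.25, 0.5, 1.0, 2.0):
    exact = binomial_upper_tail(n, p, (1 + delta) * mu)
    bound = chernoff_upper(mu, delta)
    print(f"delta={delta}: exact tail={exact:.3e}, Chernoff bound={bound:.3e}")
```

The bound is typically far from tight for small $\delta$, but it decays at the correct exponential rate in $\mu$.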
3.1 Giant Connected Components in Erdős-Rényi Graphs
In this section we prove the following theorem.
Theorem 3.2.
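Let $\varepsilon > 0$ be a small enough constant. Let $G \sim G(n, p)$ be an Erdős-Rényi random graph with parameter $p$.
1. Let $p = \frac{1-\varepsilon}{n}$. Then whp all connected components of $G$ are of size at most $\frac{7}{\varepsilon^2} \ln n$.
2. Let $p = \frac{1+\varepsilon}{n}$. Then whp $G$ contains a path of length at least $\frac{\varepsilon^2 n}{5}$.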
We run the DFS algorithm to prove the theorem. First, let us recall this algorithm. Fix a natural order on the vertices of $G$; we assume that the algorithm prioritizes vertices according to this natural order. DFS maintains three sets of vertices: $S$, the set of vertices whose exploration is complete (i.e., explored); $T$, the set of unvisited vertices; and $U$, the set of active vertices, kept in a stack.
The algorithm starts with $S = U = \emptyset$ and $T = V$, and runs till $U \cup T = \emptyset$. At each round of the algorithm, if the set $U$ is non-empty, the algorithm queries $T$ for neighbors of the last vertex $v$ that has been added to $U$, scanning $T$ according to the natural order. If $v$ has a neighbor $u \in T$, the algorithm deletes $u$ from $T$ and inserts it into $U$. If $v$ does not have a neighbor in $T$, then $v$ is popped out of $U$ and is moved to $S$. If $U$ is empty, the algorithm chooses the first vertex of $T$ according to the natural order, deletes it from $T$ and pushes it into $U$. In order to complete the exploration of the graph, whenever the sets $U$ and $T$ have both become empty (at this stage all connected components of $G$ have been revealed), we make the algorithm query all remaining pairs of vertices in $S = V$ not queried before.
The following properties of DFS are immediate:
- At each round of the algorithm one vertex moves, either from $T$ to $U$, or from $U$ to $S$;
- At any time during the algorithm, it has been revealed already that the graph $G$ has no edges between the current set $S$ and the current set of unvisited vertices $T$;
- The set $U$ always spans a path (indeed, when a vertex $u$ is added to $U$, it happens because $u$ is a neighbor of the last vertex $v$ in $U$; thus, $u$ augments the path spanned by $U$, of which $v$ is the last vertex).
Let $N = \binom{n}{2}$. To prove the theorem we run DFS on a random input $G \sim G(n, p)$. Thus we feed the DFS algorithm with a sequence $\bar{X} = (X_i)_{i=1}^{N}$ of i.i.d. Bernoulli($p$) random variables, so that it gets its $i$-th query answered positively if $X_i = 1$ and answered negatively otherwise; the graph so obtained is clearly distributed according to $G(n, p)$. Thus, studying the component structure of $G$ can be reduced to studying the properties of the random sequence $\bar{X}$. In particular, observe crucially that as long as $T \neq \emptyset$, every positive answer to a query results in a vertex being moved from $T$ to $U$, and thus after $t$ queries and assuming $T \neq \emptyset$ still, we have $|S \cup U| \ge \sum_{i=1}^{t} X_i$. (The last inequality is strict in fact, as the first vertex of each connected component is moved from $T$ to $U$ "for free", i.e., without need to get a positive answer to a query.) On the other hand, since the addition of every vertex, but the first one in a connected component, to $U$ is caused by a positive answer to a query, we have at time $t$: $|U| \le 1 + \sum_{i=1}^{t} X_i$.
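The exploration process described above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: it reveals each edge lazily by drawing a fresh Bernoulli($p$) bit at query time, it omits the final step that queries the remaining pairs once $U$ and $T$ are both empty, and the function name, the `scan_from` pointer array and the `seed` parameter are assumptions introduced for the example. The two `assert` statements check the inequalities $|S \cup U| \ge \sum_i X_i$ and $|U| \le 1 + \sum_i X_i$ after every round.

```python
import random

def dfs_with_bernoulli_queries(n, p, seed=0):
    """Run the S/T/U depth-first search on G(n, p), revealing edges lazily.

    Returns (answers, max_stack): the 0/1 answers X_1, X_2, ... in query
    order, and the largest size the stack U ever reached.
    """
    rng = random.Random(seed)
    in_T = [True] * n      # T: unvisited vertices
    size_T = n
    S = []                 # S: vertices whose exploration is complete
    U = []                 # U: stack of active vertices
    scan_from = [0] * n    # scan_from[v]: next vertex v inspects, in natural order
    first_unvisited = 0    # smallest index that may still be in T
    answers = []
    positives = 0
    max_stack = 0

    while U or size_T > 0:
        if not U:
            # Start a new component: the first vertex of T moves to U "for free".
            while not in_T[first_unvisited]:
                first_unvisited += 1
            new_vertex = first_unvisited
        else:
            v = U[-1]
            new_vertex = None
            # Query pairs (v, w) with w in T until one answer is positive.
            while scan_from[v] < n and new_vertex is None:
                w = scan_from[v]
                scan_from[v] += 1
                if in_T[w]:
                    x = 1 if rng.random() < p else 0
                    answers.append(x)
                    positives += x
                    if x:
                        new_vertex = w
        if new_vertex is None:
            S.append(U.pop())          # v has no neighbor left in T
        else:
            in_T[new_vertex] = False   # move new_vertex from T to U
            size_T -= 1
            U.append(new_vertex)
        # Inequalities relating the set sizes to the number of positive answers.
        assert len(S) + len(U) >= positives
        assert len(U) <= 1 + positives
        max_stack = max(max_stack, len(U))

    return answers, max_stack

# Illustrative run (parameters assumed): supercritical regime with eps = 0.5.
n, eps = 500, 0.5
answers, max_stack = dfs_with_bernoulli_queries(n, (1 + eps) / n, seed=1)
print(len(answers), sum(answers), max_stack)
```

Because $T$ only shrinks, the pointer `scan_from[v]` guarantees that no pair is queried twice, and since $U$ always spans a path, `max_stack - 1` is a lower bound on the length of the longest path in the sampled graph.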
The following lemma gives us the tool that we need to prove the theorem.
Lemma 3.3.
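Let $\varepsilon > 0$ be a small enough constant. Consider the sequence $\bar{X} = (X_i)_{i=1}^{N}$ of i.i.d. Bernoulli random variables with parameter $p$, where $N = \binom{n}{2}$.
1. Let $p = \frac{1-\varepsilon}{n}$ and $k = \frac{7}{\varepsilon^2} \ln n$. Then, with probability $1 - o(1)$, there is no interval of length $kn$ in $[N]$ where at least $k$ of the Bernoullis are 1.
2. Let $p = \frac{1+\varepsilon}{n}$ and $N_0 = \frac{\varepsilon n^2}{2}$. Then,
\[ \mathbb{P}\left[ \left| \sum_{i=1}^{N_0} X_i - \frac{\varepsilon (1+\varepsilon) n}{2} \right| \le n^{2/3} \right] = 1 - o(1). \]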
Proof.
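Consider an interval $I$ of length $kn$ in $[N]$. Let $X = \sum_{i \in I} X_i$. Notice $\mathbb{E}[X] = knp = (1-\varepsilon)k$. By the multiplicative Chernoff bound with $\mu = (1-\varepsilon)k$ and $\delta = \frac{\varepsilon}{1-\varepsilon}$, so that $(1+\delta)\mu = k$ (and $\delta \le 1$ for $\varepsilon \le 1/2$),
\[ \mathbb{P}[X \ge k] \le \exp\left( -\frac{\delta^2 \mu}{3} \right) = \exp\left( -\frac{\varepsilon^2 k}{3(1-\varepsilon)} \right) \le \exp\left( -\frac{\varepsilon^2 k}{3} \right) = n^{-7/3}, \]
where the last step follows by $k = \frac{7}{\varepsilon^2} \ln n$. There are at most $N \le n^2$ such intervals, so by a union bound the probability that some interval contains at least $k$ ones is at most $n^2 \cdot n^{-7/3} = n^{-1/3}$, and the claim follows.
To prove 2, let $X = \sum_{i=1}^{N_0} X_i$. Then,
\[ \mathbb{E}[X] = N_0 \, p = \frac{\varepsilon (1+\varepsilon) n}{2}. \]
Now, again by the multiplicative Chernoff bound, for $\delta = n^{2/3} / \mathbb{E}[X]$ (which is at most 1 for $n$ large enough),
\[ \mathbb{P}\big[ |X - \mathbb{E}[X]| \ge n^{2/3} \big] \le 2 \exp\left( -\frac{\delta^2 \, \mathbb{E}[X]}{3} \right) = 2 \exp\left( -\frac{2 n^{1/3}}{3 \varepsilon (1+\varepsilon)} \right) = o(1). \]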
∎
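The two failure probabilities in the lemma can be evaluated numerically. The sketch below assumes the parameter choices $k = \frac{7}{\varepsilon^2} \ln n$ and $N_0 = \frac{\varepsilon n^2}{2}$ with deviation $n^{2/3}$, treats $k$ as a real number rather than rounding it, and uses hypothetical function names; it illustrates the arithmetic of the union bound and is not part of the proof.

```python
import math

def log_chernoff_upper(mu, delta):
    """Logarithm of (e^delta / (1+delta)^(1+delta))^mu."""
    return mu * (delta - (1 + delta) * math.log1p(delta))

def part1_failure_bound(n, eps):
    """Union bound over at most C(n, 2) intervals of length k*n (subcritical case)."""
    k = 7 / eps**2 * math.log(n)          # assumed value of k
    mu = (1 - eps) * k                    # expected number of ones in one interval
    delta = eps / (1 - eps)               # chosen so that (1 + delta) * mu = k
    log_intervals = math.log(n * (n - 1) / 2)
    return math.exp(log_intervals + log_chernoff_upper(mu, delta))

def part2_failure_bound(n, eps):
    """Two-sided bound 2 * exp(-delta^2 * mu / 3) for a deviation of n^(2/3)."""
    mu = eps * (1 + eps) * n / 2          # mean of the first N_0 answers
    delta = n ** (2 / 3) / mu
    if delta > 1:
        raise ValueError("n too small: the simplified bound needs delta <= 1")
    return 2 * math.exp(-delta**2 * mu / 3)

for n in (10**3, 10**6):
    print(n, part1_failure_bound(n, 0.1), n ** (-1 / 3))
print(part2_failure_bound(10**6, 0.1))
```

The `ValueError` branch makes explicit that the second bound is asymptotic: for fixed $\varepsilon$ it only applies once $n^{2/3}$ is at most the mean.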
We are now ready to prove the theorem.
Part 1.
Assume to the contrary that $G$ contains a connected component $C$ with more than $k = \frac{7}{\varepsilon^2} \ln n$ vertices. Let us look at the epoch of the DFS when $C$ was created (an epoch is the period between two consecutive moments at which the stack $U$ is empty). Consider the moment inside this epoch when the algorithm has found the $(k+1)$-st vertex of $C$ and is about to move it to $U$. Denote $\Delta S = S \cap C$ at that moment. Then $|\Delta S \cup U| = k$, and thus the algorithm got exactly $k$ positive answers to its queries to random variables $X_i$ during the epoch, with each positive answer being responsible for revealing a new vertex of $C$, after the first vertex of $C$ was put into $U$ in the beginning of the epoch. During the epoch only pairs of vertices touching $\Delta S \cup U$ have been queried, and the number of such pairs is therefore at most $\binom{k}{2} + k(n-k) < kn$. It thus follows that the sequence $\bar{X}$ contains an interval of length at most $kn$ with at least $k$ 1's, which contradicts Property 1 of Lemma 3.3.
Part 2.
Now, assume that the sequence $\bar{X}$ satisfies Property 2 of Lemma 3.3. We claim that after the first $N_0 = \frac{\varepsilon n^2}{2}$ queries of the DFS algorithm, the set $U$ contains at least $\frac{\varepsilon^2 n}{5}$ vertices (with the contents of $U$ forming a path of the desired length at that moment).
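First observe that $|S| < n/3$ at time $N_0$. Indeed, if $|S| \ge n/3$, then let us look at a moment $t \le N_0$ where $|S| = n/3$ (such a moment exists because vertices enter $S$ one at a time). At that moment $|U| \le 1 + \sum_{i=1}^{t} X_i < n/3$ by Property 2 of the Lemma. Then $|T| = n - |S| - |U| \ge n/3$, and the algorithm has examined all $|S| \cdot |T| \ge n^2/9 > N_0$ pairs between $S$ and $T$ (and found them to be non-edges), which is a contradiction.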
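Getting back to time $t = N_0$: assume $|S| < n/3$ and $|U| < \frac{\varepsilon^2 n}{5}$; then we have $T \neq \emptyset$. This means in particular that the algorithm is still revealing connected components of $G$, and each positive answer it got resulted in moving a vertex from $T$ to $U$ (some of these vertices may have already moved further from $U$ to $S$). By Property 2 of Lemma 3.3 the number of positive answers at that point is at least $\frac{\varepsilon (1+\varepsilon) n}{2} - n^{2/3}$. Hence, we have $|S \cup U| \ge \frac{\varepsilon (1+\varepsilon) n}{2} - n^{2/3}$. Since $|U| < \frac{\varepsilon^2 n}{5}$, it follows that $|S| \ge \frac{\varepsilon n}{2} + \frac{3 \varepsilon^2 n}{10} - n^{2/3}$. Moreover, all pairs of vertices between $S$ and $T$ are queried already (and received a negative answer), i.e., $|S| \cdot |T|$ many pairs. It follows that
\[ \frac{\varepsilon n^2}{2} = N_0 \ge |S| \, (n - |S| - |U|) \ge \left( \frac{\varepsilon n}{2} + \frac{3 \varepsilon^2 n}{10} - n^{2/3} \right) \left( n - \frac{\varepsilon n}{2} - \frac{\varepsilon^2 n}{2} + n^{2/3} \right) \ge \frac{\varepsilon n^2}{2} + \frac{\varepsilon^2 n^2}{20} - O(\varepsilon^3) n^2 - O(n^{5/3}) > \frac{\varepsilon n^2}{2} \]
for $\varepsilon$ small enough and $n$ large enough, where the second inequality uses $|S| < n/3$ (so that $s \mapsto s(n - s - |U|)$ is increasing on the relevant range). This is a contradiction, so $|U| \ge \frac{\varepsilon^2 n}{5}$ at time $N_0$, as desired.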
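The final contradiction is asymptotic, and a short computation shows how large $n$ must be before it takes effect. The sketch below assumes the quantities $N_0 = \frac{\varepsilon n^2}{2}$, the lower bound $|S| \ge \frac{\varepsilon n}{2} + \frac{3\varepsilon^2 n}{10} - n^{2/3}$ and the upper bound $|U| \le \frac{\varepsilon^2 n}{5}$; the function name is hypothetical. It returns the number of queried pairs forced by these bounds minus $N_0$, so a positive value means the contradiction holds.

```python
def contradiction_gap(n, eps):
    """Lower bound on |S| * |T| minus N_0, under the assumed bounds on |S| and |U|."""
    n0 = eps * n**2 / 2                                       # number of queries made
    s_min = eps * n / 2 + 3 * eps**2 * n / 10 - n ** (2 / 3)  # assumed lower bound on |S|
    u_max = eps**2 * n / 5                                    # assumed upper bound on |U|
    return s_min * (n - s_min - u_max) - n0

# Illustrative values (assumed): eps = 0.05 and two graph sizes.
for n in (10**9, 10**15):
    print(n, contradiction_gap(n, 0.05) > 0)
```

For $\varepsilon = 0.05$ the gap is still negative at $n = 10^9$ because the $n^{2/3}$ error term dominates the $\frac{\varepsilon^2 n^2}{20}$ gain, which is why the statement is phrased with "whp" and a small enough constant $\varepsilon$.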