CSE 521: Design and Analysis of Algorithms I Spring 2025 Lecture 1: The Probabilistic Method Lecturer: Shayan Oveis Gharan 04-01-2025 Scribe:
Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.
1.1 Introduction to the Probabilistic Method
An old math puzzle goes: Suppose there are six people in a room; some of them shake hands. Prove that there are at least three people who all shook each others’ hands or three people such that no pair of them shook hands. Generalized a bit, this is the classic Ramsey problem. The diagonal Ramsey numbers are defined as follows: $R(k,k)$ is the smallest integer $n$ such that in every two-coloring of the edges of the complete graph $K_n$ by red and blue, there is a monochromatic copy of $K_k$, i.e. there are $k$ nodes such that all of the edges between them are red or all of the edges are blue. A solution to the puzzle above asserts that $R(3,3) \le 6$ (and it is easy to check that, in fact, $R(3,3) = 6$).
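Both claims are small enough to verify by brute force. The following sketch (an illustration, not part of the original notes) checks every one of the $2^{15}$ two-colorings of $K_6$ for a monochromatic triangle, and exhibits the standard pentagon/pentagram coloring of $K_5$ that avoids one:

```python
from itertools import combinations, product

def has_mono_triangle(n, coloring):
    """coloring maps each edge (i, j) with i < j to 0 (red) or 1 (blue)."""
    return any(
        coloring[(a, b)] == coloring[(b, c)] == coloring[(a, c)]
        for a, b, c in combinations(range(n), 3)
    )

# Every 2-coloring of the 15 edges of K_6 contains a monochromatic triangle.
edges6 = list(combinations(range(6), 2))
assert all(
    has_mono_triangle(6, dict(zip(edges6, colors)))
    for colors in product([0, 1], repeat=len(edges6))
)

# K_5 admits a coloring with no monochromatic triangle: color the 5-cycle
# edges red (0) and the diagonals blue (1); both color classes are 5-cycles.
edges5 = list(combinations(range(5), 2))
pentagon = {(i, j): 0 if (j - i) % 5 in (1, 4) else 1 for i, j in edges5}
assert not has_mono_triangle(5, pentagon)
```

Together the two checks confirm $R(3,3) = 6$.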
In 1929, Ramsey proved that $R(k,k)$ is finite for every $k$. We want to show that $R(k,k)$ must grow pretty fast; in fact, we’ll prove that for $k \ge 3$, we have $R(k,k) > \lfloor 2^{k/2} \rfloor$. This requires finding a coloring of $K_{\lfloor 2^{k/2} \rfloor}$ that doesn’t contain any monochromatic $K_k$. To do this, we’ll use the probabilistic method: we’ll give a random coloring of $K_{\lfloor 2^{k/2} \rfloor}$ and show that it satisfies our desired property with positive probability. This proof appeared in a paper of Erdős from 1947, and it is the example that starts Alon and Spencer’s famous book devoted to the probabilistic method, which will be one of the main resources for this course.
Lemma 1.1.
If $\binom{n}{k} 2^{1-\binom{k}{2}} < 1$, then $R(k,k) > n$. In particular, $R(k,k) > \lfloor 2^{k/2} \rfloor$ for $k \ge 3$.
Proof.
Consider a uniformly random 2-coloring of the edges of $K_n$: every edge is colored red or blue independently, with probability half each. For any fixed set $S$ of $k$ vertices, let $A_S$ denote the event that the induced subgraph on $S$ is monochromatic. An easy calculation yields
$$\Pr[A_S] = 2 \cdot 2^{-\binom{k}{2}} = 2^{1-\binom{k}{2}},$$
since all $\binom{k}{2}$ edges inside $S$ must receive the same color, and there are two choices for that color.
Since there are $\binom{n}{k}$ possible choices for $S$, we can use the union bound:
$$\Pr\Big[\bigcup_{S} A_S\Big] \le \sum_{S} \Pr[A_S] = \binom{n}{k} 2^{1-\binom{k}{2}}.$$
Thus if $\binom{n}{k} 2^{1-\binom{k}{2}} < 1$, then with positive probability, no event $A_S$ occurs. Thus there must exist at least one coloring with no monochromatic $K_k$. One can check that if $k \ge 3$ and $n = \lfloor 2^{k/2} \rfloor$, then this is satisfied:
$$\binom{n}{k} 2^{1-\binom{k}{2}} \le \frac{n^k}{k!} \cdot 2^{1-\binom{k}{2}} \le \frac{2^{k^2/2}}{k!} \cdot 2^{1+k/2-k^2/2} = \frac{2^{1+k/2}}{k!} < 1.$$ ∎
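The lemma’s condition can also be checked numerically. Below is a small sketch (not part of the notes) that rewrites $\binom{n}{k}2^{1-\binom{k}{2}} < 1$ as the exact integer inequality $2\binom{n}{k} < 2^{\binom{k}{2}}$, avoiding floating-point issues, and verifies it for $n = \lfloor 2^{k/2} \rfloor$ over a range of $k$:

```python
from math import comb, floor

# Lemma 1.1's condition binom(n,k) * 2^(1 - binom(k,2)) < 1, rewritten in
# exact integer arithmetic as 2 * binom(n,k) < 2^binom(k,2).
def condition_holds(n, k):
    return 2 * comb(n, k) < 2 ** comb(k, 2)

# For n = floor(2^(k/2)) the condition holds for every k >= 3, so a random
# coloring of K_n avoids a monochromatic K_k with positive probability.
for k in range(3, 60):
    assert condition_holds(floor(2 ** (k / 2)), k)
```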
In the proof, we employed the following fundamental tool:
Fact 1.2 (Union Bound).
If $A_1, \dots, A_n$ are arbitrary events, then $\Pr[A_1 \cup \dots \cup A_n] \le \Pr[A_1] + \dots + \Pr[A_n]$.
1.2 Linearity of Expectations
Fact 1.3 (Linearity of Expectation).
If $X_1, \dots, X_n$ are real-valued random variables, then
$$\mathbb{E}[X_1 + \dots + X_n] = \mathbb{E}[X_1] + \dots + \mathbb{E}[X_n].$$
The great fact about this identity is that we don’t need to know anything about the relationships between the random variables; linearity of expectation holds no matter what the dependence structure is.
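A classic illustration of this point (not from the lecture): the number of fixed points of a uniform random permutation of $\{0,\dots,n-1\}$ is a sum of $n$ heavily dependent indicator variables, each with expectation $1/n$, so its expectation is exactly 1 for every $n$. The sketch below confirms this by exact enumeration for small $n$:

```python
from itertools import permutations
from math import factorial
from fractions import Fraction

# The indicators "position i is a fixed point" are far from independent,
# yet linearity gives E[# fixed points] = n * (1/n) = 1 for every n.
for n in range(1, 8):
    total_fixed_points = sum(
        sum(1 for i, v in enumerate(p) if i == v)
        for p in permutations(range(n))
    )
    assert Fraction(total_fixed_points, factorial(n)) == 1
```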
Let’s consider a 3-CNF formula $\phi$ over the variables $x_1, \dots, x_n$. Such a formula has the form $\phi = C_1 \wedge C_2 \wedge \dots \wedge C_m$, where each clause $C_i$ is an OR of three literals involving distinct variables: $C_i = \ell_{i,1} \vee \ell_{i,2} \vee \ell_{i,3}$. A literal is a variable or its negation. For instance, $(x_1 \vee \bar{x}_2 \vee x_3) \wedge (\bar{x}_1 \vee x_2 \vee \bar{x}_4)$ is a 3-CNF formula.
Lemma 1.4.
If $\phi$ is a 3-CNF formula with $m$ clauses, then there exists an assignment that makes at least $\frac{7m}{8}$ clauses evaluate to true.
Proof.
We will prove this using the probabilistic method. For every variable independently, we choose a uniformly random truth assignment: true or false, each with probability $1/2$. Let $Y_i$ equal 1 if clause $C_i$ is satisfied by our random assignment, and equal 0 otherwise. Then $\mathbb{E}[Y_i] = 7/8$, because there are 7 ways to satisfy a clause out of the 8 possible truth values for its literals. Let $Y = Y_1 + \dots + Y_m$ denote the total number of satisfied clauses. By linearity of expectation,
$$\mathbb{E}[Y] = \sum_{i=1}^{m} \mathbb{E}[Y_i] = \frac{7m}{8}.$$
Since $Y$ must take a value at least its expectation with positive probability, there must be an assignment that satisfies at least $\frac{7m}{8}$ clauses. ∎
1.3 Method of Conditional Expectations
The above lemma asserts that there exists an assignment satisfying at least $\frac{7m}{8}$ clauses, but what if we wish to actually find one? One way is to randomly sample from the underlying distribution and then check the resulting assignment. Analyzing the probability of success requires tail bounds, which we will discuss in future lectures.
In this section, we will discuss a generic method that can turn many probabilistic method proofs into even deterministic algorithms. Let $E(a_1, \dots, a_n)$ denote the expected number of satisfied clauses given a partial truth assignment $x_1 = a_1, \dots, x_n = a_n$ to the input variables, where we choose the unassigned variables uniformly at random. We will use T to denote true, F to denote false, and * to denote that no assignment has been chosen for that variable. For instance, $E(*, *, \dots, *)$ denotes the expected number of satisfied clauses in a fully random assignment, and we have already seen that $E(*, *, \dots, *) = \frac{7m}{8}$.
Note that a simple linear-time algorithm can compute $E(a_1, \dots, a_n)$ for any partial assignment by simply going through the clauses one by one: a clause that is already satisfied contributes 1, and a clause that is not yet satisfied and has $j$ undecided literals contributes $1 - 2^{-j}$.
As an example, consider the clause $x_1 \vee \bar{x}_2 \vee x_3$. The probability that a random assignment satisfies it is $7/8$. If we assign $x_1 = \mathrm{F}$, then the probability becomes $3/4$, and if we set $x_1 = \mathrm{T}$, then the probability becomes 1. Observe that
$$E(*, *, \dots, *) = \tfrac{1}{2} E(\mathrm{T}, *, \dots, *) + \tfrac{1}{2} E(\mathrm{F}, *, \dots, *).$$
Consequently, it must hold that
$$\max\{E(\mathrm{T}, *, \dots, *),\ E(\mathrm{F}, *, \dots, *)\} \ge E(*, *, \dots, *).$$
As we have just argued, it’s possible to compute both of these quantities and figure out which is larger. We can then set $x_1$ to the corresponding value and keep assigning truth values recursively. Since the value of $E(\cdot)$ never goes down and it starts at $\frac{7m}{8}$, when the algorithm finishes we must satisfy at least $\frac{7m}{8}$ clauses. Note that the algorithm may indeed satisfy more than a $7/8$ fraction of the clauses.
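The greedy procedure above can be sketched in a few lines. This is an illustrative implementation, not the official one from the course; the literal encoding ($+i$ for $x_i$, $-i$ for $\bar{x}_i$) and the example formula are my own choices:

```python
from fractions import Fraction

def cond_expectation(clauses, assignment):
    """E[# satisfied clauses] when every unset variable is uniform at random.

    A clause is a list of literals: +i stands for x_i, -i for its negation.
    `assignment` maps a variable index to a bool for variables fixed so far.
    """
    total = Fraction(0)
    for clause in clauses:
        undecided, satisfied = 0, False
        for lit in clause:
            var, want = abs(lit), lit > 0
            if var in assignment:
                satisfied = satisfied or assignment[var] == want
            else:
                undecided += 1
        # A satisfied clause contributes 1; otherwise the clause fails only
        # when all undecided literals come out wrong: probability 2^-undecided.
        total += 1 if satisfied else 1 - Fraction(1, 2 ** undecided)
    return total

def max3sat_greedy(clauses, nvars):
    """Fix variables one by one, never letting E[# satisfied] decrease."""
    assignment = {}
    for var in range(1, nvars + 1):
        e_true = cond_expectation(clauses, {**assignment, var: True})
        e_false = cond_expectation(clauses, {**assignment, var: False})
        assignment[var] = e_true >= e_false
    return assignment

clauses = [[1, -2, 3], [-1, 2, -4], [2, 3, 4], [-1, -3, -4]]
a = max3sat_greedy(clauses, 4)
satisfied = sum(any(a[abs(l)] == (l > 0) for l in c) for c in clauses)
assert satisfied >= Fraction(7, 8) * len(clauses)
```

The `Fraction` arithmetic keeps the conditional expectations exact, so the invariant $E$ never drops is maintained without rounding worries.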
1.4 Choosing the Right Distribution
Here is a more complicated example in which the choice of distribution requires a preliminary lemma. Let $\Omega = \Omega_1 \cup \dots \cup \Omega_k$, where the $\Omega_i$’s are disjoint sets, each of size $n$. Let $h : \binom{\Omega}{k} \to \{-1, +1\}$ be a two-coloring of the $k$-sets. A $k$-set $E$ is crossing if it contains precisely one point from each $\Omega_i$. For $S \subseteq \Omega$ set

$$h(S) = \sum_{E \in \binom{S}{k}} h(E). \tag{1.1}$$
Theorem 1.5.
Suppose $h(E) = +1$ for all crossing $k$-sets $E$. Then there is an $S \subseteq \Omega$ for which
$$|h(S)| \ge c_k n^k.$$
Here $c_k$ is a positive constant, which is independent of $n$.
Perhaps the first attempt is to include each element of $\Omega$ in $S$, independently, with probability $1/2$. It turns out that for such a distribution $\mathbb{E}[h(S)]$ can even be negative: e.g., assume $h(E) = -1$ for every non-crossing $k$-set; since each $k$-set satisfies $\Pr[E \subseteq S] = 2^{-k}$ and fewer than half of all $k$-sets are crossing, the sum $\mathbb{E}[h(S)] = 2^{-k} \sum_E h(E)$ is negative. If you think about it deeply, you would wonder: why $1/2$? As we will see, choosing the elements of $S$ independently is right, but we need to be careful about the marginals; we want to choose the marginals based on the function $h$ given to us.
But how? Let $p_1, \dots, p_k \in [0,1]$ be the marginals of the elements of $\Omega$, to be determined; i.e., we sample elements of $\Omega$ independently, but elements from the same part $\Omega_i$ are chosen with the same marginal $p_i$. Given $p_1, \dots, p_k$, we define a random set $S$ where, for every element $a \in \Omega_i$, we add $a$ to $S$ with probability $p_i$, independently of every other element.
Define a random variable

$$X = h(S). \tag{1.2}$$

It turns out that we can write $\mathbb{E}[X]$ as a $k$-homogeneous polynomial in $p_1, \dots, p_k$:
$$\mathbb{E}[X] = \sum_{E \in \binom{\Omega}{k}} h(E) \Pr[E \subseteq S] = \sum_{E \in \binom{\Omega}{k}} h(E) \prod_{i=1}^{k} p_i^{|E \cap \Omega_i|} = \sum_{\substack{a_1, \dots, a_k \ge 0 \\ a_1 + \dots + a_k = k}} \Big( \sum_{E : |E \cap \Omega_i| = a_i \ \forall i} h(E) \Big) \, p_1^{a_1} \cdots p_k^{a_k}.$$
In the third equality, we classify all $k$-sets by their “type”, namely the sizes of their intersections with $\Omega_1, \dots, \Omega_k$.
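This expansion can be sanity-checked in the smallest interesting case. The sketch below (my own example, not from the notes) takes $k = n = 2$ with $h = +1$ on crossing pairs and $h = -1$ otherwise, and verifies by direct enumeration that $\mathbb{E}[h(S)] = 4 p_1 p_2 - p_1^2 - p_2^2$, matching the type-by-type expansion:

```python
from itertools import combinations
from fractions import Fraction

O1, O2 = {0, 1}, {2, 3}              # k = 2 parts, each of size n = 2
def h(E):                            # +1 on crossing pairs, -1 otherwise
    return 1 if len(E & O1) == 1 else -1

def expected_hS(p1, p2):
    """E[h(S)] = sum over 2-sets E of h(E) * Pr[E is a subset of S]."""
    total = Fraction(0)
    for E in combinations(O1 | O2, 2):
        E = set(E)
        pr = Fraction(1)
        for a in E:
            pr *= p1 if a in O1 else p2
        total += h(E) * pr
    return total

# Grouping 2-sets by type: the n^2 = 4 crossing pairs with h = +1 give the
# p1*p2 term; the single pair inside each part with h = -1 gives -p1^2, -p2^2.
for p1 in (Fraction(0), Fraction(1, 3), Fraction(1)):
    for p2 in (Fraction(0), Fraction(1, 4), Fraction(1)):
        assert expected_hS(p1, p2) == 4*p1*p2 - p1**2 - p2**2
```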
To prove the theorem, we need to show that there is a choice of $p_1, \dots, p_k$ such that $|\mathbb{E}[X]| \ge c_k n^k$: by the probabilistic method (some outcome of $X = h(S)$ is at least $\mathbb{E}[X]$, and some outcome is at most it), this yields an $S$ with $|h(S)| \ge c_k n^k$. Dividing the expansion above by $n^k$, we obtain a multivariate polynomial
$$P(p_1, \dots, p_k) = \frac{\mathbb{E}[X]}{n^k}$$
in terms of $p_1, \dots, p_k$, and it is enough to show that there is a choice of $p_1, \dots, p_k \in [0,1]$ such that $|P(p_1, \dots, p_k)| \ge c_k$. That is what we show in the rest of the proof. The following properties of $P$ are immediate:
• $P$ is $k$-homogeneous; i.e., every monomial of $P$ has degree $k$.

• Since $h(E) = +1$ for all $n^k$ crossing sets, the coefficient of $p_1 p_2 \cdots p_k$ in $P$ is exactly $n^k / n^k = 1$.

• For every type $(a_1, \dots, a_k)$ with $a_1 + \dots + a_k = k$, there are at most $\binom{n}{a_1} \cdots \binom{n}{a_k} \le n^k$ sets of that type, each with $h(E) \in \{-1, +1\}$; hence every coefficient of $P$ has absolute value at most 1.
Finally, by the following fact, there exists a choice of $(p_1, \dots, p_k) \in [0,1]^k$ such that $|P(p_1, \dots, p_k)| \ge c_k$, as desired.
Fact 1.6.
Let $\mathcal{P}_k$ denote the set of all $k$-homogeneous polynomials $P(p_1, \dots, p_k)$ such that all coefficients of $P$ have absolute value at most one and the monomial $p_1 p_2 \cdots p_k$ has coefficient exactly one. Then, for all $P \in \mathcal{P}_k$,
$$\max_{p \in [0,1]^k} |P(p)| \ge c_k,$$
where $c_k > 0$ is an absolute constant depending only on $k$.
Proof.
Set
$$c_k = \min_{P \in \mathcal{P}_k} \ \max_{p \in [0,1]^k} |P(p)|.$$
The main observation is that for any $P \in \mathcal{P}_k$, $\max_{p \in [0,1]^k} |P(p)| > 0$. This is simply because $P$ is not the identically zero polynomial (it has one nonzero monomial), and a nonzero polynomial cannot vanish everywhere on $[0,1]^k$, since each variable ranges over an infinite set (in particular, one larger than the degree of $P$). Lastly, we observe that $\mathcal{P}_k$ is compact (as a subset of the finite-dimensional space of coefficient vectors) and $P \mapsto \max_{p \in [0,1]^k} |P(p)|$ is a continuous map. So it must attain a minimum value that is nonzero, i.e., $c_k > 0$. ∎
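A quick illustration (my own, not from the notes) of how small $c_k$ can be while staying positive: for $k = 2$, the polynomial $P(p_1, p_2) = -\tfrac{1}{2}p_1^2 + p_1 p_2 - \tfrac{1}{2}p_2^2 = -\tfrac{1}{2}(p_1 - p_2)^2$ belongs to the family ($p_1 p_2$ has coefficient exactly 1, the others have absolute value $1/2$), and its maximum absolute value on $[0,1]^2$ is only $1/2$, so $c_2 \le 1/2$:

```python
# P(p1, p2) = -(p1 - p2)^2 / 2 is in the family of Fact 1.6; its maximum
# absolute value on [0,1]^2 is 1/2, attained at (1,0) and (0,1). A grid
# search confirms the maximum: small, but bounded away from zero.
grid = [i / 100 for i in range(101)]
max_abs = max(
    abs(-0.5 * p1**2 + p1 * p2 - 0.5 * p2**2)
    for p1 in grid for p2 in grid
)
assert abs(max_abs - 0.5) < 1e-9
```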
1.5 The Alteration Method
Sometimes in a probabilistic method proof, we may not directly obtain the object of interest. Instead, we may try to sample a “good enough” object and then show that a small number of tweaks can turn it into a feasible object.
Recall that $R(k,k)$ is the smallest integer $n$ such that in every two-coloring of the edges of the complete graph $K_n$ by red and blue there is a monochromatic copy of $K_k$. The following is a stronger variant of Lemma 1.1.
Theorem 1.7.
For any integers $n$ and $k$, we have $R(k,k) > n - \binom{n}{k} 2^{1-\binom{k}{2}}$.
Proof.
As in Lemma 1.1, consider a uniformly random 2-coloring of the edges of $K_n$. Let $A_S$ be the event that the subgraph induced on a $k$-set $S$ is monochromatic. Let
$$X = \sum_{S \in \binom{[n]}{k}} \mathbb{1}[A_S]$$
be the number of monochromatic copies of $K_k$ in our two-colored graph. By linearity of expectation,
$$\mathbb{E}[X] = \binom{n}{k} 2^{1-\binom{k}{2}}.$$
Now, it follows that there must exist a two-coloring such that the number of monochromatic copies of $K_k$ is at most $\binom{n}{k} 2^{1-\binom{k}{2}}$. Consider such a coloring.
Now, we discuss the alteration part: we know that we have (at most) $\binom{n}{k} 2^{1-\binom{k}{2}}$ monochromatic copies of $K_k$. We are going to delete one (arbitrary) vertex from each of these copies. Note that in principle these copies may share vertices, so we may be able to destroy all of them by removing just a few vertices, but in the worst case these copies are disjoint. So, we can destroy all of them by removing at most $\binom{n}{k} 2^{1-\binom{k}{2}}$ vertices of $K_n$. The resulting graph has at least $n - \binom{n}{k} 2^{1-\binom{k}{2}}$ vertices and no monochromatic copies of $K_k$. ∎
Now, we are left with the “calculus” problem of choosing $n$ to optimize the bound $n - \binom{n}{k} 2^{1-\binom{k}{2}}$. It turns out, with a bit of calculation, that the optimal choice gives
$$R(k,k) \ge (1 + o(1)) \frac{k}{e} \, 2^{k/2}.$$
This is slightly better than what we can show with Lemma 1.1, namely $R(k,k) \ge (1 + o(1)) \frac{k}{e\sqrt{2}} \, 2^{k/2}$.
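The comparison can be checked concretely. Below is a sketch (the function names and the choice $k = 10$ are my own) that scans over $n$ to optimize the alteration bound of Theorem 1.7 and compares it with the largest $n$ allowed by Lemma 1.1, using exact arithmetic:

```python
from math import comb
from fractions import Fraction

def alteration_bound(n, k):
    """Theorem 1.7: R(k,k) > n - binom(n,k) * 2^(1 - binom(k,2))."""
    return n - Fraction(2 * comb(n, k), 2 ** comb(k, 2))

def union_bound_n(k):
    """Largest n with binom(n,k) * 2^(1 - binom(k,2)) < 1, so that
    Lemma 1.1 gives R(k,k) > n."""
    n = k
    while 2 * comb(n + 1, k) < 2 ** comb(k, 2):
        n += 1
    return n

# For k = 10, the alteration bound optimized over n beats the plain
# union-bound threshold from Lemma 1.1.
k = 10
best = max(alteration_bound(n, k) for n in range(k, 400))
assert best > union_bound_n(k)
```

Asymptotically the gain is roughly a $\sqrt{2}$ factor, matching the two displayed estimates above.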
In future lectures, we will see how to use a more sophisticated technique, called the Lovász Local Lemma, to get a slightly better bound.