CSE 525: Randomized Algorithms Spring 2025 Lecture 7: Negative Correlation and Applications Lecturer: Shayan Oveis Gharan 04/29/2025

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

7.1 Positive Association

Theorem 7.1 (The Four Functions Theorem).

Let $\alpha,\beta,\gamma,\delta : 2^{[n]} \to \mathbb{R}_{\geq 0}$ be non-negative functions defined on subsets of $[n]$. If for any two subsets $A,B \subseteq [n]$ we have

$$\alpha(A)\,\beta(B) \leq \gamma(A\cup B)\,\delta(A\cap B),$$

then, for every two families of subsets $\mathcal{A},\mathcal{B} \subseteq 2^{[n]}$, we have

$$\alpha(\mathcal{A})\,\beta(\mathcal{B}) \leq \gamma(\mathcal{A}\vee\mathcal{B})\,\delta(\mathcal{A}\wedge\mathcal{B}),$$

where $\alpha(\mathcal{A}) = \sum_{A\in\mathcal{A}} \alpha(A)$, $\mathcal{A}\vee\mathcal{B} = \{A\cup B : A\in\mathcal{A}, B\in\mathcal{B}\}$, and $\mathcal{A}\wedge\mathcal{B} = \{A\cap B : A\in\mathcal{A}, B\in\mathcal{B}\}$.

The FKG inequality (Theorem 7.4 below) is a direct consequence of the above theorem. We first need the following definition:

Definition 7.2 (Log-supermodular probability distributions).

We say a probability distribution $\mu : 2^{[n]} \to \mathbb{R}_{\geq 0}$ is log-supermodular if for any $A,B \subseteq [n]$, we have

$$\mu(A)\,\mu(B) \leq \mu(A\cup B)\,\mu(A\cap B).$$

This property is also known as the positive lattice condition.

For a concrete example, consider the family of Erdős–Rényi $G(n,p)$ random graphs. In this case, for any set $F \subseteq \binom{[n]}{2}$ of edges we have

$$\mu(F) = p^{|F|}(1-p)^{\binom{n}{2}-|F|} = (1-p)^{\binom{n}{2}} \left(\frac{p}{1-p}\right)^{|F|}.$$

We claim that this distribution is log-supermodular. Cancelling out the normalizing constant $(1-p)^{\binom{n}{2}}$ and writing $q = \frac{p}{1-p}$, we need to check that for any two sets $A,B \subseteq \binom{[n]}{2}$,

$$q^{|A|}\,q^{|B|} \leq q^{|A\cup B|}\,q^{|A\cap B|}.$$

But this holds (with equality) simply because $|A|+|B| = |A\cup B|+|A\cap B|$.
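This identity is easy to check mechanically. The sketch below (the parameters $n=4$ and $p=1/3$ are arbitrary choices for illustration; `fractions.Fraction` keeps the arithmetic exact) verifies that $\mu(A)\,\mu(B) = \mu(A\cup B)\,\mu(A\cap B)$ on random pairs of edge sets:

```python
import random
from fractions import Fraction as F
from itertools import combinations

# mu(F) = p^|F| (1-p)^(binom(n,2) - |F|) for an edge set F on n vertices
n, p = 4, F(1, 3)          # illustrative choices
M = n * (n - 1) // 2       # binom(n, 2)
all_edges = list(combinations(range(n), 2))

def mu(Fset):
    return p ** len(Fset) * (1 - p) ** (M - len(Fset))

random.seed(0)
for _ in range(200):
    A = frozenset(e for e in all_edges if random.random() < 0.5)
    B = frozenset(e for e in all_edges if random.random() < 0.5)
    # the positive lattice condition holds with equality for G(n,p)
    assert mu(A) * mu(B) == mu(A | B) * mu(A & B)
```

Since $|A|+|B| = |A\cup B|+|A\cap B|$, the product measure satisfies the positive lattice condition with equality, which is exactly what the assertion confirms.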

Definition 7.3 (Increasing functions).

We say a function $f : 2^{[n]} \to \mathbb{R}_{\geq 0}$ is increasing if for any $A,B \subseteq [n]$ such that $A \subseteq B$ we have

$$f(A) \leq f(B).$$

We say f is a decreasing function if the above inequality holds in the reverse direction.

For a concrete example, notice that for any $i \in [n]$, $f(A) = \mathbb{I}[i \in A]$ is increasing and $f(A) = \mathbb{I}[i \notin A]$ is decreasing.

But, more generally, consider the domain $\binom{[n]}{2}$ of the set of all possible edges in a graph with $n$ vertices. Then, for $A \subseteq \binom{[n]}{2}$,

$$f(A) = \mathbb{I}[A \text{ is connected}], \qquad f(A) = \mathbb{I}[A \text{ has a Hamiltonian cycle}]$$

are increasing, but

$$f(A) = \mathbb{I}[G(V,A) \text{ is 3-colorable}]$$

is decreasing.

Theorem 7.4 (FKG Inequality).

Let $\mu : 2^{[n]} \to \mathbb{R}_{\geq 0}$ be a log-supermodular probability distribution. Then, for any two increasing functions $f,g : 2^{[n]} \to \mathbb{R}_{\geq 0}$ we have

$$\mathbb{E}[f]\,\mathbb{E}[g] \leq \mathbb{E}[fg],$$

i.e., $\mu$ is positively associated.

Proof.

We use the Four Functions Theorem with

$$\alpha = \mu f, \quad \beta = \mu g, \quad \gamma = \mu fg, \quad \text{and} \quad \delta = \mu.$$

We claim that these four functions satisfy the assumption of the Four Functions Theorem. In particular, for any $A,B \subseteq [n]$, by log-supermodularity of $\mu$ we have

$$\alpha(A)\,\beta(B) = \mu(A)f(A)\,\mu(B)g(B) \overset{\text{log-supermodularity}}{\leq} \mu(A\cup B)\,f(A)g(B)\,\mu(A\cap B) \overset{f,g \text{ increasing}}{\leq} \mu(A\cup B)f(A\cup B)g(A\cup B)\,\mu(A\cap B) = \gamma(A\cup B)\,\delta(A\cap B).$$

Therefore, letting $\mathcal{A} = \mathcal{B} = 2^{[n]}$ (so that $\mathcal{A}\vee\mathcal{B} = \mathcal{A}\wedge\mathcal{B} = 2^{[n]}$), we conclude

$$\alpha(\mathcal{A})\,\beta(\mathcal{B}) = \Big(\sum_{A\subseteq[n]}\mu(A)f(A)\Big)\Big(\sum_{B\subseteq[n]}\mu(B)g(B)\Big) = \mathbb{E}[f]\,\mathbb{E}[g].$$

On the other hand,

$$\gamma(\mathcal{A}\vee\mathcal{B})\,\delta(\mathcal{A}\wedge\mathcal{B}) = \Big(\sum_{A\subseteq[n]}\mu(A)f(A)g(A)\Big)\Big(\sum_{B\subseteq[n]}\mu(B)\Big) = \mathbb{E}[fg]\cdot 1.$$

Putting them together proves the theorem. ∎

Note that the above inequality also holds if both $f,g$ are decreasing functions. If $f$ is increasing and $g$ is decreasing, the inequality holds in the opposite direction.

Consequently, the FKG theorem implies that any pair of elements $i,j$ is positively correlated in a log-supermodular probability distribution: applying the theorem to the increasing indicators $f(A) = \mathbb{I}[i \in A]$ and $g(A) = \mathbb{I}[j \in A]$ gives

$$\Pr[i]\,\Pr[j] \leq \Pr[i,j] \quad \Longleftrightarrow \quad \Pr[i \mid j] \geq \Pr[i].$$

More interestingly, we can use it to prove the following fact about G(n,p) graphs:

Fact 7.5.

For any $0 \leq p \leq 1$, let $G$ be a random Erdős–Rényi graph with parameter $p$. Then

$$\Pr[G \text{ has a Hamiltonian cycle} \mid G \text{ is 3-colorable}] \leq \Pr[G \text{ has a Hamiltonian cycle}].$$
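For tiny graphs, Fact 7.5 can be confirmed by exhaustive enumeration. The sketch below is our own illustration: we pick $n=5$ and $p=1/2$ so that all $2^{10}$ graphs are equally likely, and use brute-force tests for Hamiltonicity and 3-colorability.

```python
from itertools import combinations, permutations, product

n = 5
edges = list(combinations(range(n), 2))  # all 10 potential edges

def has_hamiltonian_cycle(es):
    es = set(es)
    # fix vertex 0 as the start; try every ordering of the remaining vertices
    for perm in permutations(range(1, n)):
        cycle = (0,) + perm
        if all(tuple(sorted((cycle[i], cycle[(i + 1) % n]))) in es
               for i in range(n)):
            return True
    return False

def is_3_colorable(es):
    # brute force over all 3^n colorings
    return any(all(c[u] != c[v] for u, v in es)
               for c in product(range(3), repeat=n))

# p = 1/2 makes every edge subset equally likely; enumerate all 2^10 graphs
ham = col = both = total = 0
for mask in range(1 << len(edges)):
    es = [edges[i] for i in range(len(edges)) if mask >> i & 1]
    h, c = has_hamiltonian_cycle(es), is_3_colorable(es)
    total += 1
    ham += h
    col += c
    both += h and c

p_ham = ham / total
p_ham_given_col = both / col
print(p_ham_given_col, "<=", p_ham)
```

The enumeration computes both probabilities exactly, and the conditional probability indeed comes out no larger than the unconditional one, as the FKG inequality predicts.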

7.2 Negatively Correlated Random Variables

We say that a collection $\{X_1,\dots,X_n\}$ of random variables is negatively correlated if for any subset $S \subseteq [n]$:

$$\mathbb{E}\Big[\prod_{i\in S} X_i\Big] \leq \prod_{i\in S} \mathbb{E}[X_i].$$

Note that if $\{X_1,\dots,X_n\}$ are independent, then this holds with equality.

Furthermore, we say $X_1,\dots,X_n$ are pairwise negatively correlated if for all $1 \leq i < j \leq n$,

$$\mathbb{E}[X_iX_j] \leq \mathbb{E}[X_i]\,\mathbb{E}[X_j].$$
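For a concrete example, the indicators of a uniformly random size-$k$ subset of $[n]$ (sampling without replacement) are pairwise negatively correlated: $\mathbb{E}[X_iX_j] = \frac{k(k-1)}{n(n-1)} \leq \left(\frac{k}{n}\right)^2 = \mathbb{E}[X_i]\,\mathbb{E}[X_j]$. A minimal exact check, with $n=4$ and $k=2$ chosen for brevity:

```python
from itertools import combinations

n, k = 4, 2
subsets = list(combinations(range(n), k))  # uniform over size-k subsets
m = len(subsets)                           # 6 equally likely outcomes

def E(f):
    """Exact expectation of f(S) under the uniform distribution."""
    return sum(f(S) for S in subsets) / m

for i in range(n):
    Ei = E(lambda S: i in S)                      # = k/n = 1/2
    for j in range(i + 1, n):
        Eij = E(lambda S: i in S and j in S)      # = k(k-1)/(n(n-1)) = 1/6
        assert Eij <= Ei * E(lambda S: j in S)    # negative correlation
print("all pairs negatively correlated")
```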

Theorem 7.6 (Chernoff for negatively correlated random variables).

Suppose $X_1,\dots,X_n$ are negatively correlated Bernoulli random variables (instead of independent). Then the conclusion of the multiplicative Chernoff bound still holds.

Proof.

To see this, note that the one place we used independence in the proof of the Chernoff bound is in the following calculation: when $X = X_1 + \cdots + X_n$,

$$\mathbb{E}[e^{tX}] = \mathbb{E}\big[e^{t\sum_i X_i}\big] = \mathbb{E}\Big[\prod_{i=1}^n e^{tX_i}\Big] = \prod_{i=1}^n \mathbb{E}[e^{tX_i}].$$

The main observation is that the above statement still holds, except that the last identity becomes an inequality. So the rest of the proof of the Chernoff bound follows. In particular, when $X_1,\dots,X_n$ are negatively correlated we show

$$\mathbb{E}[e^{tX}] \leq \prod_{i=1}^n \mathbb{E}[e^{tX_i}].$$

Let $\{\tilde{X}_1,\dots,\tilde{X}_n\}$ be independent Bernoulli random variables with $\mathbb{E}[\tilde{X}_i] = \mathbb{E}[X_i]$ for each $i \in \{1,\dots,n\}$ and define $\tilde{X} := \tilde{X}_1 + \cdots + \tilde{X}_n$. For any nonnegative integer $k$,

$$\begin{aligned}
\mathbb{E}[X^k] &= \sum_\alpha \binom{k}{\alpha_1,\dots,\alpha_n}\, \mathbb{E}\big[X_1^{\alpha_1} X_2^{\alpha_2} \cdots X_n^{\alpha_n}\big] \\
&\overset{X_i \in \{0,1\}}{=} \sum_\alpha \binom{k}{\alpha_1,\dots,\alpha_n}\, \mathbb{E}\Big[\prod_{i : \alpha_i \geq 1} X_i\Big] \\
&\overset{\text{negative correlation}}{\leq} \sum_\alpha \binom{k}{\alpha_1,\dots,\alpha_n} \prod_{i : \alpha_i \geq 1} \mathbb{E}[X_i] \\
&\overset{\mathbb{E}[X_i]=\mathbb{E}[\tilde{X}_i]}{=} \sum_\alpha \binom{k}{\alpha_1,\dots,\alpha_n} \prod_{i=1}^n \mathbb{E}\big[\tilde{X}_i^{\alpha_i}\big],
\end{aligned}$$

where the sums are over all non-negative integer vectors $\alpha$ with $\sum_i \alpha_i = k$, and $\binom{k}{\alpha_1,\dots,\alpha_n}$ is the multinomial coefficient. Here we used that for a Bernoulli random variable, $X_i^{\alpha_i} = X_i$ whenever $\alpha_i \geq 1$.

On the other hand, since $\tilde{X}_1,\dots,\tilde{X}_n$ are independent,

$$\sum_\alpha \binom{k}{\alpha_1,\dots,\alpha_n} \prod_{i=1}^n \mathbb{E}\big[\tilde{X}_i^{\alpha_i}\big] = \sum_\alpha \binom{k}{\alpha_1,\dots,\alpha_n}\, \mathbb{E}\big[\tilde{X}_1^{\alpha_1} \cdots \tilde{X}_n^{\alpha_n}\big] = \mathbb{E}[\tilde{X}^k].$$

Putting these together we obtain, for every $k \geq 0$,

$$\mathbb{E}[X^k] \leq \mathbb{E}[\tilde{X}^k]. \tag{7.1}$$

Lastly, using the Taylor expansion

$$e^{tX} = 1 + tX + \frac{t^2X^2}{2} + \frac{t^3X^3}{6} + \cdots$$

and applying (7.1) to every monomial above (for $t \geq 0$, so that all coefficients are non-negative), we get

$$\mathbb{E}[e^{tX}] \leq \mathbb{E}[e^{t\tilde{X}}] \overset{\text{independence}}{=} \prod_{i=1}^n \mathbb{E}[e^{t\tilde{X}_i}] = \prod_{i=1}^n \mathbb{E}[e^{tX_i}],$$

where the last equality holds because $\tilde{X}_i$ and $X_i$ are Bernoulli with the same mean, hence identically distributed,

as desired. ∎
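As a sanity check of the key inequality $\mathbb{E}[e^{tX}] \leq \prod_i \mathbb{E}[e^{tX_i}]$, consider the following toy distribution (our own choice of a negatively correlated pair of Bernoullis), which can be evaluated exactly:

```python
import math

# a toy negatively correlated pair over subsets of {0, 1}
mu = {(): 0.3, (0,): 0.3, (1,): 0.3, (0, 1): 0.1}

E = lambda f: sum(p * f(S) for S, p in mu.items())
p0 = E(lambda S: 0 in S)  # marginal of X_0 = 0.4
p1 = E(lambda S: 1 in S)  # marginal of X_1 = 0.4
assert E(lambda S: 0 in S and 1 in S) <= p0 * p1  # 0.1 <= 0.16

for t in [0.0, 0.5, 1.0, 2.0]:
    mgf_X = E(lambda S: math.exp(t * len(S)))            # E[e^{tX}], X = X_0 + X_1
    bound = ((1 - p0 + p0 * math.exp(t))
             * (1 - p1 + p1 * math.exp(t)))              # prod_i E[e^{tX_i}]
    assert mgf_X <= bound + 1e-12
print("E[e^{tX}] <= prod_i E[e^{tX_i}] for all tested t >= 0")
```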

Definition 7.7 (Generating Polynomial).

It is natural to express a probability distribution $\mu$ over subsets of $[n]$ by its generating polynomial. To do that we consider $n$ variables $z_1,\dots,z_n$ and write

$$g_\mu(z_1,\dots,z_n) = \sum_{S\subseteq[n]} \mu(S)\, z^S,$$

where $z^S = \prod_{i\in S} z_i$.

For a concrete example, let $B_1,\dots,B_n$ be $n$ independent Bernoulli random variables where $B_i$ has success probability $p_i$. Then, we can write the corresponding generating polynomial as follows:

$$(p_1z_1 + 1 - p_1)(p_2z_2 + 1 - p_2)\cdots(p_nz_n + 1 - p_n).$$

The following facts about the generating polynomial are straightforward:

Fact 7.8.

Let $\mu$ be a probability distribution over subsets of $[n]$ with generating polynomial $g_\mu$. Then:

  • $g_\mu(\mathbf{1}) = 1$, i.e., the sum of the coefficients of $g_\mu$ is 1.

  • $\partial_i g_\mu(\mathbf{1}) = \Pr_\mu[i]$, i.e., the marginals can be deduced by taking partial derivatives.

  • $i,j$ are negatively correlated if

    $$\partial_i\partial_j g_\mu(\mathbf{1}) = \Pr[i,j] \leq \Pr[i]\,\Pr[j] = \partial_i g_\mu(\mathbf{1})\,\partial_j g_\mu(\mathbf{1}).$$
  • Say we have two probability distributions $\mu_1,\mu_2$ over disjoint sets. Then the product distribution $\mu_1 \times \mu_2$ is the probability distribution with generating polynomial $g_{\mu_1\times\mu_2} = g_{\mu_1} \cdot g_{\mu_2}$.

  • If $\mu_1,\mu_2$ are pairwise negatively correlated then so is $\mu_1 \times \mu_2$.
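The first two items above can be verified directly on the independent-Bernoulli example from Definition 7.7. Since every $z_i$ appears with degree at most one, $g_\mu$ is multilinear, so $\partial_i g_\mu = g_\mu|_{z_i=1} - g_\mu|_{z_i=0}$; the sketch below exploits this trick (the specific values $p_i$ are arbitrary):

```python
from fractions import Fraction as F

ps = [F(1, 2), F(1, 3), F(1, 4)]  # arbitrary success probabilities
n = len(ps)

def g(z):
    """Generating polynomial of independent Bernoullis: prod_i (p_i z_i + 1 - p_i)."""
    out = F(1)
    for p, zi in zip(ps, z):
        out *= p * zi + 1 - p
    return out

def partial(i, z):
    """g is multilinear in each z_i, so d/dz_i g = g|_{z_i=1} - g|_{z_i=0}."""
    hi, lo = list(z), list(z)
    hi[i], lo[i] = F(1), F(0)
    return g(hi) - g(lo)

ones = [F(1)] * n
assert g(ones) == 1                  # coefficients sum to 1
for i, p in enumerate(ps):
    assert partial(i, ones) == p     # marginal Pr[i] = p_i
```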

Next, we explain a few examples of negatively correlated random variables:

Example 1: Observe that any probability distribution over subsets of size (exactly) one among $n$ objects is negatively correlated, namely the distribution with generating polynomial

$$p_1z_1 + p_2z_2 + \cdots + p_nz_n,$$

where $\sum_i p_i = 1$. (Indeed, $\Pr[i,j] = 0 \leq p_ip_j$ for every pair $i \neq j$.) Following the above fact, products of these distributions are also negatively correlated.

As an application, recall that in lecture 4, we introduced a probability distribution over paths $P \in \mathcal{P}_i$ connecting the $i$-th terminal pair $s_i,t_i$, where we chose one path $P$ with probability $y_P$ and ran the procedure independently for every $i$. It follows that the resulting probability distribution over the random variables $Y_P = \mathbb{I}[P \text{ is chosen}]$ is negatively correlated. So, we could have directly applied the Chernoff bound instead of defining a new family of random variables $X_{e,i} = \mathbb{I}[\text{a path of } \mathcal{P}_i \text{ going through } e \text{ is chosen}]$.

Example 2: Edges of a uniform spanning tree. One of the most interesting families of negatively correlated probability distributions is the distribution of the set of edges of a uniform spanning tree. Namely, let $G=(V,E)$ be a connected undirected graph; assign a variable $z_e$ to every edge $e \in E$. Then $\mu$ is the distribution with the following generating polynomial,

$$g_\mu(\{z_e\}_{e\in E}) = \frac{1}{N}\sum_{T \text{ spanning tree}} z^T,$$

where $z^T = \prod_{e\in T} z_e$ and $N$ is the number of spanning trees of $G$.

We will discuss ideas to prove this fact in the next lecture.
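Although the proof is deferred, the claim is easy to check by brute force on a small graph. The sketch below (we choose $K_4$, whose $4^{4-2} = 16$ spanning trees can be enumerated directly) verifies pairwise negative correlation of the edge indicators:

```python
from itertools import combinations

# uniform spanning tree of K4: enumerate all spanning trees directly
V = range(4)
E = list(combinations(V, 2))  # the 6 edges of K4

def spans(T):
    """Check that the edge set T connects all of V (with |T| = |V|-1, that means T is a tree)."""
    comp = list(V)
    def find(x):
        while comp[x] != x:
            x = comp[x]
        return x
    for u, v in T:
        comp[find(u)] = find(v)
    return len({find(v) for v in V}) == 1

trees = [T for T in combinations(E, len(V) - 1) if spans(T)]
N = len(trees)  # Cayley's formula: K4 has 4^(4-2) = 16 spanning trees

Pr = lambda f: sum(f(T) for T in trees) / N
for e in E:
    for f in E:
        if e < f:
            pe, pf = Pr(lambda T: e in T), Pr(lambda T: f in T)
            assert Pr(lambda T: e in T and f in T) <= pe * pf
print("edges of a uniform spanning tree of K4 are pairwise negatively correlated")
```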

7.3 Towards a Theory of Negative Dependence

One of the ongoing research directions in probability theory is to study under what conditions one can expect negative correlation and negative association.

Following the above discussion, a natural choice is the reverse of the positive lattice condition, namely the negative lattice condition (NLC):

$$\mu(A)\,\mu(B) \geq \mu(A\cup B)\,\mu(A\cap B), \qquad \forall A,B \subseteq [n].$$

Unfortunately, it can be seen that this property does not even imply pairwise negative correlation:

Example 7.9.

Consider the distribution $\mu$ over subsets of $[4]$ with the following generating polynomial,

$$\frac{1}{2}(z_1z_2 + z_3z_4).$$

This distribution satisfies the NLC but is not negatively correlated, since $\Pr[1,2] = 0.5 > 0.25 = \Pr[1]\,\Pr[2]$.
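This example is small enough to check mechanically, by a direct enumeration of the distribution defined by the generating polynomial above:

```python
from itertools import combinations

ground = (1, 2, 3, 4)
subsets = [frozenset(s) for r in range(5) for s in combinations(ground, r)]

# mu puts mass 1/2 on {1,2} and 1/2 on {3,4}
mu = {S: 0.0 for S in subsets}
mu[frozenset({1, 2})] = 0.5
mu[frozenset({3, 4})] = 0.5

# negative lattice condition: mu(A) mu(B) >= mu(A u B) mu(A n B) for all A, B
assert all(mu[A] * mu[B] >= mu[A | B] * mu[A & B]
           for A in subsets for B in subsets)

# ... yet the elements 1 and 2 are positively correlated
Pr = lambda *items: sum(p for S, p in mu.items() if set(items) <= S)
assert Pr(1, 2) == 0.5 and Pr(1) * Pr(2) == 0.25
```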

In the next lecture we will introduce strongly Rayleigh distributions as a generic framework for studying negative dependence.