CSE 525: Randomized Algorithms Spring 2025 Lecture 7: Negative Correlation and Applications Lecturer: Shayan Oveis Gharan 04/29/2025

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

7.1 Positive Association

Theorem 7.1 (The Four Functions Theorem).

Let $\alpha,\beta,\gamma,\delta : 2^{[n]} \to \mathbb{R}_{\geq 0}$ be non-negative functions defined on subsets of $[n]$. If for any two subsets $A,B \subseteq [n]$ we have

$$\alpha(A)\,\beta(B) \leq \gamma(A\cup B)\,\delta(A\cap B),$$

then, for every two families of subsets $\mathcal{A},\mathcal{B} \subseteq 2^{[n]}$, we have

$$\alpha(\mathcal{A})\,\beta(\mathcal{B}) \leq \gamma(\mathcal{A}\vee\mathcal{B})\,\delta(\mathcal{A}\wedge\mathcal{B}),$$

where $\alpha(\mathcal{A}) = \sum_{A\in\mathcal{A}} \alpha(A)$, $\mathcal{A}\vee\mathcal{B} = \{A\cup B : A\in\mathcal{A}, B\in\mathcal{B}\}$, and $\mathcal{A}\wedge\mathcal{B} = \{A\cap B : A\in\mathcal{A}, B\in\mathcal{B}\}$.

The FKG inequality (Theorem 7.4 below) is a direct consequence of the above theorem. We first need the following definition:

Definition 7.2 (Log-supermodular probability distributions).

We say a probability distribution $\mu : 2^{[n]} \to \mathbb{R}_{\geq 0}$ is log-supermodular if for any $A,B \subseteq [n]$, we have

$$\mu(A)\,\mu(B) \leq \mu(A\cup B)\,\mu(A\cap B).$$

This property is also known as the positive lattice condition.

For a concrete example, consider the family of Erdős–Rényi $G(n,p)$ random graphs. In this case, for any set $F \subseteq \binom{[n]}{2}$ of edges we have

$$\mu(F) = p^{|F|}(1-p)^{\binom{n}{2}-|F|} = (1-p)^{\binom{n}{2}} \left(\frac{p}{1-p}\right)^{|F|}.$$

We claim that this distribution is log-supermodular. Cancelling out the normalizing constant $(1-p)^{\binom{n}{2}}$ and writing $q = \frac{p}{1-p}$, we need to check that for any two sets $A,B \subseteq \binom{[n]}{2}$,

$$q^{|A|}\,q^{|B|} \leq q^{|A\cup B|}\,q^{|A\cap B|}.$$

But this holds (with equality) simply because $|A|+|B| = |A\cup B|+|A\cap B|$.
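This identity is easy to check mechanically. The sketch below (the parameters $n=4$ and $p=1/3$ are arbitrary choices for illustration; `fractions.Fraction` keeps the arithmetic exact) verifies that $\mu(A)\,\mu(B) = \mu(A\cup B)\,\mu(A\cap B)$ on random pairs of edge sets:

```python
import random
from fractions import Fraction as F
from itertools import combinations

# mu(F) = p^|F| (1-p)^(binom(n,2) - |F|) for an edge set F on n vertices
n, p = 4, F(1, 3)          # illustrative choices
M = n * (n - 1) // 2       # binom(n, 2)
all_edges = list(combinations(range(n), 2))

def mu(Fset):
    return p ** len(Fset) * (1 - p) ** (M - len(Fset))

random.seed(0)
for _ in range(200):
    A = frozenset(e for e in all_edges if random.random() < 0.5)
    B = frozenset(e for e in all_edges if random.random() < 0.5)
    # the positive lattice condition holds with equality for G(n,p)
    assert mu(A) * mu(B) == mu(A | B) * mu(A & B)
```

Since $|A|+|B| = |A\cup B|+|A\cap B|$, the product measure satisfies the positive lattice condition with equality, which is exactly what the assertion confirms.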

Definition 7.3 (Increasing functions).

We say a function $f : 2^{[n]} \to \mathbb{R}_{\geq 0}$ is increasing if for any $A,B \subseteq [n]$ such that $A \subseteq B$ we have

$$f(A) \leq f(B).$$

We say f is a decreasing function if the above inequality holds in the reverse direction.

For a concrete example, notice that for any $i \in [n]$, $f(A) = \mathbb{I}[i \in A]$ is increasing and $f(A) = \mathbb{I}[i \notin A]$ is decreasing.

But, more generally, consider the domain $\binom{[n]}{2}$ of the set of all possible edges in a graph with $n$ vertices. Then, for $A \subseteq \binom{[n]}{2}$,

$$f(A) = \mathbb{I}[A \text{ is connected}], \qquad f(A) = \mathbb{I}[A \text{ has a Hamiltonian cycle}]$$

are increasing, but

$$f(A) = \mathbb{I}[G(V,A) \text{ is 3-colorable}]$$

is decreasing.

Theorem 7.4 (FKG Inequality).

Let $\mu : 2^{[n]} \to \mathbb{R}_{\geq 0}$ be a log-supermodular probability distribution. Then, for any two increasing functions $f,g : 2^{[n]} \to \mathbb{R}_{\geq 0}$ we have

$$\mathbb{E}[f]\,\mathbb{E}[g] \leq \mathbb{E}[fg],$$

i.e., $\mu$ is positively associated.

Proof.

We use the Four Functions Theorem with

$$\alpha = \mu f, \quad \beta = \mu g, \quad \gamma = \mu fg, \quad \text{and} \quad \delta = \mu.$$

We claim that these four functions satisfy the assumption of the Four Functions Theorem. In particular, for any $A,B \subseteq [n]$, by log-supermodularity of $\mu$ we have

$$\alpha(A)\,\beta(B) = \mu(A)f(A)\,\mu(B)g(B) \overset{\text{log-supermodularity}}{\leq} \mu(A\cup B)\,f(A)g(B)\,\mu(A\cap B) \overset{f,g \text{ increasing}}{\leq} \mu(A\cup B)f(A\cup B)g(A\cup B)\,\mu(A\cap B) = \gamma(A\cup B)\,\delta(A\cap B).$$

Therefore, letting $\mathcal{A} = \mathcal{B} = 2^{[n]}$ (so that $\mathcal{A}\vee\mathcal{B} = \mathcal{A}\wedge\mathcal{B} = 2^{[n]}$), we conclude

$$\alpha(\mathcal{A})\,\beta(\mathcal{B}) = \Big(\sum_{A\subseteq[n]}\mu(A)f(A)\Big)\Big(\sum_{B\subseteq[n]}\mu(B)g(B)\Big) = \mathbb{E}[f]\,\mathbb{E}[g].$$

On the other hand,

$$\gamma(\mathcal{A}\vee\mathcal{B})\,\delta(\mathcal{A}\wedge\mathcal{B}) = \Big(\sum_{A\subseteq[n]}\mu(A)f(A)g(A)\Big)\Big(\sum_{B\subseteq[n]}\mu(B)\Big) = \mathbb{E}[fg]\cdot 1.$$

Putting them together proves the theorem. ∎

Note that the above inequality also holds if both $f,g$ are decreasing functions. If $f$ is increasing and $g$ is decreasing, the inequality holds in the opposite direction.

Consequently, the FKG theorem implies that any pair of elements $i,j$ is positively correlated in a log-supermodular probability distribution: applying the theorem to the increasing indicators $f(A) = \mathbb{I}[i \in A]$ and $g(A) = \mathbb{I}[j \in A]$ gives

$$\Pr[i]\,\Pr[j] \leq \Pr[i,j] \quad \Longleftrightarrow \quad \Pr[i \mid j] \geq \Pr[i].$$

More interestingly, we can use it to prove the following fact about G(n,p) graphs:

Fact 7.5.

For any $0 \leq p \leq 1$, let $G$ be a random Erdős–Rényi graph with parameter $p$. Then

$$\Pr[G \text{ has a Hamiltonian cycle} \mid G \text{ is 3-colorable}] \leq \Pr[G \text{ has a Hamiltonian cycle}].$$
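For tiny graphs, Fact 7.5 can be confirmed by exhaustive enumeration. The sketch below is our own illustration: we pick $n=5$ and $p=1/2$ so that all $2^{10}$ graphs are equally likely, and use brute-force tests for Hamiltonicity and 3-colorability.

```python
from itertools import combinations, permutations, product

n = 5
edges = list(combinations(range(n), 2))  # all 10 potential edges

def has_hamiltonian_cycle(es):
    es = set(es)
    # fix vertex 0 as the start; try every ordering of the remaining vertices
    for perm in permutations(range(1, n)):
        cycle = (0,) + perm
        if all(tuple(sorted((cycle[i], cycle[(i + 1) % n]))) in es
               for i in range(n)):
            return True
    return False

def is_3_colorable(es):
    # brute force over all 3^n colorings
    return any(all(c[u] != c[v] for u, v in es)
               for c in product(range(3), repeat=n))

# p = 1/2 makes every edge subset equally likely; enumerate all 2^10 graphs
ham = col = both = total = 0
for mask in range(1 << len(edges)):
    es = [edges[i] for i in range(len(edges)) if mask >> i & 1]
    h, c = has_hamiltonian_cycle(es), is_3_colorable(es)
    total += 1
    ham += h
    col += c
    both += h and c

p_ham = ham / total
p_ham_given_col = both / col
print(p_ham_given_col, "<=", p_ham)
```

The enumeration computes both probabilities exactly, and the conditional probability indeed comes out no larger than the unconditional one, as the FKG inequality predicts.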

7.2 Negatively Correlated Random Variables

We say that a collection $\{X_1,\dots,X_n\}$ of random variables is negatively correlated if for any subset $S \subseteq [n]$:

$$\mathbb{E}\Big[\prod_{i\in S} X_i\Big] \leq \prod_{i\in S} \mathbb{E}[X_i].$$

Note that if $\{X_1,\dots,X_n\}$ are independent, then this holds with equality.

Furthermore, we say $X_1,\dots,X_n$ are pairwise negatively correlated if for all $1 \leq i < j \leq n$,

$$\mathbb{E}[X_iX_j] \leq \mathbb{E}[X_i]\,\mathbb{E}[X_j].$$
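For a concrete example, the indicators of a uniformly random size-$k$ subset of $[n]$ (sampling without replacement) are pairwise negatively correlated: $\mathbb{E}[X_iX_j] = \frac{k(k-1)}{n(n-1)} \leq \left(\frac{k}{n}\right)^2 = \mathbb{E}[X_i]\,\mathbb{E}[X_j]$. A minimal exact check, with $n=4$ and $k=2$ chosen for brevity:

```python
from itertools import combinations

n, k = 4, 2
subsets = list(combinations(range(n), k))  # uniform over size-k subsets
m = len(subsets)                           # 6 equally likely outcomes

def E(f):
    """Exact expectation of f(S) under the uniform distribution."""
    return sum(f(S) for S in subsets) / m

for i in range(n):
    Ei = E(lambda S: i in S)                      # = k/n = 1/2
    for j in range(i + 1, n):
        Eij = E(lambda S: i in S and j in S)      # = k(k-1)/(n(n-1)) = 1/6
        assert Eij <= Ei * E(lambda S: j in S)    # negative correlation
print("all pairs negatively correlated")
```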

Theorem 7.6 (Chernoff for negatively correlated random variables).

Suppose $X_1,\dots,X_n$ are negatively correlated Bernoulli random variables (instead of independent). Then the conclusion of the multiplicative Chernoff bound still holds.

Proof.

To see this, note that the one place we used independence in the proof of the Chernoff bound is in the following calculation: when $X = X_1 + \cdots + X_n$,

$$\mathbb{E}[e^{tX}] = \mathbb{E}\big[e^{t\sum_i X_i}\big] = \mathbb{E}\Big[\prod_{i=1}^n e^{tX_i}\Big] = \prod_{i=1}^n \mathbb{E}[e^{tX_i}].$$

The main observation is that the above statement still holds, except that the last identity becomes an inequality. So the rest of the proof of the Chernoff bound follows. In particular, when $X_1,\dots,X_n$ are negatively correlated we show

$$\mathbb{E}[e^{tX}] \leq \prod_{i=1}^n \mathbb{E}[e^{tX_i}].$$

Let $\{\tilde{X}_1,\dots,\tilde{X}_n\}$ be independent Bernoulli random variables with $\mathbb{E}[\tilde{X}_i] = \mathbb{E}[X_i]$ for each $i \in \{1,\dots,n\}$ and define $\tilde{X} := \tilde{X}_1 + \cdots + \tilde{X}_n$. For any nonnegative integer $k$,

$$\begin{aligned}
\mathbb{E}[X^k] &= \sum_\alpha \binom{k}{\alpha_1,\dots,\alpha_n}\, \mathbb{E}\big[X_1^{\alpha_1} X_2^{\alpha_2} \cdots X_n^{\alpha_n}\big] \\
&\overset{X_i \in \{0,1\}}{=} \sum_\alpha \binom{k}{\alpha_1,\dots,\alpha_n}\, \mathbb{E}\Big[\prod_{i : \alpha_i \geq 1} X_i\Big] \\
&\overset{\text{negative correlation}}{\leq} \sum_\alpha \binom{k}{\alpha_1,\dots,\alpha_n} \prod_{i : \alpha_i \geq 1} \mathbb{E}[X_i] \\
&\overset{\mathbb{E}[X_i]=\mathbb{E}[\tilde{X}_i]}{=} \sum_\alpha \binom{k}{\alpha_1,\dots,\alpha_n} \prod_{i=1}^n \mathbb{E}\big[\tilde{X}_i^{\alpha_i}\big],
\end{aligned}$$

where the sums are over all non-negative integer vectors $\alpha$ with $\sum_i \alpha_i = k$, and $\binom{k}{\alpha_1,\dots,\alpha_n}$ is the multinomial coefficient. Here we used that for a Bernoulli random variable, $X_i^{\alpha_i} = X_i$ whenever $\alpha_i \geq 1$.

On the other hand, since $\tilde{X}_1,\dots,\tilde{X}_n$ are independent,

$$\sum_\alpha \binom{k}{\alpha_1,\dots,\alpha_n} \prod_{i=1}^n \mathbb{E}\big[\tilde{X}_i^{\alpha_i}\big] = \sum_\alpha \binom{k}{\alpha_1,\dots,\alpha_n}\, \mathbb{E}\big[\tilde{X}_1^{\alpha_1} \cdots \tilde{X}_n^{\alpha_n}\big] = \mathbb{E}[\tilde{X}^k].$$

Putting these together we obtain, for every $k \geq 0$,

$$\mathbb{E}[X^k] \leq \mathbb{E}[\tilde{X}^k]. \tag{7.1}$$

Lastly, using the Taylor expansion

$$e^{tX} = 1 + tX + \frac{t^2X^2}{2} + \frac{t^3X^3}{6} + \cdots$$

and applying (7.1) to every monomial above (for $t \geq 0$, so that all coefficients are non-negative), we get

$$\mathbb{E}[e^{tX}] \leq \mathbb{E}[e^{t\tilde{X}}] \overset{\text{independence}}{=} \prod_{i=1}^n \mathbb{E}[e^{t\tilde{X}_i}] = \prod_{i=1}^n \mathbb{E}[e^{tX_i}],$$

where the last equality holds because $\tilde{X}_i$ and $X_i$ are Bernoulli with the same mean, hence identically distributed,

as desired. ∎
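As a sanity check of the key inequality $\mathbb{E}[e^{tX}] \leq \prod_i \mathbb{E}[e^{tX_i}]$, consider the following toy distribution (our own choice of a negatively correlated pair of Bernoullis), which can be evaluated exactly:

```python
import math

# a toy negatively correlated pair over subsets of {0, 1}
mu = {(): 0.3, (0,): 0.3, (1,): 0.3, (0, 1): 0.1}

E = lambda f: sum(p * f(S) for S, p in mu.items())
p0 = E(lambda S: 0 in S)  # marginal of X_0 = 0.4
p1 = E(lambda S: 1 in S)  # marginal of X_1 = 0.4
assert E(lambda S: 0 in S and 1 in S) <= p0 * p1  # 0.1 <= 0.16

for t in [0.0, 0.5, 1.0, 2.0]:
    mgf_X = E(lambda S: math.exp(t * len(S)))            # E[e^{tX}], X = X_0 + X_1
    bound = ((1 - p0 + p0 * math.exp(t))
             * (1 - p1 + p1 * math.exp(t)))              # prod_i E[e^{tX_i}]
    assert mgf_X <= bound + 1e-12
print("E[e^{tX}] <= prod_i E[e^{tX_i}] for all tested t >= 0")
```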

Definition 7.7 (Generating Polynomial).

It is natural to express a probability distribution $\mu$ over subsets of $[n]$ by its generating polynomial. To do that we consider $n$ variables $z_1,\dots,z_n$ and write

$$g_\mu(z_1,\dots,z_n) = \sum_{S\subseteq[n]} \mu(S)\, z^S,$$

where $z^S = \prod_{i\in S} z_i$.

For a concrete example, let $B_1,\dots,B_n$ be $n$ independent Bernoulli random variables where $B_i$ has success probability $p_i$. Then, we can write the corresponding generating polynomial as follows:

$$(p_1z_1 + 1 - p_1)(p_2z_2 + 1 - p_2)\cdots(p_nz_n + 1 - p_n).$$

The following facts about the generating polynomial are straightforward:

Fact 7.8.

Let $\mu$ be a probability distribution over subsets of $[n]$ with generating polynomial $g_\mu$. Then:

  • $g_\mu(\mathbf{1}) = 1$, i.e., the sum of the coefficients of $g_\mu$ is 1.

  • $\partial_i g_\mu(\mathbf{1}) = \Pr_\mu[i]$, i.e., the marginals can be deduced by taking partial derivatives.

  • $i,j$ are negatively correlated if

    $$\partial_i\partial_j g_\mu(\mathbf{1}) = \Pr[i,j] \leq \Pr[i]\,\Pr[j] = \partial_i g_\mu(\mathbf{1})\,\partial_j g_\mu(\mathbf{1}).$$
  • Say we have two probability distributions $\mu_1,\mu_2$ over disjoint sets. Then the product distribution $\mu_1 \times \mu_2$ is the probability distribution with generating polynomial $g_{\mu_1\times\mu_2} = g_{\mu_1} \cdot g_{\mu_2}$.

  • If $\mu_1,\mu_2$ are pairwise negatively correlated then so is $\mu_1 \times \mu_2$.
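The first two items above can be verified directly on the independent-Bernoulli example from Definition 7.7. Since every $z_i$ appears with degree at most one, $g_\mu$ is multilinear, so $\partial_i g_\mu = g_\mu|_{z_i=1} - g_\mu|_{z_i=0}$; the sketch below exploits this trick (the specific values $p_i$ are arbitrary):

```python
from fractions import Fraction as F

ps = [F(1, 2), F(1, 3), F(1, 4)]  # arbitrary success probabilities
n = len(ps)

def g(z):
    """Generating polynomial of independent Bernoullis: prod_i (p_i z_i + 1 - p_i)."""
    out = F(1)
    for p, zi in zip(ps, z):
        out *= p * zi + 1 - p
    return out

def partial(i, z):
    """g is multilinear in each z_i, so d/dz_i g = g|_{z_i=1} - g|_{z_i=0}."""
    hi, lo = list(z), list(z)
    hi[i], lo[i] = F(1), F(0)
    return g(hi) - g(lo)

ones = [F(1)] * n
assert g(ones) == 1                  # coefficients sum to 1
for i, p in enumerate(ps):
    assert partial(i, ones) == p     # marginal Pr[i] = p_i
```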

Next, we explain a few examples of negatively correlated random variables:

Example 1: Observe that any probability distribution over subsets of size (exactly) one among $n$ objects is negatively correlated, namely the distribution with generating polynomial

$$p_1z_1 + p_2z_2 + \cdots + p_nz_n,$$

where $\sum_i p_i = 1$. (Indeed, $\Pr[i,j] = 0 \leq p_ip_j$ for every pair $i \neq j$.) Following the above fact, products of these distributions are also negatively correlated.

As an application, recall that in lecture 4, we introduced a probability distribution over paths $P \in \mathcal{P}_i$ connecting the $i$-th terminal pair $s_i,t_i$, where we chose one path $P$ with probability $y_P$ and ran the procedure independently for every $i$. It follows that the resulting probability distribution over the random variables $Y_P = \mathbb{I}[P \text{ is chosen}]$ is negatively correlated. So, we could have directly applied the Chernoff bound instead of defining a new family of random variables $X_{e,i} = \mathbb{I}[\text{a path of } \mathcal{P}_i \text{ going through } e \text{ is chosen}]$.

Example 2: Edges of a uniform spanning tree. One of the most interesting families of negatively correlated probability distributions is the distribution of the set of edges of a uniform spanning tree. Namely, let $G=(V,E)$ be a connected undirected graph; assign a variable $z_e$ to every edge $e \in E$. Then $\mu$ is the distribution with the following generating polynomial,

$$g_\mu(\{z_e\}_{e\in E}) = \frac{1}{N}\sum_{T \text{ spanning tree}} z^T,$$

where $z^T = \prod_{e\in T} z_e$ and $N$ is the number of spanning trees of $G$.

We will discuss ideas to prove this fact in the next lecture.
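Although the proof is deferred, the claim is easy to check by brute force on a small graph. The sketch below (we choose $K_4$, whose $4^{4-2} = 16$ spanning trees can be enumerated directly) verifies pairwise negative correlation of the edge indicators:

```python
from itertools import combinations

# uniform spanning tree of K4: enumerate all spanning trees directly
V = range(4)
E = list(combinations(V, 2))  # the 6 edges of K4

def spans(T):
    """Check that the edge set T connects all of V (with |T| = |V|-1, that means T is a tree)."""
    comp = list(V)
    def find(x):
        while comp[x] != x:
            x = comp[x]
        return x
    for u, v in T:
        comp[find(u)] = find(v)
    return len({find(v) for v in V}) == 1

trees = [T for T in combinations(E, len(V) - 1) if spans(T)]
N = len(trees)  # Cayley's formula: K4 has 4^(4-2) = 16 spanning trees

Pr = lambda f: sum(f(T) for T in trees) / N
for e in E:
    for f in E:
        if e < f:
            pe, pf = Pr(lambda T: e in T), Pr(lambda T: f in T)
            assert Pr(lambda T: e in T and f in T) <= pe * pf
print("edges of a uniform spanning tree of K4 are pairwise negatively correlated")
```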

7.3 Towards a Theory of Negative Dependence

One of the ongoing research directions in probability theory is to study under what conditions one can expect negative correlation and negative association.

Following the above discussion, a natural choice is the reverse of the positive lattice condition, namely the negative lattice condition (NLC):

$$\mu(A)\,\mu(B) \geq \mu(A\cup B)\,\mu(A\cap B), \qquad \forall A,B \subseteq [n].$$

Unfortunately, it can be seen that this property does not even imply pairwise negative correlation:

Example 7.9.

Consider the distribution $\mu$ over subsets of $[4]$ with the following generating polynomial,

$$\frac{1}{2}(z_1z_2 + z_3z_4).$$

This distribution satisfies the NLC but is not negatively correlated, since $\Pr[1,2] = 0.5 > 0.25 = \Pr[1]\,\Pr[2]$.
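This example is small enough to check mechanically, by a direct enumeration of the distribution defined by the generating polynomial above:

```python
from itertools import combinations

ground = (1, 2, 3, 4)
subsets = [frozenset(s) for r in range(5) for s in combinations(ground, r)]

# mu puts mass 1/2 on {1,2} and 1/2 on {3,4}
mu = {S: 0.0 for S in subsets}
mu[frozenset({1, 2})] = 0.5
mu[frozenset({3, 4})] = 0.5

# negative lattice condition: mu(A) mu(B) >= mu(A u B) mu(A n B) for all A, B
assert all(mu[A] * mu[B] >= mu[A | B] * mu[A & B]
           for A in subsets for B in subsets)

# ... yet the elements 1 and 2 are positively correlated
Pr = lambda *items: sum(p for S, p in mu.items() if set(items) <= S)
assert Pr(1, 2) == 0.5 and Pr(1) * Pr(2) == 0.25
```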

In the next lecture we will introduce strongly Rayleigh distributions as a generic framework for studying negative dependence.