CSE 312 – Reference Final Exam Cheat Sheet


Spring 2026

Combinatorial Theory

The Sum Rule

If an experiment can result in one of \(N\) outcomes or one of \(M\) outcomes (with no overlap between the two), then the total number of possible outcomes is \(N + M\).

The Product Rule

If an experiment has \(N_{1}\) outcomes for the first stage, \(N_{2}\) outcomes for the second stage, \(\dots \), and \(N_{m}\) outcomes for the \(m^{\text {th}}\) stage, then the total number of outcomes of the experiment is \(N_{1} \times N_{2} \cdot \dots \cdot N_{m}=\prod _{i=1}^m N_i\).

Complementary Counting

Let \(\mathcal {U}\) be a (finite) universal set, and \(S\) a subset of interest. Then, \(\mid S \mid = \mid \mathcal {U} \mid - \mid \mathcal {U} \setminus S \mid \).

Permutation

The number of orderings of \(N\) distinct objects is \(N! = N \cdot (N - 1) \cdot (N - 2) \cdots 3 \cdot 2 \cdot 1\).

\(k\)-Permutations

If we want to pick (order matters) only \(k\) out of \(n\) distinct objects, the number of ways to do so is: \begin {equation*} P(n, k) = n \cdot (n - 1) \cdot (n - 2) \cdot ... \cdot (n - k + 1) = \frac {n!}{(n - k)!} \end {equation*}

\(k\)-Combinations/Binomial Coefficients

If we want to choose (order doesn’t matter) only \(k\) out of \(n\) distinct objects, the number of ways to do so is: \begin {equation*} C(n, k) = \binom {n}{k} = \frac {P(n, k)}{k!} = \frac {n!}{k!(n-k)!} \end {equation*}
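As a quick sanity check of the two formulas above (a Python sketch; \(n=5, k=3\) is an arbitrary example), the standard library computes both directly:

```python
import math

# math.perm(n, k) computes P(n, k); math.comb(n, k) computes C(n, k).
n, k = 5, 3

p = math.perm(n, k)   # 5 * 4 * 3 = 60 ordered selections
c = math.comb(n, k)   # P(5, 3) / 3! = 10 unordered selections

assert p == math.factorial(n) // math.factorial(n - k)   # P(n,k) = n!/(n-k)!
assert c == p // math.factorial(k)                       # C(n,k) = P(n,k)/k!
```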

Multinomial Coefficients

If we have \(k\) distinct types of objects (\(n\) total), with \(n_1\) of the first type, \(n_2\) of the second, ..., and \(n_k\) of the \(k\)-th, then the number of arrangements possible is \begin {equation*} \binom {n}{n_1, n_2, ..., n_k} = \frac {n!}{n_1!n_2!...n_k!} \end {equation*}

Encoding/Stars and Bars Method

The number of ways to distribute \(n\) indistinguishable balls into \(k\) distinguishable bins is \begin {equation*} \binom {n + k - 1}{k - 1} = \binom {n + k - 1}{n} \end {equation*}
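The formula can be verified by brute force for small cases (a sketch with the illustrative values \(n=4\) balls, \(k=3\) bins):

```python
from itertools import product
from math import comb

# Count solutions to x1 + x2 + x3 = n with xi >= 0 by enumeration,
# and compare against the stars-and-bars formula C(n + k - 1, k - 1).
n, k = 4, 3

brute = sum(1 for counts in product(range(n + 1), repeat=k) if sum(counts) == n)
formula = comb(n + k - 1, k - 1)

assert brute == formula  # both equal C(6, 2) = 15
```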

Binomial Theorem

Let \(x, y \in \mathbb {R}\) and \(n \in \mathbb {N}\). Then: \((x + y)^{n} = \sum _{k = 0}^{n}\binom {n}{k}x^{k}y^{n-k}\).

Principle of Inclusion-Exclusion (PIE)

2 events: \(| A \cup B| = | A | + | B| - | A \cap B |\)

3 events: \(| A \cup B\cup C| = | A | + | B| +|C|- | A \cap B |-|A\cap C|-|B\cap C|+|A\cap B\cap C|\)

\(k\) events: singles - doubles + triples - quads + ...

Further, in general, if \(A_{1}, A_{2}, \dots , A_{n}\) are sets, then: \begin {equation*} \begin {aligned} |A_{1} \cup \dots \cup A_{n}| &= \text {singles} - \text {doubles} + \text {triples} - \text {quads} + \cdots \end {aligned} \end {equation*} where “singles” is the sum of the sizes of the individual sets, “doubles” the sum of the sizes of all pairwise intersections, “triples” the sum over all three-way intersections, “quads” the sum over all four-way intersections, and so forth.
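PIE can be checked numerically on concrete sets (a sketch; the three sets are arbitrary examples):

```python
# Verify inclusion-exclusion for three concrete sets.
A = {1, 2, 3, 4}
B = {3, 4, 5}
C = {4, 5, 6, 7}

lhs = len(A | B | C)
rhs = (len(A) + len(B) + len(C)            # singles
       - len(A & B) - len(A & C) - len(B & C)   # doubles
       + len(A & B & C))                   # triples

assert lhs == rhs  # both count the 7 elements of {1, ..., 7}
```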

Pigeonhole Principle

If there are \(n\) pigeons we want to put into \(k\) holes (where \(n > k\)), then at least one pigeonhole must contain at least 2 (or to be precise, \(\lceil n / k \rceil \)) pigeons.

Combinatorial Proofs

To prove two quantities are equal, you can come up with a combinatorial situation, and show that both in fact count the same thing, and hence must be equal.

Discrete Probability

Key Probability Definitions

The sample space is the set \(\Omega \) of all possible outcomes of an experiment. An event is any subset \(E \subseteq \Omega \). Events \(E\) and \(F\) are mutually exclusive if \(E \cap F = \emptyset \).

Probability space

A probability space is a pair \((\Omega , \mathbb {P})\), where \(\Omega \) is the sample space and \(\mathbb {P}: \Omega \to [0,1]\) is a probability measure such that \(\sum _{x \in \Omega } \Prob {x} = 1\). The probability of an event \(E \subseteq \Omega \) is \(\Prob {E} = \sum _{x \in E} \Prob {x}\).

Equally Likely Outcomes

If \(\Omega \) is a sample space in which each outcome is equally likely, then for any event \(E \subseteq \Omega \): \(\mathbb {P}(E) = {|E|}/{|\Omega |}\).

Conditional Probability

For \(\Prob {B} > 0\): \begin {equation*} \Prob { A \mid B } = \dfrac {\Prob { A \cap B }}{\Prob { B }} \end {equation*}

Bayes’ Theorem

\begin {equation*} \Prob { A \mid B } = \dfrac {\Prob { B \mid A }\Prob { A }}{\Prob { B }} \end {equation*}

Partition

Non-empty events \(E_{1}, \dots , E_{n}\) partition the sample space \(\Omega \) if they are both:

a)
mutually exclusive: \(E_{i} \cap E_{j} = \emptyset \) for all \(i \neq j\)
b)
exhaustive: \(E_{1} \cup E_{2} \cup \dots \cup E_{n} = \Omega \)

Note that for any event \(E\), \(E\) and \(E^{C}\) always form a partition of \(\Omega \).

Law of Total Probability (LTP)

If events \(E_{1}, \dots , E_{n}\) partition \(\Omega \), then for any event \(F\): \begin {equation*} \Prob { F } =\sum _{i = 1}^{n} \Prob { F \cap E_{i} }= \sum _{i = 1}^{n} \Prob { F \mid E_{i} } \Prob { E_{i} } \end {equation*}

Bayes’ Theorem with LTP

Let events \(E_{1}, \dots , E_{n}\) partition the sample space \(\Omega \), and let \(F\) be another event. Then: \begin {equation*} \begin {aligned} \Prob { E_{1} \mid F } &= \frac {\Prob { F \mid E_{1} }\Prob { E_{1} }}{\sum _{i = 1}^{n} \Prob { F \mid E_{i} }\Prob { E_{i} }} \end {aligned} \end {equation*}
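As a small worked example (hypothetical numbers: a test with 95% sensitivity and a 10% false-positive rate for a condition with 1% prevalence), the formula can be evaluated directly:

```python
# E1 = has condition, E2 = does not; F = test is positive. Hypothetical numbers.
p_d = 0.01                   # P(E1): prevalence
p_pos_given_d = 0.95         # P(F | E1): sensitivity
p_pos_given_not_d = 0.10     # P(F | E2): false-positive rate

# Denominator via the Law of Total Probability, then Bayes' Theorem.
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)
posterior = p_pos_given_d * p_d / p_pos   # P(E1 | F)

assert abs(p_pos - 0.1085) < 1e-9
assert abs(posterior - 0.0095 / 0.1085) < 1e-9   # about 0.0876
```

Note that the posterior is still small despite the accurate test, because the prior \(\Prob {E_1}\) is small.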

Chain Rule

Let \(A_{1}, \dots , A_{n}\) be events with nonzero probabilities. Then: \begin {equation*} \begin {aligned} \Prob {A_{1}\cap \dots \cap A_{n}} = \Prob {A_{1}}\Prob {A_{2}\mid A_{1}}\Prob {A_{3}\mid A_{1} \cap A_{2}}\cdots \Prob {A_{n} \mid A_{1} \cap \cdots \cap A_{n-1}} \end {aligned} \end {equation*}
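A classic check of the chain rule (drawing 3 aces in a row from a standard 52-card deck), which must agree with counting equally likely unordered hands:

```python
from math import comb

# P(A1) P(A2 | A1) P(A3 | A1 and A2) = (4/52)(3/51)(2/50),
# which should equal C(4,3) / C(52,3) by equally likely outcomes.
chain = (4 / 52) * (3 / 51) * (2 / 50)
counting = comb(4, 3) / comb(52, 3)

assert abs(chain - counting) < 1e-12  # both equal 1/5525
```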

Independence

\(A\) and \(B\) are independent if any of the following equivalent statements hold:

a)
\(\Prob {A\cap B} = \Prob {A}\,\Prob {B}\)
b)
\(\Prob {A\mid B} = \Prob {A}\)
c)
\(\Prob {B\mid A} = \Prob {B}\)

Mutual Independence

We say \(n\) events \(A_1,A_2,\dots ,A_n\) are (mutually) independent if, for any subset \(I\subseteq [n]=\{1,2,\dots ,n\}\), we have \begin {equation*} \Prob {\bigcap _{i\in I} A_i}=\prod _{i\in I}\Prob {A_i} \end {equation*} This single equation actually represents \(2^n\) equations, one for each of the \(2^n\) subsets of \([n]\).

Conditional Independence

\(A\) and \(B\) are conditionally independent given an event \(C\) if any of the following equivalent statements hold:

a)
\(\Prob {A\cap B \mid C} = \Prob {A \mid C}\,\Prob {B \mid C}\)
b)
\(\Prob {A\mid B\cap C} = \Prob {A \mid C}\)
c)
\(\Prob {B\mid A\cap C} = \Prob {B \mid C}\)

Random Variables

Random Variable (RV)

A random variable (RV) \(X\) is a numeric function of the outcome, \(X:\Omega \to \mathbb {R}\). The set of possible values \(X\) can take on is its range/support, denoted \(\Omega _{X}\). If \(\Omega _{X}\) is finite or countably infinite (typically a subset of the integers), \(X\) is a discrete RV.

Probability Mass Function (PMF)

The PMF of a discrete RV \(X\) assigns probabilities to the values in its range. That is, \(p_{X}: \Omega _{X} \rightarrow [0, 1]\) such that: (1) \(p_X(k) \geq 0\) for all \(k \in \Omega _X\); (2) \(\sum _{k \in \Omega _X} p_X(k) = 1\). Furthermore, \(p_{X}(k) = \Prob {X = k}\).

Probability Density Function (PDF)

The probability density function (PDF) of a continuous RV \(X\) is the function \(f_X:\mathbb {R} \rightarrow \mathbb {R}\), such that (1) \(f_X(z) \geq 0\) for all \(z \in \mathbb {R}\); (2) \(\int _{-\infty }^{\infty } f_X(t)\;dt = 1\).

Furthermore, \(\Prob {a\le X\le b}=\int _a^b f_X(w)\;\dif w\). The probability that \(X\) is close to \(q\) is proportional to its density \(f_X(q)\); \begin {equation*} \Prob {X\approx q}=\Prob {q - \frac {\varepsilon }{2} \leq X \leq q + \frac {\varepsilon }{2}}\approx \varepsilon f_X(q) \end {equation*} Ratios of probabilities of being “near points” are maintained; \begin {equation*} \frac {\Prob {X \approx u}}{\Prob {X \approx v}} \approx \frac {\varepsilon f_X(u)}{\varepsilon f_X(v)} = \frac {f_X(u)}{f_X(v)} \end {equation*}

Cumulative Distribution Function (CDF)

The cumulative distribution function (CDF) of ANY random variable (discrete or continuous) is defined to be the function \(F_X:\mathbb {R}\to \mathbb {R}\) with \(F_X(t)=\Prob {X\le t}\). If \(X\) is a continuous RV, \(F_X(t) = \Prob {X \leq t} = \int _{-\infty }^t f_X(w)\;dw\) for all \(t \in \mathbb {R}\). Further, \(\frac {d}{du} F_X(u) = f_X(u)\).

Expectation (Discrete)

The expectation of a discrete RV \(X\) is: \(\expect {X} = \sum _{k \in \Omega _{X}} k \cdot p_{X}(k)\).

Linearity of Expectation (LoE)

For any random variables \(X,Y\) (possibly dependent): \(\expect {aX+bY+c}=a\ \expect {X}+b\ \expect {Y}+c\)

Multiplicativity of expectation for independent random variables

For any independent random variables \(X,Y\): \(\expect {XY} = \expect {X} \cdot \expect {Y}. \)

Expectation (Continuous)

The expectation of a continuous RV \(X\) is: \(\expect {X}= \int _{- \infty }^{\infty }x\ f_{X}\left ( x \right ) \dif x\).

Law of the Unconscious Statistician (LOTUS)

For a RV \(X\) and a function \(g\), if \(X\) is discrete: \(\expect {g(X)} = \sum _{x \in \Omega _{X}} g(x)\ p_{X}(x)\). If \(X\) is continuous: \(\expect {g(X)} = \int _{-\infty }^{\infty }g(x)\ f_{X}(x) \dif x\). That is, we can compute \(\expect {g(X)}\) without first finding the distribution of \(g(X)\).

Linearity of Expectation with Indicators

If asked only about the expectation of a RV \(X\) which is some sort of “count” (and not its PMF), then you may be able to write \(X\) as the sum of possibly dependent indicator RVs \(X_1,\dots ,X_n\), and apply LoE, where for an indicator RV \(X_i\), \(\expect {X_i} = 1 \cdot \Prob {X_i = 1} + 0 \cdot \Prob {X_i = 0} = \Prob {X_i = 1}\).
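A standard example: the expected number of fixed points of a uniformly random permutation of \(n\) items. With indicators \(X_i = \mathbb {1}\{\text {item } i \text { stays put}\}\), \(\expect {X_i} = 1/n\), so LoE gives \(\expect {X} = n \cdot (1/n) = 1\) even though the indicators are dependent. A sketch verifying this by exhaustive enumeration for \(n = 4\):

```python
from itertools import permutations

# Average number of fixed points over all 4! permutations of {0, 1, 2, 3}.
n = 4
perms = list(permutations(range(n)))
avg_fixed = sum(sum(1 for i, v in enumerate(p) if i == v) for p in perms) / len(perms)

assert avg_fixed == 1.0  # matches the LoE-with-indicators prediction
```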

Variance

\(\Var {X} = \expect {(X-\expect {X})^2} = \expect {X^2} - \expect {X}^2\).

Property of Variance

\(\Var {aX + b} = a^2\Var {X}\).

Standard Deviation (SD)

\(\sigma _X = \sqrt {\Var {X}}\).

Variance Adds for Independent RVs

If \(X, Y\) are independent, then \(\Var {X + Y} = \Var {X} + \Var {Y}\).
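This can be checked exactly for two independent fair dice (a sketch; the single-die variance is \(35/12\)):

```python
from itertools import product

faces = range(1, 7)

def var(values, probs):
    # Var(X) = E[(X - E[X])^2], computed from an explicit PMF.
    mean = sum(v * p for v, p in zip(values, probs))
    return sum((v - mean) ** 2 * p for v, p in zip(values, probs))

var_die = var(list(faces), [1 / 6] * 6)                 # 35/12 for one die
sums = [a + b for a, b in product(faces, repeat=2)]
var_sum = var(sums, [1 / 36] * 36)                      # variance of the sum

assert abs(var_sum - 2 * var_die) < 1e-9  # Var(X + Y) = Var(X) + Var(Y)
```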

Independent and Identically Distributed (i.i.d.)

We say \(X_1,\dots ,X_n\) are independent and identically distributed (i.i.d.) if all the \(X_i\)’s are independent of each other and have the same distribution (PMF for discrete RVs, or CDF for continuous RVs).

Joint Distributions

Joint PMFs

The joint PMF of discrete RVs \(X\) and \(Y\) is: \begin {equation*} \begin {aligned} p_{X, Y}(a, b) &= \Prob {X = a, Y = b} \end {aligned} \end {equation*} Their joint range is \begin {equation*} \begin {aligned} \Omega _{X, Y} &= \{(c, d): p_{X, Y}(c, d) > 0\} \subseteq \Omega _{X} \times \Omega _{Y} \end {aligned} \end {equation*} Note that \(\sum _{(s, t) \in \Omega _{X, Y}}p_{X, Y}(s, t) = 1\).

Joint PDFs

The joint PDF of continuous RVs \(X\) and \(Y\) is a function \(f_{X, Y}: \mathbb {R}^{2} \rightarrow \mathbb {R}\) such that \begin {equation*} \begin {aligned} f_{X, Y}(a, b) &\ge 0 \end {aligned} \end {equation*} for all \(a, b\). Their joint range is \begin {equation*} \begin {aligned} \Omega _{X, Y} &= \{(c, d): f_{X, Y}(c, d) > 0\}\subseteq \Omega _{X} \times \Omega _{Y} \end {aligned} \end {equation*} Note that \(\int _{-\infty }^\infty \int _{-\infty }^\infty f_{X, Y}(u, v) \dif u \dif v = 1\).

Multi-dimensional LOTUS (Discrete)

If \(g: \mathbb {R}^{2} \rightarrow \mathbb {R}\) is a function, then \begin {equation*} \begin {aligned} \expect {g(X, Y)} &= \sum _{x \in \Omega _{X}}\sum _{y \in \Omega _{Y}} g(x, y)\ p_{X, Y}(x, y) \end {aligned} \end {equation*}
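As a small example of the discrete double sum (a sketch with two independent fair dice and \(g(x, y) = \max (x, y)\)):

```python
# E[max(X, Y)] for two independent fair dice, via multi-dimensional LOTUS:
# sum over all (x, y) of max(x, y) * p_{X,Y}(x, y), with p_{X,Y}(x, y) = 1/36.
faces = range(1, 7)
e_max = sum(max(x, y) * (1 / 36) for x in faces for y in faces)

assert abs(e_max - 161 / 36) < 1e-9  # about 4.47
```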

Multi-dimensional LOTUS (Continuous)

If \(g: \mathbb {R}^{2} \rightarrow \mathbb {R}\) is a function, then \begin {equation*} \begin {aligned} \expect {g(X, Y)} &= \int _{-\infty }^\infty \int _{-\infty }^\infty g(s,t)\ f_{X, Y}(s, t)\ \dif s\dif t \end {aligned} \end {equation*}

Marginal PMFs

Let \(X, Y\) be discrete random variables. The marginal PMF of \(X\) is: \(p_{X}(a) = \sum _{b \in \Omega _{Y}}p_{X, Y}(a, b)\).

Independence of RVs (Discrete)

Discrete RVs \(X, Y\) are independent, written \(X \perp Y\), if for all \(x \in \Omega _{X}\) and \(y \in \Omega _{Y}\): \(p_{X, Y}(x, y) = p_{X}(x)p_{Y}(y)\).

Marginal PDFs

Let \(X, Y\) be continuous random variables. The marginal PDF of \(X\) is: \(f_X(x) = \int _{-\infty }^{\infty }f_{X, Y}(x,y)\dif y\).

Independence of RVs (Continuous)

Continuous RVs \(X, Y\) are independent, written \(X \perp Y\), if for all \(x \in \Omega _{X}\) and \(y \in \Omega _{Y}\), \(f_{X, Y}(x, y) = f_{X}(x)f_{Y}(y)\).

Continuous Probabilities

Probabilities are computed by integrating the joint PDF (similar to univariate PDFs): \begin {equation*} \begin {aligned} \Prob {a \leq X \leq b, c \leq Y \leq d} &= \int _{a}^{b}\int _{c}^{d} f_{X, Y}(x, y) \dif y \dif x \end {aligned} \end {equation*}

Conditional Expectation

If \(X\) is discrete (and \(Y\) is either discrete or continuous), then we define the conditional expectation of \(g(X)\) given (the event that) \(Y=y\) as: \begin {equation*} \begin {aligned} \expect {\,g(X)\mid Y = y\,} &= \sum _{x \in \Omega _{X}} g(x)\ \mathbb {P}(X=x \mid Y=y) \end {aligned} \end {equation*} If \(X\) is continuous (and \(Y\) is either discrete or continuous), then \begin {equation*} \begin {aligned} \expect {\,g(X) \mid Y = y\,} &= \int _{-\infty }^{\infty }g(x)\ \frac {f_{X,Y}(x,y)}{f_Y(y)} \dif x \end {aligned} \end {equation*} These are exactly the formulas for “regular” (unconditional) expectation, except the PMF/PDF of \(X\) is replaced by the conditional PMF/PDF of \(X\) given \(Y = y\) (e.g., \(p_{X\mid Y}(x\mid y)\) instead of \(p_X(x)\)). Notice that the sums and integrals are over \(x\) (not \(y\)), so \(\expect {g(X) \mid Y = y}\) is a function of \(y\).

Law of Total Expectation (LTE)

Let \(X, Y\) be jointly distributed random variables. If \(Y\) is discrete (and \(X\) is either discrete or continuous), then: \begin {equation*} \begin {aligned} \expect {\,g(X)\,} &= \sum _{y \in \Omega _{Y}}\expect {\,g(X) \mid Y = y\,}\ p_{Y}(y) \end {aligned} \end {equation*} If \(Y\) is continuous (and \(X\) is either discrete or continuous), then \begin {equation*} \begin {aligned} \expect {\,g(X)\,} &= \int _{-\infty }^{\infty }\expect {\,g(X) \mid Y = y\,}\ f_{Y}(y)\dif y \end {aligned} \end {equation*} Basically, for \(\expect {g(X)}\), we take a weighted average of \(\expect {g(X) \mid Y = y}\) over all possible values of \(y\).
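A classic LTE example: roll a fair die to get \(Y\), then flip \(Y\) fair coins and let \(X\) be the number of heads. Since \(\expect {X \mid Y = y} = y/2\), LTE gives \(\expect {X} = \sum _{y=1}^{6} (y/2)(1/6) = 7/4\). A sketch of that computation:

```python
# E[X] = sum over y of E[X | Y = y] * p_Y(y), with E[X | Y = y] = y/2
# (mean of Bin(y, 1/2)) and p_Y(y) = 1/6 for y = 1, ..., 6.
e_x = sum((y / 2) * (1 / 6) for y in range(1, 7))

assert abs(e_x - 1.75) < 1e-9  # 7/4
```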

Inequalities & Bounds

Markov’s Inequality

Let \(X \geq 0\) be a non-negative RV, and let \(k > 0\). Then: \(\Prob { X \geq k}\leq \dfrac {\expect {X}}{k}\).

Chebyshev’s Inequality

Let \(X\) be any RV with expected value \(\mu =\mathbb {E}[X]\) and finite variance \(\Var {X}\). Then, for any real number \(\alpha > 0\): \(\Prob { \left | X - \mu \right | \geq \alpha } \leq \dfrac {\Var {X}}{\alpha ^2}\).
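Both inequalities can be checked against exact tail probabilities for a small distribution (a sketch using \(X \sim \text {Bin}(10, 0.5)\), so \(\expect {X} = 5\) and \(\Var {X} = 2.5\)):

```python
from math import comb

# Exact PMF of X ~ Bin(10, 0.5).
n, p = 10, 0.5
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
mean, var = n * p, n * p * (1 - p)

tail = sum(q for k, q in pmf.items() if k >= 8)             # P(X >= 8)
dev = sum(q for k, q in pmf.items() if abs(k - mean) >= 3)  # P(|X - 5| >= 3)

assert tail <= mean / 8   # Markov:    P(X >= 8)      <= 5/8
assert dev <= var / 9     # Chebyshev: P(|X - 5| >= 3) <= 2.5/9
```

Both bounds hold but are loose here (the exact tails are about 0.055 and 0.109), which is typical: Markov and Chebyshev trade precision for generality.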

Chernoff Bound

Let \(X = X_1 + X_2 + \ldots + X_n\), where \(X_1, X_2, \ldots , X_n\) are independent random variables, each taking values in \([0,1]\). Also, let \(\mu = \mathbb {E}[X]\). For any \(\delta >0\): \begin {equation*} \Prob {\left |{X - \mu }\right | \geq \delta \mu }\leq \exp \left (-\frac {\delta ^2\mu }{4}\right ) \end {equation*}

The Union Bound

Let \(E_1,E_2,...,E_n\) be a collection of events. Then: \(\Prob {\bigcup _{i=1}^{n}{E_i}} \leq \sum _{i=1}^{n}{\Prob {E_i}}\).

Zoo of Random Variables

Bernoulli/Indicator Random Variable

\(X\sim \text {Bernoulli}(p)\) (\(\text {Ber}(p)\) for short) iff \(X\) has PMF: \begin {equation*} p_{X}\left ( k \right ) = \left \{ \begin {matrix} p,\ \ & k = 1 \\ 1 - p,\ \ & k = 0 \\ \end {matrix} \right .\ \end {equation*} \(\expect { X}= p\) and \(\Var {X} = p(1 - p)\). An example of a Bernoulli/indicator RV is one flip of a coin with \(\Prob { \text {head} } = p\).

Binomial Random Variable

\(X\sim \text {Binomial}(n,p)\) (\(\textsf {Bin}(n,p)\) for short) iff \(X\) has PMF \begin {equation*} p_{X}\left ( k \right ) = \binom {n}{k} p^{k}\left ( 1 - p \right )^{n - k},\ \ k \in \Omega _X=\{0,1,\ldots ,n\} \end {equation*} \(\expect {X}= np\) and \(\Var {X} = np(1 - p)\). \(X\) is the sum of \(n\) i.i.d. \(\text {Ber}(p)\) random variables.
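A quick check that the PMF above is a valid distribution with mean \(np\) (a sketch with the illustrative values \(n = 10\), \(p = 0.3\)):

```python
from math import comb

# Binomial(n, p) PMF: verify it sums to 1 and that E[X] = n * p.
n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

assert abs(sum(pmf) - 1) < 1e-9
assert abs(sum(k * q for k, q in enumerate(pmf)) - n * p) < 1e-9  # E[X] = 3
```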

Uniform Random Variable (Discrete)

\(X\sim \text {Uniform}(a,b)\) (\(\text {Unif}(a,b)\) for short), for integers \(a \leq b\), iff \(X\) has PMF: \begin {equation*} p_{X}\left ( k \right ) = \frac {1}{b - a + 1},\ \ k \in \Omega _X= \{a, a+1, \ldots ,b\} \end {equation*} \(\expect {X}= \frac {a + b}{2}\) and \(\Var {X} = \frac {\left ( b - a \right )\left ( b - a + 2 \right )}{12}\).

Geometric Random Variable

\(X\sim \text {Geometric}(p)\) (\(\text {Geo}(p)\) for short) iff \(X\) has PMF: \begin {equation*} p_{X}\left ( k \right ) = \left ( 1 - p \right )^{k - 1}p,\ \ k \in \Omega _X= \{1,2,3,\ldots \} \end {equation*} \(\expect {X} = \frac {1}{p}\) and \(\Var {X} = \frac {1 - p}{p^{2}}\).

Poisson Random Variable

\(X\sim \text {Poisson}(\lambda )\) (\(\text {Poi}(\lambda )\) for short) iff \(X\) has PMF: \begin {equation*} p_{X}\left ( k \right ) = e^{- \lambda }\frac {\lambda ^{k}}{k!},\ \ k\in \Omega _X = \{0,1,2,\ldots \} \end {equation*} \(\expect {X} = \lambda \) and \(\Var {X} = \lambda \). If \(X_{1},\ldots ,X_{n}\) are independent Poisson RV’s, where \(X_{i}\sim \text {Poi}(\lambda _{i})\), then \(X = X_{1} + \ldots + X_{n}\sim \text {Poi}(\lambda _{1} + \ldots + \lambda _{n})\).

Uniform Random Variable (Continuous)

\(X\sim \text {Uniform}(a,b)\) (\(\text {Unif}(a,b)\) for short) iff \(X\) has PDF: \begin {equation*} f_{X}\left ( x \right ) = \left \{ \begin {array}{ll} \frac {1}{b - a} & \mbox {if } x \in \Omega _X=\lbrack a,b\rbrack \\ 0 & \mbox {otherwise} \end {array} \right . \end {equation*} \(\expect {X} = \frac {a + b}{2}\) and \(\Var {X} = \frac {\left ( b - a \right )^{2}}{12}\).

Exponential Random Variable

\(X\sim \text {Exponential}(\lambda )\) (\(\text {Exp}(\lambda )\) for short) iff \(X\) has PDF: \begin {equation*} f_{X}\left ( x \right ) = \begin {cases} \lambda e^{- \lambda x} & \mbox {if } x \in \Omega _X=[0,\infty ) \\ 0 & \mbox {otherwise} \end {cases} \end {equation*} \(\expect {X}= \frac {1}{\lambda }\) and \(\Var {X} = \frac {1}{\lambda ^{2}}\). \(F_{X}\left ( x \right ) = 1 - e^{- \lambda x}\) for \(x \geq 0\). The exponential RV is also memoryless: \(\text {For any } s,t \geq 0, \ \Prob { X > s + t \mid X > s} = \Prob {X > t}\)

Normal (Gaussian, “bell curve”) Random Variable

\(X\sim \mathcal {N}(\mu ,\ \sigma ^{2})\) iff \(X\) has PDF: \begin {equation*} f_{X}\left ( x \right ) = \frac {1}{\sigma \sqrt {2\pi }}\,e^{- \frac {1}{2}\frac {\left ( x - \mu \right )^{2}}{\sigma ^{2}}},\ \ x \in \Omega _X= \mathbb {R} \end {equation*} \(\expect {X}= \mu \) and \(\Var {X} = \sigma ^{2}\). The “standard normal” random variable is typically denoted \(Z\) and has mean \(0\) and variance \(1\). The CDF has no closed form, but we denote the CDF of the standard normal as \(\Phi \left ( z \right ) = F_{Z}\left ( z \right ) = \Prob {Z \leq z}\). It is symmetric and satisfies \(\Phi \left ( - z \right ) = 1 - \Phi (z)\).

Closure of the Normal Under Scale and Shift

If \(X \sim \mathcal {N}(\mu , \sigma ^2)\), then \(aX + b \sim \mathcal {N}(a\mu + b, a^2\sigma ^2)\). In particular, we can always scale/shift to get the standard Normal: \(\frac {X - \mu }{\sigma } \sim \mathcal {N}(0, 1)\).
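Although \(\Phi \) has no closed form, it can be computed via the error function as \(\Phi (z) = \frac {1}{2}\left (1 + \operatorname {erf}(z/\sqrt {2})\right )\). A sketch combining this with standardization (the values \(\mu = 100\), \(\sigma = 15\), \(t = 130\) are illustrative):

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2.
    return (1 + erf(z / sqrt(2))) / 2

# Standardization: if X ~ N(mu, sigma^2), then P(X <= t) = Phi((t - mu) / sigma).
mu, sigma, t = 100, 15, 130
p = phi((t - mu) / sigma)   # P(X <= 130) = Phi(2), about 0.977

assert abs(phi(0) - 0.5) < 1e-12
assert abs(phi(-2) - (1 - phi(2))) < 1e-12   # symmetry: Phi(-z) = 1 - Phi(z)
```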

Closure of the Normal Under Addition

If \(X \sim \mathcal {N}(\mu _X, \sigma _X^2)\) and \(Y \sim \mathcal {N}(\mu _Y, \sigma _Y^2)\) are independent, then \begin {equation*} aX + bY + c \sim \mathcal {N}(a\mu _X + b\mu _Y + c, a^2\sigma _X^2 + b^2\sigma _Y^2) \end {equation*}

Limit Theorems & Statistics

Sample Mean: Properties

Let \(X_{1}, X_{2}, \dots , X_{n}\) be a sequence of i.i.d. RVs with mean \(\mu \) and variance \(\sigma ^{2}\). The sample mean is: \(\overline {X}_{n} = \frac {1}{n}\sum _{i = 1}^{n}X_{i}\). Further, \(\expect {\overline {X}_{n}}=\mu \) and \(\Var {\overline {X}_{n}}=\sigma ^2/n\)

The Central Limit Theorem (CLT)

Let \(X_{1}, \dots , X_{n}\) be a sequence of i.i.d. RVs with mean \(\mu \) and (finite) variance \(\sigma ^{2}\). Then as \(n \rightarrow \infty \), \begin {equation*} \overline {X}_{n} \rightarrow \mathcal {N}\left (\mu , \frac {\sigma ^{2}}{n}\right ) \end {equation*} The importance of the CLT is that, regardless of the distribution of the \(X_{i}\)’s, the sample mean approaches a Normal distribution as \(n \rightarrow \infty \).

The Continuity Correction

When approximating an integer-valued (discrete) random variable \(X\) with a continuous one \(Y\) (such as in the CLT), to estimate \(\Prob {X \in S}\) for a set of integers \(S\), approximate each term \(\Prob {X=a}\) by \(\Prob {a - 0.5 \leq Y \leq a + 0.5}\) and sum over \(a\in S\). For example, if \(S=\{a,\ldots ,b\}\), use \(\Prob {a - 0.5 \leq Y \leq b + 0.5}\).
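A sketch of the correction in action: approximating \(\Prob {X = 50}\) for \(X \sim \text {Bin}(100, 0.5)\) with \(Y \sim \mathcal {N}(50, 25)\) via \(\Prob {49.5 \leq Y \leq 50.5}\):

```python
from math import comb, erf, sqrt

n, p = 100, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))   # N(50, 25), so sigma = 5

def phi(z):
    # Standard normal CDF via the error function.
    return (1 + erf(z / sqrt(2))) / 2

exact = comb(n, 50) * p**50 * (1 - p)**50                     # P(X = 50)
approx = phi((50.5 - mu) / sigma) - phi((49.5 - mu) / sigma)  # with correction

assert abs(exact - approx) < 1e-3  # both are about 0.0796
```

Without the correction, the naive \(\Prob {Y = 50}\) would be \(0\), since \(Y\) is continuous.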

Likelihood & Log-Likelihood (Discrete)

Let \(x_1, ..., x_n\) be i.i.d. samples from PMF \(p_X(x ; \theta )\) where \(\theta \) is a parameter (or vector of parameters). The likelihood of \(\theta \) given the samples is \begin {equation*} \mathcal {L}(x_1, ..., x_n; \theta ) = \prod _{i=1}^{n}p_X(x_i ; \theta ). \end {equation*} The log-likelihood is \(\ln \mathcal {L}(x_1, ..., x_n; \theta ) = \sum _{i=1}^{n}\ln p_X(x_i ; \theta )\).

Likelihood & Log-Likelihood (Continuous)

Let \(x_1, ..., x_n\) be i.i.d. samples from PDF \(f_X(x ; \theta )\) where \(\theta \) is a parameter (or vector of parameters). The likelihood of \(\theta \) given the samples is \begin {equation*} \mathcal {L}(x_1, ..., x_n; \theta ) = \prod _{i=1}^{n}f_X(x_i ; \theta ). \end {equation*} The log-likelihood is \(\ln \mathcal {L}(x_1, ..., x_n; \theta ) = \sum _{i=1}^{n}\ln f_X(x_i ; \theta )\)

Maximum Likelihood Estimator (MLE)

Let \(x_1, ..., x_n\) be i.i.d. samples from probability distribution with parameter (or vectors of parameters) \(\theta \). The maximum likelihood estimator (MLE) \(\hat {\theta }_{MLE}\) for \(\theta \) is the value of \(\theta \) that maximizes the likelihood/log-likelihood: \(\hat {\theta }_{MLE} = \arg \max _{\theta }\mathcal {L}(x_1, ..., x_n; \theta ) = \arg \max _{\theta }\ln \mathcal {L}(x_1, ..., x_n; \theta )\). It can often be found by setting \(\frac \partial {\partial \theta } \ln \mathcal {L}(x_1, ..., x_n; \theta )=0\).
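For example, for i.i.d. \(\text {Ber}(\theta )\) samples the log-likelihood is \(\sum _i [x_i \ln \theta + (1 - x_i)\ln (1-\theta )]\), maximized at the sample mean. A sketch confirming this by maximizing over a grid (the sample values are hypothetical):

```python
from math import log

samples = [1, 0, 1, 1, 0, 1, 1, 0]   # hypothetical Bernoulli data, mean 5/8

def log_lik(theta):
    # Bernoulli log-likelihood: sum of x*ln(theta) + (1-x)*ln(1-theta).
    return sum(x * log(theta) + (1 - x) * log(1 - theta) for x in samples)

grid = [i / 1000 for i in range(1, 1000)]      # theta in (0, 1)
theta_hat = max(grid, key=log_lik)

assert abs(theta_hat - sum(samples) / len(samples)) < 1e-3  # MLE = sample mean
```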

Unbiased Estimators

An estimator \(\hat \theta =\hat \theta (x_1,\ldots ,x_n)\) of a parameter \(\theta \) is unbiased if \(\expect {\hat \theta (X_1,\ldots ,X_n)} = \theta \).

Markov Chains

DTSP

A discrete-time stochastic process (DTSP) is a sequence of random variables \(X^{(0)},X^{(1)},X^{(2)},...\), where \(X^{(t)}\) is the value at time \(t\).

Markov Chain

A Markov Chain on \(n\) states \(\{1, \ldots , n\}\) is a DTSP where for all \(i,j\), \(\Prob {X^{(t+1)} = j \mid X^{(t)} = i}\) is the same for all \(t \geq 0\). The transition probability matrix (TPM) \(\symbf {M}\) is such that the entry at the \(i\)-th row and \(j\)-th column is \(\symbf {M}_{ij} = \Prob {X^{(t+1)} = j \mid X^{(t)} = i}\).

Linear Algebra

We can view the distribution of \(X^{(t)}\) in a Markov Chain with \(n\) states as an \(n\)-dimensional row-vector \(\symbf {q}^{(t)}\). If the TPM is \(\symbf {M}\), we have \(\symbf {q}^{(t)} = \symbf {q}^{(t-1)}\symbf {M} = \symbf {q}^{(0)}\symbf {M}^t\).

Stationary distribution

A stationary distribution \(\symbf {\pi } = [\pi _1, \ldots , \pi _n]\) of an \(n\)-state Markov chain with TPM \(\symbf {M}\) is such that \(\symbf {\pi } \cdot \symbf {M} = \symbf {\pi }\) and \(\sum _{i=1}^n \pi _i = 1\).
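A sketch finding the stationary distribution by iterating \(\symbf {q}^{(t)} = \symbf {q}^{(t-1)}\symbf {M}\) from an arbitrary start (the 2-state TPM is an illustrative example; its exact stationary distribution is \(\symbf {\pi } = [5/6,\ 1/6]\)):

```python
# Transition probability matrix M: M[i][j] = P(next state j | current state i).
M = [[0.9, 0.1],
     [0.5, 0.5]]

q = [1.0, 0.0]               # q^(0): start in state 1
for _ in range(100):         # q <- q M, repeated; converges to pi
    q = [sum(q[i] * M[i][j] for i in range(2)) for j in range(2)]

# pi = [5/6, 1/6] satisfies pi M = pi and pi_1 + pi_2 = 1.
assert abs(q[0] - 5 / 6) < 1e-9 and abs(q[1] - 1 / 6) < 1e-9
```

Solving \(\symbf {\pi }\symbf {M} = \symbf {\pi }\) directly gives the same answer; iteration is a convenient numerical check when the chain converges.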