CSE 312 – Section 4 Solutions

Spring 2026

Review of Main Concepts

Random Variable (rv): A numeric function $X:\Omega \to \mathbb {R}$ of the outcome.
Range/Support: The support/range of a random variable $X$, denoted $\Omega _{X}$, is the set of all possible values that $X$ can take on.
Discrete Random Variable (drv): A random variable taking on a countable (either finite or countably infinite) number of possible values.
Probability Mass Function (pmf) for a discrete random variable $X$: a function $p_{X}:\Omega _{X} \rightarrow \lbrack 0,1\rbrack $ with $p_{X}\left ( x \right ) = \Pr (X = x)$ that maps possible values of a discrete random variable to the probability of that value happening, such that $\sum _{x}^{}{p_{X}(x)} = 1$.
Cumulative Distribution Function (CDF) for a random variable $X$: a function $F_{X}:R \rightarrow R$ with $F_{X}\left ( x \right ) = \Pr (X \le x)$
Expectation (expected value or mean): The expectation of a discrete random variable is defined to be $\Exp [X ] = \sum _{x}^{}{xp_{X}(x)} = \sum _{x}{x\Pr (X = x)}$.
Law of the unconscious statistician (LOTUS): The expectation of a function of a discrete random variable $g(X)$ is $\Exp [ g(X)] = \sum _{x}{g(x)p_{X}(x)}$.
Linearity of Expectation (LoE): Let $X$ and $Y$ be random variables, and $a,b,c\mathbb {\in R}$. Then, $\Exp [aX + bY + c] = a \Exp [X] + b\Exp [Y]+c$. Also, for any random variables $X_1, \ldots , X_n$, \[\Exp [X_1 + X_2 + \ldots + X_n] =\Exp [X_1] + \Exp [X_2] + \ldots + \Exp [X_n].\]
Variance: Let $X$ be a random variable and $\mu = \Exp [X]$. The variance of $X$ is defined to be $\Var (X)=\Exp [(X-\mu )^2] $. Notice that since this is an expectation of a non-negative random variable ($\left ( X - \mu \right )^{2}$), variance is always non-negative. With some algebra, we can simplify this to $\Var ( X) = \Exp [ X^{2}] - \Exp [X]^2$.
Standard Deviation: Let $X$ be a random variable. We define the standard deviation of $X$ to be the square root of the variance, and denote it $\sigma =\sqrt {\Var (X)}$.
Property of Variance: Let $a,b \ \mathbb {\in R}$ and let $X$ be a random variable. Then, $\Var (aX+b)=a^2\Var (X)$.
Independence: Random variables $X$ and $Y$ are independent iff \[\forall x \forall y,\quad \Pr ( X = x \cap Y = y) = \Pr ( X = x)\Pr (Y = y)\] In this case, we have $\Exp [XY]= \Exp [X]\Exp [Y]$ (the converse is not necessarily true).
i.i.d. (independent and identically distributed): Random variables $X_{1},\ldots ,X_{n}$ are i.i.d. (or iid) iff they are independent and have the same probability mass function.
Variance of Independent Variables: If $X$ is independent of $Y$, $\Var \left ( X + Y \right ) = \Var \left ( X \right ) + \Var (Y)$. This depends on independence, whereas linearity of expectation always holds. Note that this combined with the above shows that $\forall a,b,c\ \mathbb {\in R}$ and if $X$ is independent of $Y$, $\Var (aX + bY + c) =a^{2}\Var (X) + b^{2}\Var (Y)$.

Announcements & Plan for Section

Announcements

PSet 2 grades were released and can be viewed on Gradescope. Regrade requests will close on 4/25. We highly recommend taking a look through feedback on the psets, the common errors on Ed and the solutions that were posted (link on Ed).
PSet 3 was due earlier this week.
PSet 4 is released — start early.
This week’s focus: Linearity of Expectation, LOTUS, and Variance, leading into discrete random variables.

Plan for Section

Content Review (Problem 1)
Quick check of core definitions: PMF vs CDF, expectation, and variance.
Problem 3 (3-sided die)
Build intuition for PMFs and practice computing expectation both directly and using linearity.
Problem 8 (Hungry washing machine)
Key application of indicator variables and linearity of expectation — avoiding messy counting.
Problem 11 (Hat check)
Classic example showing linearity of expectation works even without independence.
Problem 5 (if time)
Reinforces thinking about ranges/support of random variables.
Problem 6 (if time)
Additional practice connecting PMFs, expectations, and variance.

1 Content Review

a)

True or false: the range of a random variable $X$ is the set of probabilities corresponding to the possible values $X$ can take on.

False. The range (or support) of a random variable is the set of all possible values it can take on, not the probabilities of those values.

b)

What is the relationship between standard deviation and variance of a random variable $X$?

$\sigma = (\Var (X))^2$
$\sigma = \Var (X^2)$
$\Var (X) = \sigma ^2$

$\Var (X) = \sigma ^2$ (or equivalently, $\sigma = \sqrt {\Var (X)}$).

c)

Let $X$ be the random variable representing the the sum of a 3-dice roll of 6-sided dice. Which function would you use to determine the probability that $X=7$?

CDF (cumulative distribution function)
PMF (probability mass function)

PMF. We use the PMF when we want to find the probability of a specific, exact value of a discrete random variable occurring.

d)

Let $X$ be the random variable representing the outcome of taking the sum of a 3-dice roll of 6-sided dice. Which function would you use to determine the probability that $X \leq 7$?

CDF (cumulative distribution function)
PMF (probability mass function)

CDF. The CDF directly gives us the cumulative probability $\Pr (X \leq x)$.

e)

A random variable $X$ has the PMF \[p_X(x) = \begin {cases} 1/4 & x=-1 \\ 1/4 & x=0 \\ 1/2 & x=2 \\ 0 & \text {otherwise} \end {cases}\] What is $\Exp [X]$?

-1/4
3/4
1
2

3/4. \[\Exp [X] = \sum _{x \in \Omega _X} xp_X(x) = (-1)\left (\frac {1}{4}\right ) + (0)\left (\frac {1}{4}\right ) + (2)\left (\frac {1}{2}\right ) = -\frac {1}{4} + 0 + 1 = {\frac {3}{4}}\]

f)

A random variable $X$ has the PMF \[p_X(x) = \begin {cases} 1/4 & x=-1 \\ 1/4 & x=0 \\ 1/2 & x=2 \\ 0 & \text {otherwise} \end {cases}\] What is $\Var (X)$?

$3/4$
$1$
$((1/4)+2)-((\frac {3}{4})^2) = 27/16$
$((1/4)+2)+((\frac {3}{4})^2) = 45/16$

$27/16$. \[ \begin {aligned} \Var (X) &= \Exp [X^2] - \Exp [X]^2 \\ &= \left (\sum _{x \in \Omega _X} x^2 p_X(x)\right ) - \left (\frac {3}{4}\right )^2 \\ &= \left ((-1)^2 \cdot \frac {1}{4} + (0)^2 \cdot \frac {1}{4} + (2)^2 \cdot \frac {1}{2}\right ) - \frac {9}{16} \\ &= \left (\frac {1}{4} + 0 + 2\right ) - \frac {9}{16} \\ &= \frac {9}{4} - \frac {9}{16} = {\frac {27}{16}} \end {aligned} \]

g)

A random variable $X$ has the CDF \[ F_X(x) = \begin {cases} 0 & x < 1 \\ \frac {1}{5} & 1 \le x < 2 \\ \frac {1}{2} & 2 \le x < 3 \\ \frac {3}{5} & 3 \le x < 4 \\ 1 & x \ge 4 \end {cases} \] What is $p_X(2)$?

$3/10$
$1/2$
$7/10$
$1/5$

$3/10$. \[ \begin {aligned} p_X(2) &= \mathbb {P}_X(X \leq 2) - \mathbb {P}_X(X \leq 1) \\ &= F_X(2) - F_X(1) \\ &= \frac {1}{2} - \frac {1}{5} = \frac {3}{10} \\ \end {aligned} \]

2 Identify that Range!

Identify the support/range $\Omega _X$ of the random variable $X$, if $X$ is...

a): The sum of two rolls of a six-sided die.

$X$ takes on every integer value between the minimum sum ($1+1=2$) and the maximum sum ($6+6=12$).
$\Omega _X=\{2,3,\dots ,12\}$
b): The number of lottery tickets I buy until I win it (assuming I buy at least one lottery ticket).

$X$ takes on all positive integer values. (It is possible, though increasingly unlikely, that I never win the lottery, so there is no upper bound).
$\Omega _X=\{1,2,3,\dots \}$
c): The number of heads in $n$ flips of a coin with $0<\Pr (\text {head})<1$.

$X$ takes on every integer value between the minimum number of heads ($0$), and the maximum possible ($n$).
$\Omega _X=\{0,1,\dots ,n\}$
d): The number of heads in $n$ flips of a coin with $\Pr (\text {head})=1$.

Since $\Pr (\text {head}) = 1$, we are mathematically guaranteed to get exactly $n$ heads in $n$ flips.
$\Omega _X=\{n\}$
e): The number of whole minutes I wait at the bus stop for the next bus.

The number of whole minutes is discrete and will take on values between the minimum waiting time ($0$, the bus is already here), and the maximum theoretically unbounded waiting time.
$\Omega _X=\{0,1,2,\dots \}$

3 3-sided Die

Let the random variable $X$ be the sum of two independent rolls of a fair 3-sided die. (If you are having trouble imagining what that looks like, you can use a 6-sided die and change the numbers on 3 of its faces.)

a)

What is the probability mass function of $X$?

First let us define the range of $X$. A three-sided die can take on values $\{1,2,3\}$. Since $X$ is the sum of two rolls, the minimum is 2 and the maximum is 6. Thus, $\Omega _{X} = \{2, 3, 4, 5, 6\}$.

We can then define the PMF of $X$. To that end, we define two random variables $R_1, R_2$ with $R_1$ being the roll of the first die, and $R_2$ being the roll of the second die. Then, $X = R_1 + R_2$. Note that $\Omega _{R1} = \Omega _{R2} = \{1,2,3\}$ and $\Pr (R_i = x) = 1/3$ for all valid $x$.

We can compute the probability of each sum by adding up the probabilities of the specific roll combinations $(R_1, R_2)$ that yield that sum:

$X=2$: (1,1) $\rightarrow \frac {1}{3} \cdot \frac {1}{3} = \frac {1}{9}$
$X=3$: (1,2), (2,1) $\rightarrow \frac {2}{9}$
$X=4$: (1,3), (2,2), (3,1) $\rightarrow \frac {3}{9}$
$X=5$: (2,3), (3,2) $\rightarrow \frac {2}{9}$
$X=6$: (3,3) $\rightarrow \frac {1}{9}$

Summarizing this in a piece-wise PMF: \[p_X(k)= \begin {cases} 1/9 & k=2 \\ 2/9 & k=3 \\ 3/9 & k=4 \\ 2/9 & k=5 \\ 1/9 & k=6\\ 0 & \text {otherwise} \end {cases} \]

b)

Find $\Exp [X]$ directly from the definition of expectation.

\[\Exp [X] = \sum _{k=2}^6{k \cdot p_X(k)} = 2\left (\frac {1}{9}\right ) + 3\left (\frac {2}{9}\right ) + 4\left (\frac {3}{9}\right ) + 5\left (\frac {2}{9}\right ) + 6\left (\frac {1}{9}\right ) \] \[=\frac {2+6+12+10+6}{9} = \frac {36}{9} = \boxed {4}\]

c)

Find $\Exp [X]$ again, but this time using Linearity of Expectation.

Let $R_1$ be the roll of the first die, and $R_2$ the roll of the second. Then, $X=R_1+R_2$.

By Linearity of Expectation, we get: \[\Exp [X] = \Exp [R_1+R_2] = \Exp [R_1] + \Exp [R_2]\]

We first compute the expected value of a single roll: \[\Exp [R_1] = \sum _{i=1}^3{i \cdot \Pr (R_1 = i)} = 1\left (\frac {1}{3}\right ) + 2\left (\frac {1}{3}\right ) + 3\left (\frac {1}{3}\right ) = \frac {6}{3} = 2\] Since the dice are identical, $\Exp [R_2] = 2$ as well.

Plugging this into our expression for $X$: \[\Exp [X] = 2 + 2 = \boxed {4}\]

d)

What is $\Var (X)$?

We know from the definition of variance that: \[\Var (X) = \Exp [X^2] - \Exp [X]^2\]

We computed $\Exp [X] = 4$ in the previous parts. Now we compute $\Exp [X^2]$ using the PMF: \[\Exp [X^2] = \sum ^6_{x = 2} x^2p_X(x) = \frac {2^2(1) + 3^2(2) + 4^2(3) + 5^2(2) + 6^2(1)}{9} = \frac {4 + 18 + 48 + 50 + 36}{9}\] \[= \frac {156}{9} = \frac {52}{3}\]

Plugging this into our variance equation gives us: \[\Var (X) = \frac {52}{3} - (4)^2 = \frac {52}{3} - \frac {48}{3} = \boxed {\frac {4}{3}}\]

4 Kit Kats Again

Suppose we have $N$ candies in a jar, $K$ of which are Kit-Kats. Suppose we draw (without replacement) until we have (exactly) $k$ Kit-Kats, where $k \leq K \leq N$. Let $X$ be the number of draws until the $k^{\text {th}}$ Kit-Kat. What is $\Omega _{X}$, the range of $X$? What is $p_{X}\left ( n \right ) = \Pr (X = n)$? ($X$ is called a “negative hypergeometric” random variable).

Since it takes a minimum of $k$ draws to get $k$ Kit-Kats, the lowest possible value is $k$. The absolute maximum number of draws occurs if we draw every single non-Kit-Kat candy in the jar ($N-K$ of them) before we finally get our $k$-th Kit-Kat. Therefore, the maximum number of draws is $(N-K) + k$. \[\Omega _{X} = \{ k, k + 1, \ldots , N - K + k\}\]

There are two completely valid ways to find the PMF.

Method 1: Pure Combinatorics
Imagine drawing all $N$ candies one after the other without replacement. There are $\binom {N}{K}$ equally likely ways to position the $K$ Kit-Kats among the $N$ total draws. For the $k$-th Kit-Kat to occur exactly on the $n$-th draw, two independent conditions must be met:

a): We must position exactly $k-1$ Kit-Kats somewhere among the first $n-1$ draws: $\binom {n-1}{k-1}$ ways.
b): There is a Kit-Kat at the $n$-th draw.
c): We must position the remaining $K-k$ Kit-Kats somewhere among the final $N-n$ draws: $\binom {N-n}{K-k}$ ways.

By the product rule, we multiply these independent choices and divide by the total sample space: \[ p_X(n) = \Pr (X=n) = \frac {\binom {n-1}{k-1} \binom {N-n}{K-k}}{\binom {N}{K}} \]

Method 2: Conditional Probability
Let $B$ be the event: "We draw exactly $k$ Kit-Kats in our first $n$ draws." Then \[ \Pr (B) = \frac {\binom {K}{k}\binom {N-K}{n-k}}{\binom {N}{n}} \] Let $A$ be the event: "The $n$-th draw is a Kit-Kat." We want to find $\Pr (A \cap B)$, which means we drew exactly $k$ Kit-Kats in our first $n$ draws AND the very last of those $n$ draws was a Kit-Kat (making it the $k$-th one).

Conditioned on event $B$, we know we have a set of $n$ drawn candies where exactly $k$ of them are Kit-Kats. Because the order in which those $n$ candies were drawn is uniformly random, the probability that the last candy drawn was one of the Kit-Kats is simply $\frac {k}{n}$. Therefore, $\Pr (A \mid B) = \frac {k}{n}$.

Applying the Chain Rule ($\Pr (A \cap B) = \Pr (B)\Pr (A \mid B)$) gives: \[ p_X(n) = \Pr (X=n) = \frac {\binom {K}{k}\binom {N-K}{n-k}}{\binom {N}{n}} \cdot \frac {k}{n} \]

*(Note: While these expressions look entirely different, with a bit of algebra, you can verify that they are equal!)*

5 Hungry Washing Machine

You have 10 pairs of socks (so 20 socks in total), with each pair being a different color. You put them in the washing machine, but the washing machine eats 4 of the socks chosen at random. Every subset of 4 socks is equally probable to be the subset that gets eaten. Let $X$ be the number of complete pairs of socks that you have left.

a)

What is the range of $X$, $\Omega _X$ (the set of possible values it can take on)? What is the probability mass function of $X$?

The washing machine eats exactly $4$ socks. The outcomes can be categorized by how those 4 eaten socks were distributed among the pairs:

The 4 socks belong to 4 completely different pairs. This ruins 4 pairs, leaving $10 - 4 = 6$ complete pairs.
The 4 socks belong to 1 complete pair and 2 other single socks. This ruins 3 pairs, leaving $10 - 3 = 7$ complete pairs.
The 4 socks belong to exactly 2 complete pairs. This ruins 2 pairs, leaving $10 - 2 = 8$ complete pairs.

Thus, the range is \[\Omega _X=\{6, 7, 8\}\]

Because every subset of 4 socks is equally likely, our sample space size is $|\Omega | = \binom {20}{4} = 4845$. We can compute $\Pr (X=k) = \frac {|X=k|}{|\Omega |}$ for each $k$:

For $k = 6$: We must pick $4$ out of the $10$ pairs of socks from which the machine will eat a single sock ($\binom {10}{4}$ ways). For each of these $4$ pairs, the machine has $2$ specific socks to pick from ($\binom {2}{1}^4 = 2^4$ ways). $|X = 6| = \binom {10}{4}2^4 = 210 \cdot 16 = 3360$.

For $k = 7$: We first pick $1$ out of the $10$ pairs to be eaten in its entirety ($\binom {10}{1}$ ways). We then pick $2$ out of the remaining $9$ pairs from which it will eat a single sock ($\binom {9}{2}$ ways). For each of these $2$ pairs, there are $2$ specific socks to pick from ($\binom {2}{1}^2 = 2^2$ ways). $|X = 7| = \binom {10}{1}\binom {9}{2}2^2 = 10 \cdot 36 \cdot 4 = 1440$.

For $k = 8$: We simply pick $2$ out of the $10$ pairs of socks to be eaten entirely ($\binom {10}{2}$ ways). $|X = 8| = \binom {10}{2} = 45$.

*(Self-check: $3360 + 1440 + 45 = 4845$, which perfectly matches $|\Omega |$.)*

Summarizing as a PMF: \[p_X(k)= \begin {cases} \frac {3360}{4845} & k=6 \\[0.5em] \frac {1440}{4845} & k=7 \\[0.5em] \frac {45}{4845} & k=8 \\[0.5em] 0 & \text {otherwise} \end {cases} \]

b)

Find $\Exp [X]$ from the definition of expectation.

\[\Exp [X]= \sum _{k \in \Omega _X}{k \cdot p_X(k)} = 6\left (\frac {3360}{4845}\right ) + 7\left (\frac {1440}{4845}\right ) + 8\left (\frac {45}{4845}\right ) = \frac {20160 + 10080 + 360}{4845}\] \[= \frac {30600}{4845} = \boxed {\frac {120}{19}}\]

c)

Find $\Exp [X]$ using Linearity of Expectation. (see this video for a walkthrough of this problem!)

Step 1: Define Indicator Random Variables
For $i \in \{1, \dots , 10\}$, let $X_i$ be an indicator random variable such that $X_i = 1$ if pair $i$ survived completely intact, and $0$ otherwise. We can express our total intact pairs as $X = \sum _{i=1}^{10}{X_i}$.

Step 2: Find Expectation of the Indicator
The expected value of an indicator variable is simply the probability the event occurs: $\Exp [X_i] = \Pr (X_i = 1)$. For pair $i$ to survive intact, the washing machine must have chosen its 4 eaten socks exclusively from the other 18 available socks. \[\Exp [X_i] = \Pr (X_i=1) = \frac {\binom {18}{4}}{\binom {20}{4}} = \frac {3060}{4845} = \frac {12}{19}\]

Step 3: Apply Linearity of Expectation
By Linearity of Expectation, the expectation of the sum is the sum of the expectations: \[\Exp [X] = \Exp \left [\sum _{i=1}^{10}{X_i}\right ] = \sum _{i=1}^{10}{\Exp [X_i]} = 10 \cdot \left (\frac {12}{19}\right ) = \boxed {\frac {120}{19}}\]

d)

Which way was easier? Doing both (a) and (b), or just (c)?

Part (c) was vastly easier and less prone to algebraic error. In this specific problem, computing the PMF (parts a and b) was manageable because there were only 3 possible values in the range. However, if you had 50 pairs of socks and the machine ate 20 of them, the PMF would have dozens of complex cases, making part (b) a nightmare. Linearity of Expectation (c) scales effortlessly regardless of the numbers involved!

6 Hat Check

At a reception, $n$ people give their hats to a hat-check person. When they leave, the hat-check person gives each of them a hat chosen at random from the hats that remain. What is the expected number of people who get their own hats back? (Notice that the hats returned to two people are not independent events: if a certain hat is returned to one person, it cannot also be returned to the other person.)

Note: See this video for a walkthrough of this problem!

Let $X$ be the total number of people who get their own hats back.

Step 1: Define Indicator Random Variables
For $i\in \{1, \dots , n\}$, let $X_i$ be an indicator variable where $X_i = 1$ if person $i$ gets their own hat back, and $0$ otherwise. Thus, $X = \sum ^{n}_{i=1} X_i$.

Step 2: Find Expectation of the Indicator
The expected value of an indicator is $\Exp [X_i] = \Pr (X_i = 1)$. The sample space $|\Omega |$ is all $n!$ possible ways to distribute the $n$ hats among the $n$ people. The event we care about is person $i$ receiving hat $i$. If we lock person $i$ with hat $i$, there are $(n-1)$ remaining people and $(n-1)$ remaining hats, which can be distributed in $(n-1)!$ ways. \[\Exp [X_i] = \Pr (X_i = 1) = \frac {(n-1)!}{n!} = \frac {1}{n}\]

Step 3: Apply Linearity of Expectation
Notice that even though the random variables $X_i$ are heavily dependent on each other, Linearity of Expectation holds regardless of independence! \[\Exp [X] = \Exp \left [\sum _{i=1}^n{X_i}\right ] = \sum _{i=1}^n{\Exp [X_i]} = \sum _{i=1}^n{\frac {1}{n}} = n \cdot \frac {1}{n} = \boxed {1}\]

On average, exactly 1 person will get their own hat back, whether there are 10 people at the party or 10,000 people!

7 Frogger

A frog starts on a 1-dimensional number line at $0$. At each second, independently, the frog takes a unit step right with probability $p_{1}$, to the left with probability $p_{2}$, and doesn’t move with probability $p_{3}$, where $p_{1} + p_{2} + p_{3} = 1$. After 2 seconds, let $X$ be the location of the frog.

a)

Find $p_{X}(k)$, the probability mass function for $X$.

Let an outcome be defined by the sequence of the two steps. For example, $LN$ means a Left step followed by No step. Because the frog takes at most 2 steps, the range of $X$ is $\{-2,-1,0,1,2\}$. We calculate the probability of each final position by adding the probabilities of the step sequences that land there:

$X = -2$: Must step Left then Left ($LL$). $\Pr (LL) = p_2 \cdot p_2 = p_2^2$
$X = -1$: Can be ($LN$) or ($NL$). $\Pr (LN \cup NL) = p_2 p_3 + p_3 p_2 = 2p_2p_3$
$X = 0$: Can be ($NN$), ($LR$), or ($RL$). $\Pr (NN \cup LR \cup RL) = p_3^2 + p_2 p_1 + p_1 p_2 = p_3^2 + 2p_1p_2$
$X = 1$: Can be ($RN$) or ($NR$). $\Pr (RN \cup NR) = p_1 p_3 + p_3 p_1 = 2p_1p_3$
$X = 2$: Must step Right then Right ($RR$). $\Pr (RR) = p_1 \cdot p_1 = p_1^2$

Summarizing as a PMF: \[p_X(k)= \begin {cases} p_2^2 & k=-2 \\ 2p_2p_3 & k=-1 \\ p_3^2+2p_1p_2 & k=0 \\ 2p_1p_3 & k=1 \\ p_1^2 & k=2\\ 0 & \text {otherwise} \end {cases} \]

b)

Compute $\Exp [X]$ from the definition.

\[ \begin {aligned} \Exp [X] &= \sum _{k=-2}^2 k \cdot p_X(k) \\ &= (-2)(p_2^2) + (-1)(2p_2p_3) + (0)(p_3^2+2p_1p_2) + (1)(2p_1p_3) + (2)(p_1^2) \\ &= 2p_1^2 + 2p_1p_3 - 2p_2^2 - 2p_2p_3 \\ &= 2p_1(p_1 + p_3) - 2p_2(p_2 + p_3) \end {aligned} \] Since $p_1 + p_2 + p_3 = 1$, we know $(p_1 + p_3) = 1 - p_2$ and $(p_2 + p_3) = 1 - p_1$. Substituting these in: \[ \begin {aligned} &= 2p_1(1 - p_2) - 2p_2(1 - p_1) \\ &= 2p_1 - 2p_1p_2 - 2p_2 + 2p_1p_2 \\ &= {2(p_1 - p_2)} \end {aligned} \]

c)

Compute $\Exp [X]$ again, but using Linearity of Expectation.

Let $Y_1$ be the amount you moved on the first step (taking values $-1, 0, 1$), and $Y_2$ the amount you moved on the second step. Then $X = Y_1 + Y_2$.

The expected value of a single step is: \[\Exp [Y_1] = (-1)(p_2) + (0)(p_3) + (1)(p_1) = p_1 - p_2\] Because the steps are identical, $\Exp [Y_2] = p_1 - p_2$.

By Linearity of Expectation: \[\Exp [X] = \Exp [Y_1 + Y_2] = \Exp [Y_1] + \Exp [Y_2] = (p_1 - p_2) + (p_1 - p_2) = {2(p_1 - p_2)}\]

8 Balls in Bins

Let $X$ be the number of bins that remain empty when $m$ balls are distributed into $n$ bins randomly and independently. For each ball, each bin has an equal probability of being chosen. (Notice that two bins being empty are not independent events: if one bin is empty, that decreases the probability that the second bin will also be empty. This is particularly obvious when $n=2$ and $m>0$.) Find $\Exp [X]$.

Step 1: Define Indicator Random Variables
For $i\in \{1, \dots , n\}$, let $X_i$ be an indicator variable where $X_i = 1$ if bin $i$ is empty, and $0$ otherwise. The total number of empty bins is $X = \sum _{i=1}^n{X_i}$.

Step 2: Find Expectation of the Indicator
$\Exp [X_i] = \Pr (X_i = 1)$. For bin $i$ to be empty, every single one of the $m$ balls must go into a bin other than bin $i$. The probability that a specific ball goes into a different bin is $(n-1)/n$. Since the balls are distributed independently, we multiply this probability $m$ times: \[\Exp [X_i] = \Pr (X_i=1) = \left (\frac {n-1}{n}\right )^m\]

Step 3: Apply Linearity of Expectation
Despite the indicator variables being dependent on one another, LoE still holds: \[\Exp [X] = \Exp \left [\sum _{i=1}^n{X_i}\right ] = \sum _{i=1}^n{\Exp [X_i]} = {n \cdot \left (\frac {n-1}{n}\right )^m}\]

9 Fair Game?

You flip a fair coin independently and count the number of flips until the first tail, including that tail flip in the count. If the count is $n$, you receive $2^n$ dollars. What is the expected amount you will receive? How much would you be willing to pay at the start to play this game?

The expected reward is $\infty $.

Let $N$ be the number of flips until the first tail. The probability of getting exactly $n-1$ heads followed by 1 tail is: $p_N(n) = \left (\frac {1}{2}\right )^{n-1} \cdot \frac {1}{2} = \frac {1}{2^n}$ for $n \in \{1, 2, \dots \}$.

We want to find the expected value of our reward, which is a function of $N$, specifically $g(N) = 2^N$. Using the Law of the Unconscious Statistician (LOTUS): \[\Exp [2^N] = \sum _{n=1}^\infty {2^n \cdot p_N(n)} = \sum _{n=1}^\infty {2^n \cdot \frac {1}{2^n}} = \sum _{n=1}^\infty {1} = \infty \]

In theory, a fully rational actor aiming to maximize expected value should be willing to pay any finite amount of money to play this game (this is the famous St. Petersburg Paradox). However, in reality, you would likely be nervous to pay a lot. For instance, if you pay $1000 to play, you will lose money unless the first 9 flips are all heads (which has a probability of less than 0.2%). With high probability you will lose your buy-in, and with an incredibly low probability you will win a massive amount of money.

10 Symmetric Difference

Suppose $A$ and $B$ are random, independent (possibly empty) subsets of $\{1,2,\ldots ,n\}$, where each subset is equally likely to be chosen as $A$ or $B$. Consider $A \Delta B=(A\cap B^C)\cup (B\cap A^C)=(A\cup B)\cap (A^C\cup B^C)$, i.e., the set containing elements that are in exactly one of $A$ and $B$. Let $X$ be the random variable that is the size of $A \Delta B$. What is $\Exp [X]$?

Step 1: Define Indicator Random Variables
For $i \in \{1,2,\ldots ,n\}$, let $X_i$ be the indicator variable where $X_i = 1$ if element $i \in A \Delta B$, and $0$ otherwise. The total size of the set is $X=\sum _{i=1}^n{X_i}$.

Step 2: Find Expectation of the Indicator
$\Exp [X_i] = \Pr (X_i = 1)$. Because subsets are chosen uniformly at random, any element $i$ has a $1/2$ probability of being in $A$, and independently, a $1/2$ probability of being in $B$. Element $i$ is in $A \Delta B$ if it is in $A$ but not $B$, OR in $B$ but not $A$. \[\Pr (X_i = 1) = \Pr (i \in A)\Pr (i \notin B) + \Pr (i \notin A)\Pr (i \in B)\] \[= \left (\frac {1}{2}\right )\left (\frac {1}{2}\right ) + \left (\frac {1}{2}\right )\left (\frac {1}{2}\right ) = \frac {1}{4} + \frac {1}{4} = \frac {1}{2}\]

Step 3: Apply Linearity of Expectation
\[\Exp [X] = \Exp \left [\sum _{i=1}^n{X_i}\right ] = \sum _{i=1}^n{\Exp [X_i]} = \sum _{i=1}^n{\frac {1}{2}} = {\frac {n}{2}}\]

11 Practice

a): Let $X$ be a random variable with $p_X(k)=ck$ for $k\in \{1,\dots ,5\}=\Omega _X$, and $0$ otherwise. Find the value of $c$ that makes $X$ follow a valid probability distribution and compute its mean and variance ($\Exp [X]$ and $\Var (X)$).
b): Let $X$ be any random variable with mean $\Exp [X]=\mu $ and variance $\Var (X)=\sigma ^2$. Find the mean and variance of $Z=\dfrac {X-\mu }{\sigma }$. (When you’re done, you’ll see why we call this a “standardized” version of $X$!)
c): Let $X,Y$ be independent random variables. Find the mean and variance of $X-3Y-5$ in terms of $\Exp [X],\Exp [Y],\Var (X)$, and $\Var (Y)$.
d): Let $X_1,\dots ,X_n$ be independent and identically distributed (iid) random variables each with mean $\mu $ and variance $\sigma ^2$. The sample mean is $\bar {X}=\frac {1}{n}\sum _{i=1}^n{X_i}$. Find the mean and variance of $\bar {X}$. If you use the independence assumption anywhere, explicitly label at which step(s) it is necessary for your equalities to be true.

a): For $X$ to follow a valid probability distribution, its PMF must sum to 1: $\sum _{k \in \Omega _X}{p_X(k)} = 1$. \[\sum _{k = 1}^{5}{ck} = c\sum _{k = 1}^{5}{k} = c \cdot (1+2+3+4+5) = 15c = 1\] Therefore, $c=1/15$.
We can now use the definition of expectation: \[\Exp [X] = \sum _{k=1}^5 k \left (\frac {k}{15}\right ) = \frac {1^2+2^2+3^2+4^2+5^2}{15} = \frac {1+4+9+16+25}{15} = \frac {55}{15} = {\frac {11}{3}} \approx 3.667\] And compute $\Exp [X^2]$: \[\Exp [X^2] = \sum _{k=1}^5 k^2 \left (\frac {k}{15}\right ) = \frac {1^3+2^3+3^3+4^3+5^3}{15} = \frac {1+8+27+64+125}{15} = \frac {225}{15} = {15}\] And finally the variance of $X$: \[\Var (X) = \Exp [X^2] - \Exp [X]^2 = 15 - \left (\frac {11}{3}\right )^2 = \frac {135}{9} - \frac {121}{9} = {\frac {14}{9}} \approx 1.556\]
b): By Linearity of Expectation, we know that $\Exp [aX + b] = a\Exp [X] + b$. \[\Exp [Z] = \Exp \left [\dfrac {X-\mu }{\sigma }\right ] = \Exp \left [\frac {1}{\sigma }X - \frac {\mu }{\sigma }\right ] = \frac {1}{\sigma }\Exp [X] - \frac {\mu }{\sigma }\] Since we are given $\Exp [X] = \mu $: \[\Exp [Z] = \frac {\mu }{\sigma } - \frac {\mu }{\sigma } = {0}\]
For the variance, we know the property $\Var (aX + b) = a^2 \Var (X)$. \[\Var (Z) = \Var \left (\dfrac {X-\mu }{\sigma }\right ) = \Var \left (\frac {1}{\sigma }X - \frac {\mu }{\sigma }\right ) = \left (\frac {1}{\sigma }\right )^2 \Var (X)\] Since we are given $\Var (X) = \sigma ^2$: \[\Var (Z) = \frac {1}{\sigma ^2} (\sigma ^2) = {1}\] *(This is why it’s called standardized: it shifts the distribution to be centered exactly at 0, with a clean variance of exactly 1!)*
c): Using Linearity of Expectation: \[\Exp [X-3Y-5] = {\Exp [X] - 3\Exp [Y] - 5}\]
For variance, we know that the variance of a sum of independent random variables is the sum of their variances, and that $\Var (aX+b) = a^2\Var (X)$. \[\Var (X-3Y-5) = \Var (X) + \Var (-3Y) = {\Var (X) + 9\Var (Y)}\]
d): Using Linearity of Expectation (which holds regardless of independence): \[\Exp [\overline {X}] = \Exp \left [\dfrac {1}{n}\sum _{i=1}^n{X_i}\right ] = \dfrac {1}{n}\sum _{i=1}^n\Exp [X_i] = \dfrac {1}{n} (n\mu ) = {\mu }\]
Using the properties of variance, and specifically noting that because the $X_i$’s are independent, the variance of their sum is the sum of their variances: \[\Var (\overline {X}) = \Var \left (\dfrac {1}{n}\sum _{i=1}^n{X_i}\right ) = \dfrac {1}{n^2}\Var \left (\sum _{i=1}^n{X_i}\right ) \overset {\textbf {indep.}}{=} \dfrac {1}{n^2}\sum _{i=1}^n{\Var (X_i)} = \dfrac {1}{n^2}(n\sigma ^2) = {\dfrac {\sigma ^2}{n}}\]

12 Coin Flipping

Suppose we have a coin with probability $p$ of heads. Suppose we flip this coin until we flip a head for the first time. Let $X$ be the number of times we flip the coin up to and including the first head. What is $\Pr ( X = k ) $, for $k = 1,2,\ldots $? Verify that $\sum _{k = 1}^{\infty }{\Pr (X = k)} = 1$, as it should. (You may use the fact that $\sum _{j = 0}^{\infty }a^{j} = \frac {1}{1 - a}$ for $| a | < 1$). (This is called a geometric random variable.)

If the $k^{\text {th}}$ flip is our first head, the first $k - 1$ flips must be tails (each with probability $(1-p)$), and the $k^{\text {th}}$ flip must be a head (which occurs with probability $p$). Therefore:

\[\Pr ( X = k ) = {( 1 - p ) ^{k - 1}p}\]

To verify the probabilities sum to 1, we pull the constant $p$ out of the summation: \[\sum _{k = 1}^{\infty }{\Pr (X = k)} = \sum _{k = 1}^{\infty }{ ( 1 - p ) ^{k - 1}p} = p \cdot \sum _{k = 1}^{\infty } ( 1 - p ) ^{k - 1}\]

We can re-index the summation by setting $j = k - 1$ (so our lower bound $k = 1$ turns into $j = 0$): \[p \cdot \sum _{j = 0}^{\infty } ( 1 - p ) ^{j} = p \left (\frac {1}{1 - (1 - p)}\right ) = p \left (\frac {1}{p}\right ) = {1}\]

13 More Coin Flipping ...

Suppose we have a coin with probability $p$ of heads. Suppose we flip this coin $n$ times independently. Let $X$ be the number of heads that we observe. What is $\Pr ( X = k ) $, for $k = 0,\ldots n$? Verify that $\sum _{k = 0}^{n}{\Pr (X = k)} = 1$, as it should. (This is called a "binomial" random variable.)

The answer is: \[\Pr ( X = k ) = {\binom {n}{k} p^{k} ( 1 - p ) ^{n - k}}\]

To see why, observe that for a specific, given sequence of flips containing exactly $k$ heads, the probability of that sequence occurring is $p^{k} ( 1 - p ) ^{n - k}$. Since there are exactly $\binom {n}{k}$ such possible valid sequences, the total probability of observing exactly $k$ heads is $ \binom {n}{k} p^{k} ( 1 - p ) ^{n - k}$.

To verify the sum: \[\sum _{k = 0}^{n}{\Pr (X = k)} = \sum _{k = 0}^{n}{\binom {n}{k} p^{k} ( 1 - p ) ^{n - k}}\] By the Binomial Theorem, this expression factors perfectly to: \[ ( p + (1 - p) ) ^{n} = (1)^n = {1}\]