CSE 312 – Section 5 Solutions

Spring 2026

Review of Main Concepts

Independence: Random variable $X$ and event $E$ are independent iff \[\forall x ,\quad \Pr ( X = x \cap E) = \Pr ( X = x)\Pr (E)\]

Random variables $X$ and $Y$ are independent iff \[\forall x \forall y,\quad \Pr ( X = x \cap Y = y) = \Pr ( X = x)\Pr (Y = y)\] In this case, we have $\Exp [XY]= \Exp [X]\Exp [Y]$ (the converse is not necessarily true).
i.i.d. (independent and identically distributed): Random variables $X_{1},\ldots ,X_{n}$ are i.i.d. (or iid) iff they are mutually independent and have the same probability mass function.
Independence of functions of a r.v.: If $X$ and $Y$ are independent and $g(\cdot ),h(\cdot )$ are functions mapping real numbers to real numbers, then $g(X)$ and $h(Y)$ are independent. (See if you can prove this!) Thus, for example: if $X$ and $Y$ are independent, then so are $X^2$ and $Y^2$.
Variance of Independent Variables: If $X$ and $Y$ are independent random variables, $\Var \left ( X + Y \right ) = \Var \left ( X \right ) + \Var (Y)$. Linearity of variance does not hold in general. It depends on independence, whereas linearity of expectation always holds. Note that this combined with the above shows that $\forall a,b,c\ \mathbb {\in R}$ and if $X$ is independent of $Y$, $\Var (aX + bY + c) =a^{2}\Var (X) + b^{2}\Var (Y)$.
Review: Zoo of Discrete Random Variables

a)
Uniform: $X\sim \textsf {Uniform}(a,b)$ ($\textsf {Unif}(a,b)$ for short), for integers $a \leq b$, iff $X$ has the following probability mass function: \[p_{X}\left ( k \right ) = \frac {1}{b - a + 1},\ \ k = a, a+1, \ldots ,b\] $\Exp [X]= \frac {a + b}{2}$ and $\Var (X) = \frac {\left ( b - a \right )\left ( b - a + 2 \right )}{12}$. This represents a random experiment in which each outcome from $\lbrack a,b\rbrack $ is equally likely. For example, a single roll of a fair die is $\textsf {Uniform}(1,6)$.

b)
Bernoulli (or indicator): $X\sim \textsf {Bernoulli}(p)$ ($\textsf {Ber}(p)$ for short) iff $X$ has the following probability mass function: \[p_{X}\left ( k \right ) = \left \{ \begin {matrix} p,\ \ & k = 1 \\ 1 - p,\ \ & k = 0 \\ \end {matrix} \right .\ \] $\Exp [ X]= p$ and $\Var (X) = p(1 - p)$. An example of a Bernoulli r.v. is one flip of a coin with $\Pr \left ( \text {head} \right ) = p$.

c)
Binomial: $X\sim \textsf {Binomial}(n,p)$ ($\textsf {Bin}(n,p)$ for short) iff $X$ is the sum of $n$ iid $\textsf {Bernoulli}(p)$ random variables. $X$ has probability mass function \[p_{X}\left ( k \right ) = \binom {n}{k} p^{k}\left ( 1 - p \right )^{n - k},\ \ k = 0,1,\ldots ,n\] $\Exp [X]= np$ and $\Var (X) = np(1 - p)$. An example of a Binomial r.v. is the number of heads in $n$ independent flips of a coin with $\Pr \left ( \text {head} \right ) = p$. Note that $\textsf {Bin}(1,p) \equiv \textsf {Ber}(p)$. As $n \rightarrow \infty \ \text {and}\ p \rightarrow 0,\text {with}\ np\ = \lambda $, then $\textsf {Bin}\left ( n,p \right ) \rightarrow \textsf {Poi}(\lambda )$. If $X_{1},\ldots ,X_{n}$ are independent Binomial r.v.’s, where $X_{i}\sim \textsf {Bin}(N_{i},p)$, then $X = X_{1} + \ldots + X_{n}\sim \textsf {Bin}(N_{1} + \ldots + N_{n},p)$.

d)
Geometric: $X\sim \textsf {Geometric}(p)$ ($\textsf {Geo}(p)$ for short) iff $X$ has the following probability mass function: \[p_{X}\left ( k \right ) = \left ( 1 - p \right )^{k - 1}p,\ \ k = 1,2,\ldots \] $\Exp [X] = \frac {1}{p}$ and $\Var (X) = \frac {1 - p}{p^{2}}$. An example of a Geometric r.v. is the number of independent coin flips up to and including the first head, where $\Pr \left ( \text {head} \right ) = p$.

e)
Poisson: $X\sim \textsf {Poisson}(\lambda )$ ($\textsf {Poi}(\lambda )$ for short) iff $X$ has the following probability mass function: \[p_{X}\left ( k \right ) = e^{- \lambda }\frac {\lambda ^{k}}{k!},\ \ k = 0,1,\ldots \] $\Exp [X] = \lambda $ and $\Var (X) = \lambda $. An example of a Poisson r.v. is the number of people born during a particular minute, where $\lambda $ is the average birth rate per minute. If $X_{1},\ldots ,X_{n}$ are independent Poisson r.v.’s, where $X_{i}\sim \textsf {Poi}(\lambda _{i})$, then $X = X_{1} + \ldots + X_{n}\sim \textsf {Poi}(\lambda _{1} + \ldots + \lambda _{n})$.

Announcements & Plan for Section

Announcements

PSet 3 grades were released and can be viewed on Gradescope. Regrade requests will close on 5/2. We highly recommend taking a look through feedback on the psets, the common errors on Ed and the solutions that were posted (link on Ed).
PSet 4 was due earlier this week.
PSet 5 is released — start early.
This week’s focus: Random Variable Zoo + applications, independence and more practice with expectation/variance.
A practice midterm will be posted today or tomorrow. Please attempt the problems before next week’s section where we will review the solutions.

Plan for Section

Content Review (Problem 1)
Quick review of concepts such as variance properties, linearity of expectation and independence.
Problem 2 (Pond Fishing)
Build intuition for the different zoo variables and what each of them represent.
Problem 4 (Variance of a Product)
Gives students practice with the definition of variance, and understanding when linearity of variance holds.
Problem 5 (True or False?)
Exam-style true or false questions to strengthen the definitions of zoo variables.
Problem 3 (Best Coach Ever!! - if time)
Gives students intuition for when to apply which zoo variable, practices geometric and binomial variables.

1 Content Review Questions

a)

True or false: $\Var (X + Y) = \Var (X) + \Var (Y)$ for all random variables $X$ and $Y$

False. This property only holds if $X$ and $Y$ are independent.

b)

What is $\Var (3X + 4)$?

$3\Var (X) + 4$
$3\Var (X)$
$9\Var (X)$
$\Var (X)$

$9\Var (X)$ by the scaling property of variance

c)

True or false: $\Exp [X + Y] = \Exp [X] + \Exp [Y]$ for all random variables $X$ and $Y$.

True. This is by the linearity of expectation. $X$ and $Y$ do not have to be independent.

d)

What is $\Exp [3X + 4]$?

$3\Exp [X] + 4$
$3\Exp [X]$
$9\Exp [X]$
$\Exp [X]$

$3\Exp [X] + 4$ by the linearity of expectation.

2 Pond Fishing

Suppose I am fishing in a pond with $B$ blue fish, $R$ red fish, and $G$ green fish, where $B + R + G = N$. Each fish is equally likely to be caught. For each of the following scenarios, identify the most appropriate distribution (with parameter(s)):

a)

How many of the next 10 fish I catch are blue, if I catch and release them

\[\textsf {Bin}\left ( 10,\frac {B}{N} \right )\]
\[\textsf {Ber}\left ( \frac {B}{N} \right )\]
\[\textsf {Bin}\left ( 1,\frac {B}{N} \right )\]

Since this is the same as saying how many of my next 10 trials (fish) are a success (are blue), this is a binomial distribution. Specifically, since we are doing catch and release, the probability of a given fish being blue is $\frac {B}{N}$ and each trial is independent. Thus: \[\textsf {Bin}\left ( 10,\frac {B}{N} \right )\]

b)

How many fish I had to catch until my first green fish, if I catch and release them

\[\textsf {Ber}\left ( \frac {G}{N} \right )\]
\[\textsf {Bin}\left ( 1,\frac {G}{N} \right )\]
\[\textsf {Geo}\left ( \frac {G}{N} \right )\]

Once again, each catch is independent, so this is asking how many trials until we see a success, hence it is a geometric distribution: \[\textsf {Geo}\left ( \frac {G}{N} \right )\]

c)

How many red fish I catch in the next five minutes, if I catch on average $r$ red fish per minute

\[\textsf {Poi}(5R)\]
\[\textsf {Bin}\left ( 5,\frac {R}{N} \right )\]
\[\textsf {Poi}(5r)\]

This is asking for the number of occurrences of an event given an average rate, which is the definition of the Poisson distribution. Since we’re looking for events in the next 5 minutes, that is our time unit, so we have to adjust the average rate to match ($r$ per minute becomes $5r$ per 5 minutes). \[\textsf {Poi}(5r)\]

d)

Whether or not my next fish is blue

\[\textsf {Poi}(5B)\]
\[\textsf {Bin}\left ( 1,\frac {R}{N} \right )\]
\[\textsf {Ber}\left ( \frac {B}{N} \right )\]

This is the same as the binomial case, but it’s only one trial, so it is necessarily Bernoulli. \[\textsf {Ber}\left ( \frac {B}{N} \right )\]

3 Best Coach Ever!!

You are a hardworking boxer. Your coach tells you that the probability of your winning a boxing match is $0.2$ independently of every other match.

a): How many matches do you expect to fight until you win 10 times and what kind of random variable is this?

The number of matches you have to fight until you win 10 times can be modeled by $\sum _{i=1}^{10}X_{i}$ where $X_{i} \sim \textsf {Geometric}(0.2)$ is the number of matches you have to fight to go from $i-1$ wins to $i$ wins, including the match that gets you your $i^{th}$ win, where every match has a 0.2 probability of success. Recall $\Exp [X_i]=\frac {1}{0.2}=5$. $\Exp [\sum _{i=1}^{10}X_{i}] = \sum _{i=1}^{10}\Exp [X_{i}] = \sum _{i}^{10}\frac {1}{0.2} = 10 \cdot 5 = 50$.
b): You only get to play 12 matches every year. To win a spot in the Annual Boxing Championship, a boxer needs to win at least 10 matches in a year. What is the probability that you will go to the Championship this year and what kind of random variable is the number of matches you win out of the 12?

You can go to the championship if you win more than or equal to 10 times this year. Let $Y$ be the number of matches you win out of the 12 matches. Note that $Y \sim \textsf {Binomial}(12, 0.2)$. Since the max number you can win is 12 (there are 12 matches), we are looking for $\Pr (10 \le Y \le 12)$. Thus, since $Y$ is discrete, we are interested in \[\Pr (Y = 10) + \Pr (Y = 11) + \Pr (Y = 12) = \sum _{i = 10}^{12}{\dbinom {12}{i} 0.2^{i} (1 - 0.2)^{12 - i} }\]
c): Let $p$ be your answer to part (b). How many times can you expect to go to the Championship in your 20 year career?

The number of times you go to the championship can be modeled by $Y \sim \textsf {Binomial}(20, p)$ as whether or not we go to the championship every year is independent of other years, and since we are trying to find how many of these $20$ trials is a success. So, $\Exp [Y] = 20 \cdot p$.

4 Variance of a Product

Let $X, Y, Z$ be independent random variables with means $\mu _{X}, \mu _{Y}, \mu _{Z}$ and variances $\sigma _{X}^{2}, \sigma _{Y}^{2}, \sigma _{Z}^{2}$, respectively. Find $\Var (XY - Z)$.

First recall that $\Var (X)=\Exp [X^2]-\Exp [X]^2\implies \Exp [X^2]=\Var (X)+\Exp [X]^2=\sigma _X^2+\mu _X^2$, and same for $Y$. \[\Var (XY) = \Exp [X^{2} Y^{2}] - \Exp [XY]^2 \quad (\text {by theorem in class})\] \[ = \Exp [X^{2}] \Exp [Y^{2}] - (\Exp [X]\Exp [Y])^2 \quad (\text {by independence})\] \[ = \Exp [X^{2}] \Exp [Y^{2}] - \Exp [X]^2\Exp [Y]^2\] \[ = (\sigma _X^2+\mu _X^2)(\sigma _Y^2+\mu _Y^2)-\mu _X^2\mu _Y^2\]

By independence, \[\Var (XY - Z) = \Var (XY) + \Var (-Z) = \Var (XY) + \Var (Z)\] \[=(\sigma _X^2+\mu _X^2)(\sigma _Y^2+\mu _Y^2)-\mu _X^2\mu _Y^2+\sigma _Z^2\]

5 True or False?

Identify the following statements as true or false (true means always true). Justify your answer.

a): For any random variable $X$, we have $\Exp [X^2]\ge \Exp [X]^2$.

True, since $0\le \Var (X)=\Exp [(X - \Exp [X])^2]$, since the squaring necessitates the result is non-negative.
b): Let $X,Y$ be random variables. Then, $X$ and $Y$ are independent if and only if $\Exp [XY]=\Exp [X]\Exp [Y]$.

False. The forward implication is true, but the reverse is not. For example, if $X$ is the discrete uniform random variable on the set $\{-1,0,1\}$ such that $\Pr (X=-1)=\Pr (X=0)=\Pr (X=1)=\frac {1}{3}$, and $Y=X^2$, we have $\Exp [X]=0$, so $\Exp [X]\Exp [Y]=0$. However, since $X=X^3$, $\Exp [XY]=\Exp [XX^2]=\Exp [X^3]=\Exp [X]=0$, we have that $\Exp [X]\Exp [Y]=0=\Exp [XY]$. However, $X$ and $Y$ are not independent; indeed, $\Pr (Y=0 \mid X=0)=1\ne \frac {1}{3}=\Pr (Y=0)$.
c): Let $X\sim \textsf {Binomial}(n,p)$ and $Y\sim \textsf {Binomial}(m,p)$ be independent. Then, $X+Y\sim \textsf {Binomial}(n+m,p)$.

True. $X$ is the sum of $n$ independent Bernoulli trials, and $Y$ is the sum of $m$. So $X+Y$ is the sum of $n+m$ independent Bernoulli trials, so $X+Y\sim \textsf {Binomial}(n+m,p)$.
d): Let $X_1,...,X_{n+1}$ be independent $\textsf {Bernoulli}(p)$ random variables. Then, $\Exp [\sum _{i=1}^n{X_iX_{i+1}}]=np^2$.

True. Notice that $X_iX_{i+1}$ is also Bernoulli (only takes on $0$ and $1$), but is $1$ iff both are $1$, so $X_iX_{i+1}\sim \textsf {Bernoulli}(p^2)$. The statement holds by linearity, since $\Exp [X_iX_{i+1}]=p^2$.
e): Let $X_1,...,X_{n+1}$ be independent $\textsf {Bernoulli}(p)$ random variables. Then, $Y=\sum _{i=1}^n{X_iX_{i+1}}\sim \textsf {Binomial}(n,p^2)$.

False. They are all Bernoulli($p^2$) as determined in the previous part, but they are not independent. Indeed, $\Pr (X_1X_2=1 \mid X_2X_3=1)=\Pr (X_1=1)=p\ne p^2=\Pr (X_1X_2=1)$.
f): If $X\sim \textsf {Bernoulli}(p)$, then $nX\sim \textsf {Binomial}(n,p)$.

False. The range of $X$ is $\{0,1\}$, so the range of $nX$ is $\{0,n\}$. $nX$ cannot be $\textsf {Bin}(n,p)$, otherwise its range would be $\{0,1,...,n\}$.
g): If $X\sim \textsf {Binomial}(n,p)$, then $\frac {X}{n}\sim \textsf {Bernoulli}(p)$.

False. Again, the range of $X$ is $\{0,1,...,n\}$, so the range of $\frac {X}{n}$ is $\{0,\frac {1}{n},\frac {2}{n},...,1\}$. Hence it cannot be $\textsf {Ber}(p)$, otherwise its range would be $\{0,1\}$.
h): For any two independent random variables $X,Y$, we have $\Var (X-Y)=\Var (X)-\Var (Y)$.

False. $\Var (X-Y)=\Var (X+(-Y))=\Var (X)+(-1)^2\Var (Y)=\Var (X)+\Var (Y)$.

6 Fun with Poissons

Let $X \sim \textsf {Poisson}(\lambda _1)$ and $Y\sim \textsf {Poisson}(\lambda _2)$, and $X$ and $Y$ are independent.

a): Show that $X+Y \sim \textsf {Poisson}(\lambda _1 + \lambda _2)$. (We did this problem in class.)

To show that a random variable is distributed according to a particular distribution, we must show that they have the same PMF. Thus, we are trying to show that $\Pr (X+Y = n) = e^{-(\lambda _1 + \lambda _2)}\frac {(\lambda _1 + \lambda _2)^n}{n!}$ \[ \begin {aligned} \Pr (X + Y = n) &= \sum ^n_{k=0}\Pr (X=k \cap Y = n - k)\\ &= \sum ^n_{k=0}\Pr (X=k) \Pr (Y = n - k) \quad &[\text {X and Y are independent}]\\ &= \sum ^n_{k=0}e^{-\lambda _1} \frac {\lambda _1^k}{k!}\; e^{-\lambda _2} \frac {\lambda _2^{n-k}}{(n-k)!}\\ &= e^{-(\lambda _1 + \lambda _2)} \sum ^n_{k=0} \frac {\lambda _1^k}{k!}\; \frac {\lambda _2^{n-k}}{(n-k)!}\\ &= e^{-(\lambda _1 + \lambda _2)} \sum ^n_{k=0} \frac {1}{k!(n-k)!}\; \lambda _1^k \lambda _2^{n-k}\\ &= \frac {e^{-(\lambda _1 + \lambda _2)}}{n!} \sum ^n_{k=0} \frac {n!}{k!(n-k)!}\; \lambda _1^k \lambda _2^{n-k}\\ &= \frac {e^{-(\lambda _1 + \lambda _2)}}{n!} \sum ^n_{k=0} \binom {n}{k} \; \lambda _1^k \lambda _2^{n-k}\\ &= \frac {e^{-(\lambda _1 + \lambda _2)}}{n!} \; (\lambda _1 + \lambda _2)^n \quad &[\text {Binomial Theorem}] \end {aligned} \]
b): Show that $\Pr (X = k \mid X+Y = n) = \Pr (W = k)$ where $W \sim \textsf {Bin}(n, \frac {\lambda _1}{\lambda _1 + \lambda _2})$

\[ \begin {aligned} \Pr (X = k \mid X+Y = n) &= \frac {\Pr (X=k \cap X+Y = n)}{\Pr (X+Y = n)}\\ &= \frac {\Pr (X=k \cap Y = n - k)}{\Pr (X+Y = n)}\\ &= \frac {\Pr (X=k) \Pr ( Y = n - k)}{\Pr (X+Y = n)} \quad \quad [\text {X and Y are independent}]\\ &= \frac {e^{-\lambda _1} \frac {\lambda _1^k}{k!} \cdot e^{-\lambda _2} \frac {\lambda _2^{n-k}}{(n-k)!}}{e^{-(\lambda _1 + \lambda _2)} \frac {(\lambda _1 + \lambda _2)^n}{n!}}\\ &= \frac {\frac {\lambda _1^k}{k!} \cdot \frac {\lambda _2^{n-k}}{(n-k)!}}{\frac {(\lambda _1 + \lambda _2)^n}{n!}}\\ &= \frac {n!}{k!(n-k)!} \cdot \frac {\lambda _1^k \lambda _2^{n-k}}{(\lambda _1 + \lambda _2)^n}\\ &= \binom {n}{k} \frac {\lambda _1^k \; \lambda _2^{n-k}}{(\lambda _1 + \lambda _2)^k \; (\lambda _1 + \lambda _2)^{n-k}}\\ &= \binom {n}{k} \; \left ( \frac {\lambda _1}{\lambda _1 + \lambda _2} \right )^{k} \; \left ( \frac {\lambda _2}{\lambda _1 + \lambda _2} \right )^{n-k}\\ &= \binom {n}{k} p^k \; (1-p)^{n-k}\text { , where } p = \frac {\lambda _1}{\lambda _1 + \lambda _2} \end {aligned} \]

7 Memorylessness

We say that a random variable $X$ is memoryless if $\Pr (X > k + i \mid X > k) = \Pr (X > i)$ for all non-negative integers $k$ and $ i$. The idea is that $X$ does not remember its history. Let $X\sim \textsf {Geo}(p)$. Show that $X$ is memoryless.

Let’s note that if $X \sim \textsf {Geo}(p)$, then $\Pr (X > k) = \Pr (\text {no successes in the first $k$ trials}) = (1-p)^{k}$. \[ \begin {aligned} \Pr (X > k + i \mid X > k) &= \frac {\Pr (X > k \mid X > k + i) \; \Pr (X > k + i)}{\Pr (X > k)} \quad &[\text {Defn of Cond. Probability}]\\ &= \frac {\Pr (X > k + i)}{\Pr (X > k)} \quad &[\Pr (X > k \mid X > k + i) = 1]\\ &= \frac {(1-p)^{k+i}}{(1-p)^{k}} \quad &[\Pr (X > k) = (1-p)^{k}]\\ &= (1-p)^i \\ &= \Pr (X > i) \end {aligned} \]

8 Poisson Practice

Seattle averages 3 days with snowfall per year. Suppose the number of days with snowfall follows a Poisson distribution.

a): What is the probability of getting exactly 5 days of snow in a year?

Let $X\sim \textsf {Poi}(3)$ Then $p_X(5)=\frac {3^5e^{-3}}{5!} \approx .1008$
b): According to the Poisson model, what is the probability of getting 367 days of snow?

Let $X\sim \textsf {Poi}(3)$. Then $p_X(367)=\frac {3^{367}e^{-3}}{367!} \approx 1.8\times 10^{-610}$, that’s a very small estimate, but of course the true probability is 0. Recall that using a Poisson distribution is a modeling assumption, it may produce nonzero probabilities for events that are practically impossible.

9 How many 6’s?

Suppose that a fair 8-sided die is rolled repeatedly, with each roll independent of the others. Let $Z$ be the number of rolls until (and including) the first time either a 2 or a 3 is rolled, and let $W$ be the number of 6’s rolled until the first 2 or 3 is rolled. So, for example if the sequence of die values until the first 2 or 3 is 6,5,4,8,7,6,7,1,2, then $Z$ is 9 and $W$ is 2.

Define \[p(j) := \begin {cases} \Pr (W = j \mid Z = i)& j \in \{0, 1, \ldots , i-1\}\\ 0 & \text {otherwise} \end {cases}\] Show that $p(j)$ is the probability mass function of a binomially distributed random variable and determine its parameters $n$ and $p$. Prove that your answer is correct mathematically.

\begin {equation*} \begin {aligned} \Pr (W = j \mid Z = i) & = \frac { \Pr (W = j, Z = i)}{\Pr (Z=i)}\notag \\ & = \frac {\binom {i-1}{j}\left (\frac {1}{8}\right )^j \left (\frac {5}{8}\right )^{i-1-j} \frac {2}{8}} {\left (\frac {6}{8}\right )^{i-1} \frac {2}{8}}\label {explain}\\ & = \frac {\binom {i-1 } { j}\left (\frac {1}{8}\right )^j \left (\frac {5}{8}\right )^{i-1-j} } {\left (\frac {6}{8}\right )^j \left (\frac {6}{8}\right )^{i-1 - j } }\notag \\ & = \binom {i-1} {j}\left (\frac {1}{6}\right )^j \left (\frac {5}{6}\right )^{i-1-j} \notag \end {aligned} \end {equation*} since all the 8’s in the denominators (on top and bottom) cancel out. Note that the numerator of (??) is the probability of the event that we have $j$ 6’s in the first $i-1$ rolls (since $Z=i$ tells us that the $i$th roll must be a 2 or 3). Each particular sequence where $j$ of the first $i-1$ rolls are 6’s, $i-j-1$ of the first $i-1$ rolls are 1,4,5,7 or 8 and the final roll is a 2 or 3 has probability \[\Pr (W = j, Z = i) = \left (\frac {1}{8}\right )^j \left (\frac {5}{8}\right )^{i-1-j} \frac {2}{8}\] and there are $\binom {i-1}{j}$ such sequences.

We can re-write the last line of our working above as \[\Pr (W = j \mid Z = i) = \binom {i - 1}{j} \left (\dfrac {1}{6}\right )^j \left (1 - \dfrac {1}{6}\right )^{i - 1 - j}\] which is a binomial random variable with $p = \dfrac {1}{6}, n = i - 1$.

10 Practice with LOTUS

Suppose that $X$ is a Binomial random variable with parameters $n$ and $p$. Use LOTUS to show that \[\Exp \left [\frac {1}{X+1}\right ] = \frac {1- (1-p)^{n+1}}{(n+1)p}.\] Hint: Use the result of Problem 2 from Section 2, namely that $\frac {1}{k+1} \binom {n}{k} = \frac {1}{n+1}\binom {n+1}{k+1} $ and then manipulate the expression until you can use the Binomial Theorem.

By LOTUS, the expected value of a function of a random variable $g(X)$ is $\Exp [g(X)] = \sum _{k} g(k) \Pr (X = k)$. Applying this to $g(X) = \frac {1}{X+1}$, we get: \[ \begin {aligned} \Exp \left [\frac {1}{X+1}\right ] &= \sum _{k=0}^{n} \frac {1}{k+1} \Pr (X = k) \\ &= \sum _{k=0}^{n} \frac {1}{k+1} \binom {n}{k} p^k (1-p)^{n-k} \end {aligned} \] Using the hint, we substitute $\frac {1}{k+1} \binom {n}{k} = \frac {1}{n+1} \binom {n+1}{k+1}$: \[ \begin {aligned} \Exp \left [\frac {1}{X+1}\right ] &= \sum _{k=0}^{n} \frac {1}{n+1} \binom {n+1}{k+1} p^k (1-p)^{n-k} \end {aligned} \] We can factor out $\frac {1}{n+1}$. To make the exponent of $p$ match the bottom term in the binomial coefficient ($k+1$), we multiply and divide the expression by $p$: \[ \begin {aligned} \Exp \left [\frac {1}{X+1}\right ] &= \frac {1}{(n+1)p} \sum _{k=0}^{n} \binom {n+1}{k+1} p^{k+1} (1-p)^{n-k} \end {aligned} \] Next, we perform a change of variables. Let $j = k+1$. When $k$ goes from $0$ to $n$, $j$ goes from $1$ to $n+1$. Notice that $n-k = n-(j-1) = n+1-j$. Substituting $j$ into our summation: \[ \begin {aligned} \Exp \left [\frac {1}{X+1}\right ] &= \frac {1}{(n+1)p} \sum _{j=1}^{n+1} \binom {n+1}{j} p^j (1-p)^{n+1-j} \end {aligned} \] By the Binomial Theorem, we know that $\sum _{j=0}^{n+1} \binom {n+1}{j} p^j (1-p)^{n+1-j} = (p + (1-p))^{n+1} = 1$.

Our summation is almost exactly this Binomial expansion, except it is missing the $j=0$ term. We can rewrite our summation as the full Binomial expansion minus that $j=0$ term: \[ \begin {aligned} \sum _{j=1}^{n+1} \binom {n+1}{j} p^j (1-p)^{n+1-j} &= \sum _{j=0}^{n+1} \left [ \binom {n+1}{j} p^j (1-p)^{n+1-j} \right ] - \binom {n+1}{0} p^0 (1-p)^{n+1-0} \\ &= 1 - (1-p)^{n+1} \end {aligned} \] Substituting this back into our expected value equation yields the final result: \[ \begin {aligned} \Exp \left [\frac {1}{X+1}\right ] &= \frac {1}{(n+1)p} \left ( 1 - (1-p)^{n+1} \right ) = \frac {1- (1-p)^{n+1}}{(n+1)p} \end {aligned} \]

11 Parallel Server Failures

A computing system relies on $m$ independent nodes. The lifetime (in days) of each node is modeled by a Geometric distribution with parameter $p$ (meaning each node has a probability $p$ of failing on any given day). The system completely shuts down only when all $m$ nodes have failed.

a): Let $D$ be the day the first node fails. Find the probability mass function of $D$. Start by computing the probability that $D > d$.
b): Find the exact probability that the entire system shuts down on day $k$.

a): First Failure (Minimum): Let $X_i$ be the lifetime of node $i$. We want the distribution of $D = \min (X_1, X_2, \dots , X_m)$. The event $D > d$ means all $m$ nodes survive past day $d$. \[ \begin {aligned} \Pr (D > d) &= \Pr (X_1 > d, X_2 > d, \dots , X_m > d) \\ &= \prod _{i=1}^m \Pr (X_i > d) \end {aligned}. \] Since $\Pr (X_i > d) = (1-p)^d$: \[ \begin {aligned} \Pr (D > d) &= \left ((1-p)^d\right )^m = \left ((1-p)^m\right )^d \end {aligned}. \] This represents the tail probability of a new Geometric distribution since it has the form $(\text {probability of no failure in one day})^{d}$. In particular, on any given day, all $m$ nodes survive with probability $(1-p)^{m}$, so the probability that at least one node fails on a given day is \[ 1 - (1-p)^{m} \] Thus, we can think of each day as an independent trial where the probability of “success” (failure of at least one machine) is $1 - (1-p)^m$. Therefore, $D \sim \text {Geometric}(1 - (1-p)^m)$.
b): System Shutdown (Maximum): Let $Y$ be the day the entire system shuts down, so $Y = \max (X_1, X_2, \dots , X_m)$. The probability that all nodes fail by day $k$ is the cumulative distribution function $F_Y(k)$: \[ \begin {aligned} F_Y(k) &= \Pr (\max (X_1, X_2, \dots , X_m) \le k) \\ &= \Pr (X_1 \le k, X_2 \le k, \ldots , X_m \le k)\\ &= \prod _{i=1}^m \Pr (X_i \le k) \\ &= [1 - (1-p)^k]^m \end {aligned}. \] The probability the system shuts down exactly on day $k$ requires subtracting the probability that it shut down on any day prior: \[ \begin {aligned} \Pr (Y = k) &= \Pr (Y \le k) - \Pr (Y \le k-1) \\ &= [1 - (1-p)^k]^m - [1 - (1-p)^{k-1}]^m \end {aligned}. \]

12 Grading time

Suppose that homeworks are graded by TA1 with probability 0.3, by TA2 with probability 0.5 and by TA3 with probability 0.2.

a): TA1 takes an amount of time (in hours) to grade that is Poisson with parameter $\lambda $.
b): TA2 takes an amount of time (in hours) to grade that is Binomial with parameters $n$ and $p$.
c): TA3 takes an amount of time (in hours) to grade that is Geometric with parameter $p$.

What is the probability that TA1 did the grading given that the number of hours it took was $h$?

Let $H$ be the number of hours it took to grade. By Bayes Theorem, \[ \begin {aligned} \Pr (\text {TA1 graded} |H=h) &=\frac {\Pr (H=h |\text { TA1 graded})\cdot \Pr (\text {TA1 graded})}{\Pr (H=h)}\\ &= \frac {e^{\lambda }\frac {\lambda ^h}{h!} \cdot 0.3 }{\Pr (H=h)}\\ &=\frac {e^{\lambda }\frac {\lambda ^h}{h!}\cdot 0.3 }{0.3\cdot e^{\lambda }\frac {\lambda ^h}{h!} + 0.5 \cdot \binom {n}{h} p^h (1-p)^{n-h} + 0.2\cdot (1-p)^{h-1}p}, \end {aligned} \] where we used the Law of Total Probability to compute the denominator in the last step.