CSE 312 – Section 9


Spring 2026

Review of Main Concepts

Law of Total Expectation

Maximum Likelihood Estimation

We assume access to samples \(X_1, X_2, \ldots , X_n\) from a distribution (e.g. Bernoulli, Geometric, Normal, etc) with unknown parameters. Our goal is to estimate this parameters. When the parameter is unknown we denote it by \(\theta \).

Tail bounds ++

Section plan

1  Content Review I

a)
Law of Total Expectation. Let \(X\) and \(Y\) be discrete random variables. Which of the following represents the Law of Total Expectation for \(X\)?
  • \(\Exp [X] = \sum _y \Exp [X \mid Y=y] p_X(x)\)
  • \(\Exp [X] = \sum _y \Exp [X \mid Y=y] p_Y(y)\)
  • \(\Exp [X] = \sum _x \Exp [X \mid Y=y] p_Y(y)\)
  • \(\Exp [X] = \sum _y \Exp [X] p_{X|Y}(x|y)\)
b)
Linearity of Conditional Expectation. True or False: Linearity of expectation does not apply to conditional expectations. That is, \(\Exp [X+Y \mid A]\) is generally NOT equal to \(\Exp [X \mid A] + \Exp [Y \mid A]\).
  • True
  • False
c)
Suppose \(X\) and \(Y\) are random variables and \(A\) is an event. Given that \(\expect {X|A} = 4\) and \(\expect {Y|A} = 10\), what is \(\expect {2X+\frac {Y}{2} \:\middle |\: A}\)?
  • 14
  • 18
  • 9
  • 13
d)
True or False: We get a different answer if we maximize log-likelihood rather than likelihood, but because it is close enough and easier to compute, we use it for our estimate of \(\theta \).
e)
True or False: [Notation question] \(\hat {\theta }\) is the true parameter and \(\theta \) is our estimate.
f)
True or False: An estimator is unbiased if \(\text {Bias}( \hat {\theta },\theta ) = \mathbb {E}[ \hat {\theta }(X_1,\dots ,X_n)] - \theta \) = 0 or equivalently \(\mathbb {E}[\hat {\theta }(X_1, \ldots , X_n)] = \theta \)
g)
You flip a coin 10 times and observe HHHTHHTHHH (8 heads, 2 tails). What is the MLE of \(\theta \), where \(\theta \) is the true probability of seeing tails?
  • \(\hat {\theta } = .2\)
  • \(\hat {\theta } = .25\)
  • \(\hat {\theta } = .8\)
  • \(\hat {\theta } = .3\)
h)
True or False: The Union Bound always gives a result in \([0,1]\).

2  Content Review II (material not properly covered in class)

a)
Suppose \(C\) and \(D\) are discrete random variables. Then \(\expect {C|D = d} =\)
  • \(\sum _d dp_{D|C} (d|c)\)
  • \(\sum _c cp_{C|D} (c|d)\)
  • \(\int _{-\infty }^{\infty } cf_{C|D} \dif x\)
  • \(\frac {\expect {C}}{\expect {D}}\)
b)
True or false: Markov’s Inequality always gives a non-negative result.
c)
True or false: Chebyshev’s Inequality can best be described as giving an upper bound on the distribution’s right tail.

3  Trapped Miner

A miner is trapped in a mine containing 3 doors.

At all times, he is equally likely to choose any one of the doors. What is the expected number of hours for this miner to reach safety?

4  The Compound Server

A web server receives requests according to a Poisson process with a rate of \(\lambda \) requests per minute. Each incoming request is independently classified as a “database query” with probability \(p\), or a “static asset request” with probability \(1-p\). Let \(N\) be the total number of requests in a minute, and \(X\) be the number of database queries in a minute.

a)
Derive the marginal distribution of \(X\). Also, write down the marginal distribution of \(Y\), defined as the number of static asset requests in a minute. (You only need to prove your answer for \(X\).)
b)
Show that \(X\) and \(Y\) are independent.
c)
Given that exactly \(k\) database queries were received in a particular minute, what is the expected total number of requests \(\Exp [N \mid X = k]\) received during that minute?

5  Lemonade Stand

Suppose I run a lemonade stand, which costs me $100 a day to operate. I sell a drink of lemonade for $20. Every person who walks by my stand either buys a drink or doesn’t (no one buys more than one). If it is raining, \(n_1\) people walk by my stand, and each buys a drink independently with probability \(p_1\). If it isn’t raining, \(n_2\) people walk by my stand, and each buys a drink independently with probability \(p_2\). It rains each day with probability \(p_3\), independently of every other day. Let \(X\) be my profit over the next week. In terms of \(n_1, n_2, p_1, p_2\) and \(p_3\), what is \(\expect {X}\)?

6  The Dice and Coin Cascade

Alice repeatedly rolls a fair six-sided die until she rolls a 6. Let \(N\) be the total number of rolls she makes (including the final roll of 6). Once Alice is finished, Bob flips a fair coin exactly \(N\) times. Let \(H\) be the total number of Heads Bob flips.

a)
Compute the exact probability that Bob flips zero Heads, \(\Pr (H = 0)\).
b)
Compute the expected value \(\Exp [H]\) and variance \(\Var (H)\).

7  Nested Uniforms

Let \(X \sim \textsf {Unif}(0, 1)\). We draw a random variable \(Y\) such that, conditioned on \(X=x\), \(Y \sim \textsf {Unif}(0, x)\). What is the expected value of \(Y\), \(\expect {Y}\)?

8  A Red Poisson

Suppose that \(x_{1},\ldots ,x_{n}\) are i.i.d. samples from a Poisson(\(\theta \)) random variable, where \(\theta \) is unknown. Find the MLE for \(\theta \).

9  Independent Shreds, You Say?

You are given 100 independent samples \(X_1=x_1, X_2=x_2, \ldots , X_{100}=x_{100}\) from Bernoulli\((\theta )\), where \(\theta \) is unknown. (Each sample is either a 0 or a 1). These 100 samples sum to 30. We saw in class that the maximum likelihood estimate for \(\theta \) is \[\hat {\theta }(x_1, \ldots , x_{100}) = \frac {1}{100}\sum _{i=1}^{100}x_i = \frac {30}{100}.\] Is \(\hat {\theta }(X_1, \ldots , X_{100}) = \frac {\sum _{i=1}^{100} X_i}{100}\) an unbiased estimator of \(\theta \)?

10  Laplace MLE

Suppose \(x_{1},\ldots ,x_{2n}\) are iid realizations from the Laplace density (double exponential density): for \(x \in \mathbb {R}\),

\[f_{X}\left ( x \mid \theta \right ) = \frac {1}{2}e^{- |x - \theta |}\]

Find the MLE for \(\theta \). For this problem, you need not verify that the MLE is indeed a maximizer. You may find the sign function useful:

\[\operatorname {sgn}{(x)} = \left \{ \begin {matrix} + 1,\ \ & x \geq 0 \\ - 1,\ \ & x < 0 \\ \end {matrix} \right .\ \]

11  Bird Watching

You are an ornithologist studying a rare species of birds in a nature reserve. Over a period of \(50\) days, you record the number of sightings of this bird (you see \(x_1, x_2, ..., x_{50}\) birds on each day). Your research has shown that the number of sightings of this species depends on the average number of monkeys in the reserve, \(\theta _1\), and the average number of eagles in the reserve, \(\theta _2\). After years of studying this rare species in other environments, you’ve found that the number of birds observed on a particular day follows the following distribution: \[p_X(k)=\frac {1}{k!}(\theta _1^k \cdot e^{-\theta _1}\cdot \theta _2^k \cdot e^{-3\theta _2})\] Find the MLE for \(\theta _1\) and \(\theta _2\) (i.e., find \(\hat {\theta _1}\) and \(\hat {\theta _2}\)).

a)
What is the likelihood function?
b)
What is the log-likelihood function?
c)
We want to find values of \(\theta _1\) and \(\theta _2\) that maximize the likelihood function. To do this, we will take the partial derivative with respect to each of these parameters and solve for the values that make them both zero. First, take the partial derivative of the likelihood function with respect to \(\theta _1\).
d)
Now, take the partial derivative with respect to \(\theta _2\).
e)
Set both these partial derivatives to 0, and solve for \(\hat {\theta _1}\) and \(\hat {\theta _2}\).

Remark: Similar to the single-variable case, we treat the points where both partial derivatives are 0 (also known as the critical point) directly as the maximum of the function. This is just a convenient simplification for 312. For general functions, we need to do the second derivative test to confirm whether a critical point is indeed a local maximum/minimum. For multi-variable functions, the second derivative test is much more complicated - it is not enough to take the partial second derivatives, you would have to check the Hessian matrix.

12  A biased estimator

In class, we showed that the maximum likelihood estimate of the variance \(\theta _2\) of a normal distribution (when both the true mean \(\mu \) and true variance \(\sigma ^2\) are unknown) is what’s called the population variance. That is \[\hat \theta _2 = \left (\frac {1}{n} \sum _{i=1}^n (x_i - \hat \theta _1)^2 \right )\] where \(\hat \theta _1 = \frac {1}{n}\sum _{i=1}^n x_i\) is the MLE of the mean. Is \(\hat \theta _2\) unbiased?

13  It Means Nothing

Suppose \(x_1, x_2, \ldots , x_n\) are samples from a normal distribution whose mean is known to be \(\mu \) but the variance is unknown. How does the maximum likelihood estimator for the variance differ from the maximum likelihood estimator when both mean and variance are unknown? Which, if any, is unbiased?

14  Maximum Likelihood Estimators

a)
Let \(x_{1},\ldots ,x_{n}\) i.i.d. samples from a random variable that follows a Rademacher distribution with unknown parameter \(p \in [0,1]\), i.e., a distribution from the family \[ \Prob {x; p} = \left \{ \begin {array}{ll} p & \textrm {if $x = +1$}\\ 1-p & \textrm {if $x = -1$} \end {array} \right . \;, \] What is the maximum likelihood estimator for \(p\)? Write this as a function of the \(x_i\)’s and \(n\).
b)
Let \(x_1,\ldots ,x_n\) be i.i.d samples that follow a \(\mathsf {Two}(\theta )\) distribution with unknown parameter \(\theta \in [0,1]\), where the probabilities from the family are given by \[\Prob {x;\theta }=\begin {cases}(1-\theta )^2&x=0\\ 2\theta (1-\theta )&x=1\\ \theta ^2&x=2 \end {cases}\] Suppose that in the sample there are \(n_0\) 0’s, \(n_1\) 1’s, and \(n_2\) 2’s. What is the maximum likelihood estimator for \(\theta \) in terms of \(n, n_0,n_1,n_2\)?
c)
Let \(x_{1},\ldots ,x_{n}\) be i.i.d. samples from a random variable that follows a so-called Borel distribution with unknown parameter \(\theta \), i.e., a distribution from the family \[ \Prob {k; \theta } = \frac {e^{-\theta k} (\theta k)^{k-1}}{k!} \;, \] where \(0 < \theta \leq 1\) is a real number, and \(k \geq 1\) is an integer. What is the maximum likelihood estimator for \(\theta \)?
d)
If the samples from the Borel distribution are 5, 7, 10, 2, 7, 5, 12, 13, 11, what is the maximum likelihood estimator for \(\theta \)? Give an exact answer as a simplified fraction.

The remaining problems cover material we may not get to this quarter.

15  Tail bounds

Suppose \(X\sim \textsf {Binomial}(6, 0.4)\). We will bound \(\Pr (X\ge 4)\) using the tail bounds we’ve learned, and compare this to the true result.

a)
Give an upper bound for this probability using Markov’s inequality. Why can we use Markov’s inequality?
b)
Give an upper bound for this probability using Chebyshev’s inequality. You may have to rearrange algebraically and it may result in a weaker bound.
c)
Give an upper bound for this probability using the Chernoff bound.
d)
Give the exact probability.

16  Exponential Tail Bounds

Let \(X \sim \mbox {Exp}(\lambda )\) and \(k > 1/\lambda \).

a)
Use Markov’s inequality to bound \(\Pr (X \geq k)\).
b)
Use Markov’s inequality to bound \(\Pr (X < k)\).
c)
Use Chebyshev’s inequality to bound \(\Pr (X \geq k)\).
d)
What is the exact formula for \(\Pr (X \geq k)\)?
e)
For \(\lambda k \geq 3\), how do the bounds given in parts (a), (c), and (d) compare?

17  Robbie’s Late!

Suppose the probability Robbie is late to teaching lecture on a given day is at most 0.01. Do not make any independence assumptions.

a)
Use a Union Bound to bound the probability that Robbie is late at least once over a 30-lecture quarter.
b)
Use a Union Bound to bound the probability that Robbie is never late over a 30-lecture quarter.
c)
Use a Union Bound to bound the probability that Robbie is late at least once over a 120-lecture quarter.