CSE 312 – Section 7 Solutions
Spring 2026
Review of Main Concepts
-
Normal (Gaussian, “bell curve”): \(X\sim \mathcal {N}(\mu ,\ \sigma ^{2})\) iff \(X\) has the following probability density function:
\[f_{X}\left ( x \right ) = \frac {1}{\sigma \sqrt {2\pi }}\,e^{- \frac {1}{2}\frac {\left ( x - \mu \right )^{2}}{\sigma ^{2}}},\ \ x \in \bR \]
\(\expect {X}= \mu \) and \(Var(X) = \sigma ^{2}\). The “standard normal” random variable is typically denoted \(Z\) and has mean \(0\) and variance \(1\): if \(X\sim \mathcal {N}(\mu ,\ \sigma ^{2})\), then \(Z = \frac {X - \mu }{\sigma }\sim \mathcal {N}(0,1)\). The CDF has no closed form, but we denote the CDF of the standard normal as \(\Phi \left ( z \right ) = F_{Z}\left ( z \right ) = \Pr (Z \leq z)\). Note from symmetry of the probability density function about \(z = 0\) that: \(\Phi \left ( - z \right ) = 1 - \Phi (z)\).
- Standardizing: Let \(X\) be any random variable (discrete or continuous, not necessarily normal), with \(\expect {X} = \mu \) and \(Var(X) = \sigma ^{2}\). If we let \(Y = \frac {X - \mu }{\sigma }\), then \(\expect {Y} = 0\) and \(Var(Y) = 1\).
- Closure of the Normal Distribution: Let \(X\sim \mathcal {N}(\mu ,\sigma ^{2})\). Then, \(aX + b\sim \mathcal {N}(a\mu + b,a^{2}\sigma ^{2})\). That is, linear transformations of normal random variables are still normal.
-
“Reproductive” Property of Normals: Let \(X_{1},\ldots ,X_{n}\) be independent normal random variables with \(\expect {X_i}=\mu _i\) and \(Var(X_i)=\sigma _i^2\). Let \(a_{1},\ldots ,a_{n} \in \mathbb {R}\) and \(b \in \mathbb {R}\). Then,
\[X = \sum _{i = 1}^{n}(a_{i}X_{i} + b)\sim \mathcal {N}\left ( \sum _{i = 1}^{n}(a_{i}\mu _{i} + b),\sum _{i = 1}^{n}a_{i}^{2}\sigma _{i}^{2} \right )\]
There’s nothing special about the parameters – the important result here is that the resulting random variable is still normally distributed.
- \(Z\)-score: I have not used this term in class, but a \(Z\)-score measures how many standard deviations a specific data point is above or below the mean, indicating its position relative to the average. A \(Z\)-score of 0 means the data point equals the mean, while a \(Z\)-score of 1 means it is one standard deviation above the mean.
-
Central Limit Theorem (CLT): Let \(X_{1},\ldots ,X_{n}\) be iid random variables with \(\expect {X_i}= \mu \) and \(\Var (X_i) = \sigma ^{2}\). Let \(X = \sum _{i = 1}^{n}X_{i}\), which has \(\expect {X} = n\mu \) and \(Var(X) = n\sigma ^{2}\). Let \(\overline {X} = \frac {1}{n}\sum _{i = 1}^{n}X_{i}\), which has \(\expect { \overline {X} } = \mu \) and \(Var( \overline {X}) = \frac {\sigma ^{2}}{n}\). \(\overline {X}\) is called the sample mean. Then, as \(n \rightarrow \infty \), \(\overline {X}\) approaches the normal distribution \(\mathcal {N}\left ( \mu ,\frac {\sigma ^{2}}{n} \right )\). Standardizing, this is equivalent to \(Y = \frac {\overline {X} - \mu }{\sigma /\sqrt {n}}\) approaching \(\mathcal {N}(0,1)\). Similarly, as \(n \rightarrow \infty \), \(X\) approaches \(\mathcal {N}(n\mu ,n\sigma ^{2})\) and \(Y' = \frac {X - n\mu }{\sigma \sqrt {n}}\) approaches \(\mathcal {N}(0,1)\).
It is no surprise that \(\overline {X}\) has mean \(\mu \) and variance \(\sigma ^{2}/n\); this follows from linearity of expectation and independence. The importance of the CLT is that, for large \(n\), regardless of what distribution \(X_{i}\) comes from, \(\overline {X}\) is approximately normally distributed with mean \(\mu \) and variance \(\sigma ^{2}/n\). Remember to apply the continuity correction, but only when \(X_{1},\ldots ,X_{n}\) are discrete random variables.
-
Continuity Correction: When we use the Central Limit Theorem (CLT) to approximate a discrete random variable \(X\) (such as a Binomial or Poisson random variable) with a continuous Normal random variable \(Y \sim \mathcal {N}(\mu , \sigma ^2)\), we encounter a structural mismatch.
A discrete random variable takes on exact integer values, meaning probabilities like \(\Pr (X = k)\) are strictly positive. However, for a continuous random variable, the probability of any single exact point is zero: \(\Pr (Y = k) = 0\).
To account for this, we use a "continuity correction". We associate the discrete probability at the integer \(k\) with the continuous probability interval from \(k - 0.5\) to \(k + 0.5\). Geometrically, think of this as matching the area of a discrete histogram bar (centered at \(k\) with width 1) to the corresponding area under the smooth continuous Normal curve.
How to apply the correction: When converting discrete bounds to continuous bounds, we expand the interval by \(0.5\) in the relevant directions to ensure the entire "histogram bar" is captured. The following assumes that the discrete random variable \(X\) takes consecutive integer values.
- Exact value: \[ \Pr (X = k) \approx \Pr (k - 0.5 \le Y \le k + 0.5) \]
- Less than or equal to: \[ \Pr (X \le k) \approx \Pr (Y \le k + 0.5) \]
- Greater than or equal to: \[ \Pr (X \ge k) \approx \Pr (Y \ge k - 0.5) \]
- Strict inequalities: First, convert strict inequalities to non-strict inequalities (since \(X\) only takes integer values), and then apply the correction. \[ \begin {aligned} \Pr (X < k) &= \Pr (X \le k - 1) \approx \Pr (Y \le (k - 1) + 0.5) = \Pr (Y \le k - 0.5) \\ \Pr (X > k) &= \Pr (X \ge k + 1) \approx \Pr (Y \ge (k + 1) - 0.5) = \Pr (Y \ge k + 0.5) \end {aligned} \]
A helpful rule of thumb: Always sketch the histogram bars and include the entire bar for any integers that satisfy the discrete inequality. If you want \(X \le k\), you must include the entire bar for \(k\), which extends up to \(k + 0.5\). If you want \(X < k\), you do not include the bar for \(k\); you only include the bar for \(k - 1\), which extends its right edge up to \(k - 0.5\).
-
General template for solving CLT problems: Sometimes we’ll be trying to solve for the probability of something (e.g., \(P(X\leq 10)\)), and sometimes we’ll be trying to find a value of some parameter that will allow for the probability to be in a certain range (e.g., \(P(X \leq 10) \leq 0.2\)). Regardless, we still want to apply the CLT to \(X\) and follow the same process (the only difference is that we may be solving for different things).
- a)
- Set up the problem: write the event you are interested in, in terms of a sum of random variables. (What do we want to solve for, or what is the probability we want to be true?)
- Write the random variable we’re interested in as a sum of i.i.d. random variables
- Apply CLT to \(X=X_1 + X_2 + ... + X_n\) (we can approximate \(X\) as a normal random variable \(Y\sim N(\mu , \sigma ^2)\))
- Write the probability we’re interested in
- b)
- If the RVs are discrete, apply continuity correction.
- c)
- Normalize RV to have mean 0 and standard deviation 1: \(Z = \frac {Y-\mu }{\sigma }\)
- d)
- Replace RV in probability expression with \(Z\sim N(0,1)\)
- e)
- Write in terms of \(\Phi (z) =P(Z\leq z)\)
- f)
- Look up in the Phi table (or do a reverse Phi table lookup if we’re looking for a value of \(z\) that gives us a certain probability)
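The template above can be sketched in code. This is only an illustrative sketch, assuming Python 3.8+ and the standard-library `statistics.NormalDist` in place of a Phi table; the helper name `clt_sum_prob`, its parameters, and the coin-flip numbers are made up for this example and are not part of the course materials.

```python
from math import sqrt
from statistics import NormalDist


def clt_sum_prob(n, mu, var, lo=None, hi=None, discrete=False):
    """Approximate Pr(lo <= X <= hi) for X = X_1 + ... + X_n via the CLT.

    mu and var are the mean and variance of a single X_i. Pass None for a
    missing bound. Set discrete=True to widen integer bounds by 0.5.
    """
    # Step (a): approximate X by Y ~ N(n*mu, n*var).
    mean, sd = n * mu, sqrt(n * var)
    # Step (b): continuity correction, only for integer-valued sums.
    shift = 0.5 if discrete else 0.0
    std_normal = NormalDist()  # Z ~ N(0, 1); its cdf plays the role of Phi
    # Steps (c)-(f): standardize each bound and evaluate Phi there.
    upper = 1.0 if hi is None else std_normal.cdf((hi + shift - mean) / sd)
    lower = 0.0 if lo is None else std_normal.cdf((lo - shift - mean) / sd)
    return upper - lower


# Hypothetical example: X = number of heads in 100 fair coin flips,
# so each X_i is Bernoulli(0.5) with mean 0.5 and variance 0.25.
p = clt_sum_prob(100, 0.5, 0.25, lo=45, hi=55, discrete=True)
print(round(p, 4))
```

Here the corrected bounds are \(44.5\) and \(55.5\), which standardize to \(\pm 1.1\), so the result is \(2\Phi (1.1) - 1\).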
Announcements & Plan for Section
Announcements
- Midterm today, 4/14 @ 6pm. Please bring a photo ID.
- Pset 6 is due next week, 5/20 @ 11:59pm.
Plan for Section
- Answer any questions about the midterm: content, practice tests, etc.
- Content Review (Problem 1)
- Problem 4: Bad Computer (if time remaining).
- Problem 5: Tweets or Problem 6: Ping Pong (if time remaining).
Midterm Prep Resources
- Link to information about exam.
- Link to draft cheat sheet.
- Link to practice midterm and solutions to practice midterm
1 Content Review - understanding the Central Limit Theorem
The Central Limit Theorem (CLT) is one of the most powerful results in probability because it allows us to approximate the distribution of sums and averages of independent random variables, regardless of their original distribution.
- a)
- (4 points) You roll a fair, 6-sided die 100 times. The outcome of a single roll
has a discrete uniform distribution with a mean of \(\mu = 3.5\) and a variance of \(\sigma ^2 \approx 2.92\). Let \(\bar {X}\)
be the average value of your 100 rolls. According to the Central Limit
Theorem, which of the following best describes the approximate
distribution of \(\bar {X}\)?
- (a)
- A discrete uniform distribution from 1 to 6.
- (b)
- A normal distribution with mean \(3.5\) and variance \(2.92\).
- (c)
- A normal distribution with mean \(3.5\) and variance \(0.0292\).
- (d)
- A normal distribution with mean \(350\) and variance \(292\).
Correct: (c)
The CLT states that the sample mean \(\bar {X}\) of a large number of independent, identically distributed random variables is approximately normally distributed. The expected value of the sample mean is the same as the population mean: \(\expect {\bar {X}} = \mu = 3.5\). The variance of the sample mean is the population variance divided by the sample size: \(\Var (\bar {X}) = \frac {\sigma ^2}{n} = \frac {2.92}{100} = 0.0292\).
Conceptual Note: Choice (a) is the distribution of a single roll. Choice (b) forgets to divide the variance by \(n\). Choice (d) is the distribution of the sum of the rolls, not the average.
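The single-roll mean and variance quoted in the question can be checked by direct enumeration. The following is a small sketch, assuming Python's standard-library `fractions` module; the variable names are illustrative only.

```python
from fractions import Fraction

faces = range(1, 7)  # outcomes of one fair 6-sided die

# Mean and variance of a single roll, computed exactly.
mu = Fraction(sum(faces), 6)
var = Fraction(sum(f * f for f in faces), 6) - mu ** 2  # E[X^2] - (E[X])^2

# Variance of the average of n = 100 independent rolls.
n = 100
var_mean = var / n

print(float(mu), float(var), float(var_mean))
```

The exact single-roll variance is \(35/12\), which the question rounds to \(2.92\).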
- b)
- (4 points) A shipping company loads 100 identical packages onto a truck.
The weight of a single package is a random variable with a mean of \(20\) lbs and
a standard deviation of \(5\) lbs. Let \(S\) be the total combined weight of all 100
packages. Using the CLT, what is the approximate distribution of
\(S\)?
- (a)
- \(S \approx \mathcal {N}(2000, 500)\)
- (b)
- \(S \approx \mathcal {N}(2000, 2500)\)
- (c)
- \(S \approx \mathcal {N}(20, 0.25)\)
- (d)
- \(S \approx \mathcal {N}(2000, 5)\)
Correct: (b)
We are looking at the sum \(S = X_1 + X_2 + \dots + X_{100}\). By linearity of expectation, the mean of the sum is \(\expect {S} = n\mu = 100(20) = 2000\). Because the packages are independent, the variance of the sum is the sum of the variances: \(\Var (S) = n\sigma ^2\). The standard deviation is \(\sigma = 5\), so the variance of a single package is \(\sigma ^2 = 25\). Therefore, \(\Var (S) = 100(25) = 2500\). Thus, \(S \approx \mathcal {N}(2000, 2500)\).
- c)
- (4 points) The waiting time for a bus is modeled by an Exponential
distribution, which is heavily right-skewed. If you record the waiting times
for \(64\) independent bus rides and calculate your average waiting time \(\bar {X}\), what
will the shape of the distribution of \(\bar {X}\) look like?
- (a)
- It will be exactly Exponential, because the sum of exponentials is exponential.
- (b)
- It will be heavily right-skewed, matching the underlying population.
- (c)
- It will be approximately a bell-shaped Normal curve.
- (d)
- It will be perfectly uniform, since all averages balance out.
Correct: (c)
This question highlights the "magic" of the Central Limit Theorem. Regardless of the shape of the underlying population distribution (even one as skewed as the Exponential distribution), the distribution of the sample mean will converge to a symmetric, bell-shaped Normal distribution as the sample size \(n\) gets large (and \(n=64\) is generally considered large enough).
- d)
- (4 points) A researcher takes a sample of size \(n = 36\) from a population with mean
\(\mu = 50\) and variance \(\sigma ^2 = 144\). They want to find the probability that the sample mean \(\bar {X}\) is
greater than \(54\). To use the Standard Normal table, they must convert \(\bar {X}\) to a
standard \(Z\)-score. Which of the following is the correct calculation for
\(Z\)?
- (a)
- \(Z = \frac {54 - 50}{144 / \sqrt {36}}\)
- (b)
- \(Z = \frac {54 - 50}{12 / 36}\)
- (c)
- \(Z = \frac {54 - 50}{12 / \sqrt {36}}\)
- (d)
- \(Z = \frac {54 - 50}{144 / 36}\)
Correct: (c)
To standardize a sample mean, the formula is \(Z = \frac {\bar {X} - \mu }{\text {SE}}\), where the standard error (SE) is the standard deviation of the sample mean, calculated as \(\frac {\sigma }{\sqrt {n}}\). We are given \(\sigma ^2 = 144\), so the standard deviation is \(\sigma = \sqrt {144} = 12\). The sample size is \(n = 36\). Therefore, the standard error is \(\frac {12}{\sqrt {36}}\). Substituting the values yields \(Z = \frac {54 - 50}{12 / \sqrt {36}}\).
2 More review
- (a)
- (4 points) What does the Central Limit Theorem fundamentally
guarantee about a sequence of i.i.d. random variables as the sample
size \(n\) becomes very large?
- i.
- The distribution of the individual random variables gradually becomes normal.
- ii.
- The sample mean gets closer and closer to the true expected value of the distribution.
- iii.
- The distribution of the sample mean (or sample sum) approaches a normal distribution, regardless of the original distribution’s shape.
- iv.
- The sum of any two normally distributed random variables will also be normally distributed.
Correct: (iii)
This is the core magic of the CLT. Option (i) is a very common misconception—the original distribution never changes! Option (ii) is the definition of the Law of Large Numbers (LLN), which tells us the mean converges to a single value, whereas the CLT tells us about the shape of the distribution around that value. Option (iv) is a true mathematical property of Normal distributions, but it is not what the CLT is about.
- (b)
- (4 points) Suppose the weight of an apple in an orchard has an
unknown, highly skewed distribution with an expected value of 150
grams and a standard deviation of 20 grams. You pick a random
sample of 100 apples. Let \(S\) be the total weight of these 100 apples.
Using the CLT, what is the approximate probability that the total
weight exceeds 15,200 grams?
Hint: First, find the expected value \(\expect {S}\) and the variance \(\Var (S)\) for the sum of the 100 apples. Then standardize to find a \(Z\)-score. Assume \(\Phi (1) \approx 0.84\).
- i.
- \(0.84\)
- ii.
- \(0.16\)
- iii.
- \(0.50\)
- iv.
- \(0.05\)
Correct: (ii)
Let \(X_i\) be the weight of a single apple. The sum is \(S = \sum _{i=1}^{100} X_i\). By linearity of expectation: \(\expect {S} = 100 \times 150 = 15000\). Since the apples are independent, the variances add: \(\Var (S) = 100 \times 20^2 = 100 \times 400 = 40000\). The standard deviation of the sum is \(\sqrt {40000} = 200\). Now, we apply the CLT to approximate \(S\) as a normal distribution and standardize: \[ \Prob {S > 15200} = \Prob {\frac {S - 15000}{200} > \frac {15200 - 15000}{200}} \approx \Prob {Z > 1} \] \[ \Prob {Z > 1} = 1 - \Phi (1) \approx 1 - 0.84 = 0.16 \]
- (c)
- (4 points) A computer server processes requests with an expected
response time of 40 milliseconds and a standard deviation of 12
milliseconds. The distribution of response times is heavily right-skewed
(an exponential-like tail). If you take a random sample of 36 requests,
what is the approximate distribution of the sample mean,
\(\bar {X}\)?
Hint: Remember that the variance of the sample mean is \(\Var (\bar {X}) = \frac {\sigma ^2}{n}\). What is its standard deviation (the standard error)?
- i.
- Heavily right-skewed with a mean of 40 and a standard deviation of 12.
- ii.
- Approximately Normal with a mean of 40 and a standard deviation of 12.
- iii.
- Approximately Normal with a mean of 40 and a standard deviation of 2.
- iv.
- Approximately Normal with a mean of 1440 and a standard deviation of 72.
Correct: (iii)
Because the sample size (\(n=36\)) is sufficiently large, the CLT tells us that the distribution of the sample mean \(\bar {X}\) will be approximately Normal, ruling out option (i). The expected value of the sample mean is the same as the population mean: \(\expect {\bar {X}} = 40\). The variance of the sample mean is \(\Var (\bar {X}) = \frac {12^2}{36} = \frac {144}{36} = 4\). Therefore, the standard deviation of the sample mean is \(\sqrt {4} = 2\).
Option (iv) represents the distribution of the sum of the 36 requests, not the sample mean.
3 Round off error
Let \(X\) be the sum of 100 real numbers, and let \(Y\) be the same sum, but with each number rounded to the nearest integer before summing. If the roundoff errors are independent and uniformly distributed between \(-0.5\) and \(0.5\), what is the approximate probability that \(|X - Y| > 3\)?
Let \(X = \sum ^{100}_{i=1}X_i\), and \(Y = \sum ^{100}_{i=1}r(X_i)\), where \(r(X_i)\) is \(X_i\) rounded to the nearest integer. Then, we have \[X - Y = \sum ^{100}_{i = 1} \left ( X_i - r(X_i) \right )\] Note that each \(X_i - r(X_i)\) is simply the round off error, which is distributed as \(\text {Unif}(-0.5, 0.5)\). Since \(X - Y\) is the sum of 100 i.i.d. random variables with mean \(\mu = 0\) and variance \(\sigma ^2 = \frac {1}{12}\), \(X - Y \approx W \sim \mathcal {N}(0, \frac {100}{12})\) by the Central Limit Theorem. For notational convenience, let \(Z \sim \mathcal {N}(0,1)\). \[ \begin {aligned} \Pr (|X - Y| > 3) &\approx \Pr (|W| > 3) \quad &[\text {CLT}]\\ &= \Pr (W > 3) + \Pr (W < -3) \quad &[\text {No overlap between $W > 3$ and $W < -3$}]\\ &= 2 \; \Pr (W > 3) \quad &[\text {Symmetry of normal}]\\ &= 2 \; \Pr \left (\frac {W}{\sqrt {100/12}} > \frac {3}{\sqrt {100/12}}\right ) \quad &[\text {Standardize $W$}]\\ &\approx 2 \; \Pr (Z > 1.04)\\ &=2 \; (1 - \Phi (1.04)) \approx 0.29834 \end {aligned} \]
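As a sanity check on the approximation, one can compare the CLT estimate against a simulation. This is a rough sketch, assuming Python 3.8+ with only the standard library; the seed, the trial count, and the variable names are arbitrary choices made for this illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

# CLT estimate: X - Y is approximately N(0, 100/12).
z = 3 / sqrt(100 / 12)
clt_estimate = 2 * (1 - NormalDist().cdf(z))

# Monte Carlo check, assuming each round-off error is Unif(-0.5, 0.5)
# and independent, exactly as stated in the problem.
rng = random.Random(312)  # fixed, arbitrary seed for reproducibility
trials = 20_000
hits = sum(
    abs(sum(rng.uniform(-0.5, 0.5) for _ in range(100))) > 3
    for _ in range(trials)
)
simulated = hits / trials

print(round(clt_estimate, 4), round(simulated, 4))
```

The CLT estimate computed here uses the unrounded value of \(3/\sqrt {100/12}\), so it can differ slightly in the later decimal places from the table lookup at \(1.04\).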
4 Bad Computer
Each day, the probability your computer crashes is 10%, independent of every other day. Suppose we want to evaluate the computer’s performance over the next \(100\) days.
- a)
- Let \(X\) be the number of crash-free days in the next \(100\) days. What
distribution does \(X\) have? Identify \(\expect {X}\) and \(Var(X)\) as well. Write an exact (possibly
unsimplified) expression for \(\Pr (X\ge 87)\).
Since \(X\) counts the number of crash-free days (successes) in 100 days (trials), where each trial is a success with probability 0.9, we can see that \(X\) is binomial with \(n=100\) and \(p=0.9\), or \(X\sim \textsf {Binomial}(100,0.9)\). Hence, \(\expect {X}=np=90\) and \(\text {Var}(X)=np(1-p)=9\). Finally, \[\Pr (X\ge 87)=\sum _{k=87}^{100}{\binom {100}{k}(0.9)^k(1-0.9)^{100-k}}\]
- b)
- Approximate the probability of at least 87 crash-free days out of the
next 100 days using the Central Limit Theorem. Use continuity
correction.
Important: continuity correction says that if we are using the normal distribution to approximate \[\Pr (a \le \sum _{i=1}^n X_i \le b)\] where \(a \le b\) are integers and the \(X_i\)’s are i.i.d. discrete random variables taking integer values, then, as our approximation, we should use \[ \Pr ( a-0.5 \le Y \le b+ 0.5)\] where \(Y\) is the appropriate normal distribution that \( \sum _{i=1}^n X_i \) converges to by the Central Limit Theorem. (The intuition here is that, to avoid a mismatch between discrete distributions (whose range is a set of integers) and continuous distributions, we get a better approximation by imagining that a discrete random variable, say \(W\), is a continuous distribution with density function \[f_W(x) := p_W (i) \quad \text { when }i-0.5 \le x < i + 0.5 \text { and $i$ integer} \])
For more details see pages 209-210 in the Tsun book.
From the previous part, we know that \(\expect {X}=90\) and \(\Var (X)=9\). \[ \begin {aligned} \Pr (X \geq 87) & = \Pr (86.5 < X < 100.5) = \Pr (\frac {86.5-90}{3} < \frac {X-90}{3} < \frac {100.5-90}{3}) \\ & \approx \Pr (-1.17 < \frac {X-90}{3} < 3.5) \approx \Phi (3.5) + \Phi (1.17) - 1 \\ &\approx 0.9998+0.8790-1=0.8788 \end {aligned} \] Notice that, if you had used \(86.5 < X\) in place of \(86.5 < X < 100.5\), your answer would have been nearly the same, because \(\Phi (3.5)\) is so close to 1.
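The exact sum from part (a) and the approximation from part (b) can be compared numerically. The following is a minimal sketch, assuming Python 3.8+ (`math.comb` and `statistics.NormalDist` from the standard library); the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.9  # X ~ Binomial(100, 0.9) counts crash-free days

# Exact: sum the binomial pmf over k = 87, ..., 100.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(87, n + 1))

# CLT with continuity correction: Pr(86.5 < Y < 100.5), Y ~ N(90, 9).
mean, sd = n * p, sqrt(n * p * (1 - p))
Z = NormalDist()
approx = Z.cdf((100.5 - mean) / sd) - Z.cdf((86.5 - mean) / sd)

print(round(exact, 4), round(approx, 4))
```

The approximation here is evaluated at the unrounded standardized bounds, so it may differ in the last digits from the table-based value above.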
5 Tweets
A prolific twitter user tweets approximately 350 tweets per week. Let’s assume for simplicity that the tweets are independent, and each consists of a uniformly random number of characters between 10 and 140. (Note that this is a discrete uniform distribution.) Thus, the central limit theorem (CLT) implies that the number of characters tweeted by this user is approximately normal with an appropriate mean and variance. Assuming this normal approximation is correct, estimate the probability that this user tweets between 26,000 and 27,000 characters in a particular week. (This is a case where continuity correction will make virtually no difference in the answer, but you should still use it to get into the practice!).
Let \(X\) be the total number of characters tweeted by a twitter user in a week. Let \(X_i \sim \Unif (10, 140)\) be the number of characters in the \(i\)th tweet (since the start of the week). Since \(X\) is the sum of 350 i.i.d. rvs with mean \(\mu = 75\) and variance \(\sigma ^2 = 1430\), \(X \approx N \sim \mathcal {N}(350 \cdot 75, 350 \cdot 1430)\). Thus, \[ \begin {aligned} \Pr (26,000 \leq X \leq 27,000) &= \Pr (25,999.5 \leq X \leq 27,000.5) \\ &\approx \Pr (25,999.5 \leq N \leq 27,000.5) \end {aligned} \]
Standardizing this gives the following formula: \[ \begin {aligned} \Pr (25,999.5 \leq N \leq 27,000.5) &= \Pr \left (\frac {25,999.5 - 350 \cdot 75}{\sqrt {350 \cdot 1430}} \leq \frac {N - 350 \cdot 75}{\sqrt {350 \cdot 1430}} \leq \frac {27,000.5 - 350 \cdot 75}{\sqrt {350 \cdot 1430}} \right )\\ &\approx \Pr \left (-0.35 \leq \frac {N - 350 \cdot 75}{\sqrt {350 \cdot 1430}} \leq 1.06\right )\\ &\approx \Pr \left (-0.35 \leq Z \leq 1.06\right )\\ &= \Phi (1.06) - \Phi (-0.35)\\ &\approx 0.85543 - (1-0.63683)\\ &= 0.49226 \end {aligned} \] So the probability that this user tweets between 26,000 and 27,000 characters in a particular week is approximately 0.4923.
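The per-tweet mean and variance, and the final probability, can be reproduced by enumeration. This is only a sketch, assuming Python 3.8+ and the standard-library `statistics.NormalDist`; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Discrete uniform on {10, 11, ..., 140}: enumerate to get mean and variance.
lengths = range(10, 141)
mu = sum(lengths) / len(lengths)
var = sum((x - mu) ** 2 for x in lengths) / len(lengths)

# Normal approximation to the total over n = 350 tweets.
n = 350
N = NormalDist(mu=n * mu, sigma=sqrt(n * var))

# Continuity-corrected probability of 26,000 to 27,000 characters.
prob = N.cdf(27000.5) - N.cdf(25999.5)

print(mu, var, round(prob, 4))
```

Because this evaluates the normal CDF at unrounded standardized bounds, the result can differ in the third decimal place from the table-based answer, which rounds the bounds to \(-0.35\) and \(1.06\).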
6 Ping Pong
You’re playing ping pong with your friend, and want to keep playing until you’ve scored 15 points. Unfortunately, your friend is a much more skilled ping pong player than you, so you only win points 25% of the time (with each point being independent of the other points). Approximate the probability that you’ll need to play at least 50 points before stopping.
Let \(X\) be the total number of points played. We want to approximate \(\Pr (X \ge 50)\).
Let \(X_i\) be the number of points played starting after the \(i - 1\)th point you win and up to and including the \(i\)th point you win, with \(X_1\) the number of points up to and including the first point you win. Then, we have \(X = \sum _{i=1}^{15}X_i\). Because you win each point independently with probability \(0.25\), we have \(X_i \sim Geo(0.25)\). Thus, \[\mathbb {E}[X_i] = 4\] and \[Var(X_i) = \frac {1-0.25}{(0.25)^2} = 12\] Since \(X\) is the sum of \(15\) i.i.d. r.v.s with mean \(\mu = 4\) and variance \(\sigma ^2 = 12\), by the central limit theorem we have \(X \approx N \sim \mathcal {N}(15 \cdot 4, 15 \cdot 12)\). Thus, \[ \begin {aligned} \Pr (X \ge 50) &= \Pr (X \ge 49.5) & \text {Continuity correction}\\ &\approx \Pr (N \ge 49.5) & \text {CLT} \end {aligned} \]
Standardizing, we get the following: \[ \begin {aligned} \Pr (N \ge 49.5) &= \Pr (\frac {N-15\cdot 4}{\sqrt {15\cdot 12}} \ge \frac {49.5 - 15\cdot 4}{\sqrt {15\cdot 12}})\\ &\approx \Pr (\frac {N-15\cdot 4}{\sqrt {15\cdot 12}} \ge -0.782)\\ &\approx \Pr (Z \ge -0.782)\\ &= 1 - \Pr (Z \le -0.782)\\ &= 1 - \Phi (-0.782)\\ &= 1 - (1 - \Phi (0.782))\\ &= \Phi (0.782)\\ &\approx 0.7823 \end {aligned} \]
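Since only 15 geometric random variables are being summed, it is reasonable to check how close the approximation is. The sketch below assumes Python 3.8+ and uses a reformulation that is not part of the solution above: you need at least 50 points exactly when you have won at most 14 of the first 49 points, which is a binomial tail that can be summed directly.

```python
from math import comb, sqrt
from statistics import NormalDist

wins_needed, p = 15, 0.25

# CLT approximation with continuity correction: N ~ N(15*4, 15*12).
mean = wins_needed / p                  # 60
var = wins_needed * (1 - p) / p**2      # 180
approx = 1 - NormalDist(mean, sqrt(var)).cdf(49.5)

# Exact check (assumed reformulation): X >= 50 iff at most 14 wins
# occur among the first 49 points, i.e. a Binomial(49, 0.25) tail.
exact = sum(
    comb(49, k) * p**k * (1 - p) ** (49 - k) for k in range(wins_needed)
)

print(round(approx, 4), round(exact, 4))
```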
7 More normal stuff (10 points)
Let \(X\) be a normal random variable with mean 12 and variance 4. Find the value of \(c\) such that \(\Prob {X < c} = 0.1\).
We are given that \(X\sim \mathcal {N}(\mu =12,\sigma ^2=4)\). Since \(X\) is a normally distributed random variable, we know that the CDF for \(X\) is \(F_X(x)=\Phi (\frac {x-12}{2})\). \begin {equation*} \begin {aligned} \Prob {X < c} &= \Prob {X \le c} && \Prob {X = c} = 0 \text { when } X \text { is continuous}\\ &= \Prob {\frac {X - 12}{2} \le \frac {c - 12}{2}} && \text {Algebra}\\ &= \Prob {Z \le \frac {c - 12}{2}} && Z = \frac {X - 12}{2} \sim \mathcal {N}(0,1)\\ &=\Phi \left (\frac {c - 12}{2}\right ) && \Phi \text { function} \end {aligned} \end {equation*} And then since we want \(\Prob {X< c}=0.1\) \[\Phi \left (\frac {c-12}{2}\right )=0.1\] or equivalently \[\frac {c-12}{2}=\Phi ^{-1}(0.1)\] or equivalently \[c = 2\Phi ^{-1}(0.1) + 12\] Using the reverse table lookup \(\Phi ^{-1}(0.1) \approx -1.28\), this gives \[c\approx 2 \cdot (-1.28) + 12 = 9.44\]
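The reverse table lookup can also be done numerically. A minimal sketch, assuming Python 3.8+ and the standard-library `statistics.NormalDist`, whose `inv_cdf` method plays the role of \(\Phi ^{-1}\); the variable names are illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=12, sigma=2)  # variance 4 means standard deviation 2

# Direct inverse CDF of X.
c = X.inv_cdf(0.1)

# Same value via the standard normal: c = 2 * Phi^{-1}(0.1) + 12.
c_via_z = 12 + 2 * NormalDist().inv_cdf(0.1)

print(round(c, 4), round(c_via_z, 4))
```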
8 CLT for stocks (10 points)
Suppose that the daily price change of a certain stock on the stock market is a random variable with mean 0 and variance \(\sigma ^2\). Thus, if \(Y_n\) is the price of the stock on the \(n\)-th day, then \[Y_n = Y_{n-1} + X_n,\quad n \ge 1\] where \(X_1, X_2, \ldots \) are independent, identically distributed random variables with mean 0 and variance \(\sigma ^2\). Suppose also that today’s stock price is 100 and \(\sigma ^2 = 16\). Use the Central Limit Theorem to estimate the probability that the stock price will exceed 110 after 10 days.
Let \(C_n\) be the total change of price after \(n\) days. Then, we have \[ C_n = \sum _{i=1}^{n} X_i \] Since the \(X_i\)’s are i.i.d. random variables with mean 0 and variance \(\sigma ^2\), by the Central Limit Theorem, \(C_n\) is approximately distributed as \(N(0, n\sigma ^2)\). Also, we know \(Y_n = 100 + C_n\).
Given that today’s (day 0) stock price is 100 and \(\sigma ^2 = 16\), we find \[ \Prob {Y_{10} > 110} = \Prob {100 + C_{10} > 110} = \Prob {C_{10} > 10} = \Prob {\frac {C_{10} - 0}{4\sqrt {10}} > \frac {10 - 0}{4\sqrt {10}}} \] \[\approx \Prob {Z > \frac {\sqrt {10}}{4}} \approx 0.2146 \] where \(Z \sim N(0, 1)\).
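The final number can be reproduced without a table. A short sketch, assuming Python 3.8+ and the standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

days, var_daily = 10, 16
start, target = 100, 110

# CLT approximation: C_10 is roughly N(0, 10 * 16), so sd = 4 * sqrt(10).
C10 = NormalDist(mu=0, sigma=sqrt(days * var_daily))

# Pr(Y_10 > 110) = Pr(C_10 > 10).
prob = 1 - C10.cdf(target - start)

print(round(prob, 4))
```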