\documentclass[letterpaper,11pt]{article}
\newcommand{\removefromfile}[1]{}
\usepackage{algorithm}
\usepackage[noend]{algpseudocode}
\usepackage{amsmath}
\usepackage{enumitem}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage[legalpaper, portrait, margin=1in]{geometry}
\begin{document}
\title{{\bf Problem Set 6 Solutions} }
\author{insert your name here}
\date{}
\maketitle
\section*{Problem 1 (Confidence intervals in "real life") Solution}
\begin{enumerate}[label=(\alph*)]
\item You're trying to figure out your finances over the upcoming year. Based on your recent spending patterns, you know that you spend \$1200 a month on average, with a standard deviation of \$400, and each month's spending is independent and identically distributed. In addition, because these are hard times, you don't have any income. How much money should you have in your bank account right now if you don't want to go broke in the next 12 months, with probability at least 95\%? You should treat the amount of money you'll spend as continuous.\\
\textbf{Solution:}
\item You've decided to try starting a company together with your 312 pset partner. After hours of brainstorming (in between solving 312 problems), you've cut your list of ideas down to 10, all of which you want to implement at the same time. An angel investor has agreed to back all 10 ideas, as long as your net return from implementing the ideas is positive with at least 95\% probability.
Suppose that implementing an idea requires 50 thousand dollars, and your start-up then succeeds with probability $p$, generating 150 thousand dollars in revenue (for a net gain of 100 thousand dollars), or fails with probability $1-p$ (for a net loss of 50 thousand dollars). The success of each idea is independent of every other. What is the condition on $p$ that you need to satisfy in order to secure the angel investor funding? You should treat the amount of revenue you'll get as discrete (since you're only ever adding up multiples of \$100,000 and -\$50,000).\\
\textbf{Solution:}
\item One of your start-ups uses error-correcting codes, which can recover the original message as long as at least 1000 packets are received (not erased). Each packet gets erased independently with probability 0.8. How many packets should you send such that you can recover the message with probability at least 99\% You should treat the number of packets received as discrete.\\
\textbf{Solution:}
\end{enumerate}
\newpage
\section*{Problem 2 (Polling again) Solution}
In class on 11/4, we discussed an {\textbf idealized} polling procedure and analysis to determine the fraction $p$ of a population that is planning to vote to legalize the therapeutic use of magic mushrooms.
Specifically, we analyzed how we could choose the number $n$ of people to sample in order to guarantee that 98\% of the time, it will be the case that $$p \in [\overline{X}- 0.05, \overline{X} + 0.05].$$
%Follow the same outline used in class to determine the value of $n$ needed to guarantee that $[\overline{X}- 0.03, \overline{X} + 0.03]$ is a a 92\% confidence interval.
You probably know that many recent polls (that follow similar methodology) have been \textbf {way off}. Briefly discuss two reasons that real-world polling may not work as well as the idealized polling that we discussed.
Note: This is not a math question, and there can be many different answers.\\
\newline
\textbf{Solution:}
\newpage
\section*{Problem 3 (Exponential in all directions) Solution}
A continuous random variable $X$ has a density function with parameter $\lambda$ given by:
$$f_X(x) = c e^{-\lambda |x|}\quad\quad -\infty < x < \infty,$$
for some constant $c$.
For parts b-f of this problem, assume $\lambda > 0$.
\begin{enumerate}[label=(\alph*)]
\item If $\lambda$ is equal to 0 or negative, this is not a valid density function. Explain what property of pdfs is violated when $\lambda \le 0$.\\
\textbf{Solution:}
\item Compute the constant $c$ in terms of $\lambda$. Graph the pdf for $\lambda = 1$ and $\lambda = 5$. You can use \href{https://www.wolframalpha.com/examples/mathematics/plotting-and-graphics/}{Wolfram Alpha} or any other graphing tool.\\
\textbf{Solution:}
\item Compute the mean and variance of $X$ in terms of $\lambda$.\\
\textbf{Solution:}
\item Compute $Pr(X \ge x)$ in terms of $x$ and $\lambda$. (Note that $x$ can be positive or negative or 0. Consider all cases.)\\
\textbf{Solution:}
\item For $s,t > 0$, compute $Pr(X \ge s+t | X \ge s)$.\\
\textbf{Solution:}
\item Let $Y= |X|.$ Compute the pdf of $Y$.\\
\textbf{Solution:}
\end{enumerate}
\newpage
\section*{Problem 4 (Analyze code snippet) Solution}
Let {\tt``data[ ]''} be an array of $100$ random variables, where the $i$th entry in {\tt data}, i.e. {\tt data}$[i]$, for $1 \le i \le 100$ is a sample from an independent exponential distribution with parameter $\lambda =5$.
The following is a program to compute the minimum of these numbers.
\begin{verbatim}
Compute-Min:
min := infinity
for t := 1 to 100
if data[t] < min
min := data[t]
print "The new minimum is " min (*)
\end{verbatim}
\begin{enumerate}[label=(\alph*)]
\item What is the worst case number of times line (*) is executed?\\
\textbf{Solution:}
\item What is the expected number of times line (*) is executed?\\
\textbf{Solution:}
\end{enumerate}
\newpage
\section*{Problem 5 (Distinct Elements Analysis) Solution}
YouTube wants to count the number of \textbf{\textit{distinct}} views for a video, but doesn't want to store all the user ID's. In class on 11/9 and 11/13, we described a way for them to get a good estimate of this number without storing everything.
We modelled the problem as follows: we see a \textbf{stream} of 8-byte integers (user ID's), $x_1,x_2,\dots,x_N$, where $x_i$ is the user ID of the $i$-th view to a video, but there are only $n$ \textit{distinct} elements ($1\le n\le N$), since some people rewatch the video, even multiple times. We don't know what the number of views $N$ is; we can't even store the number $n$ of distinct views (i.e., the number of distinct views).
\begin{enumerate}[label=(\alph*)]
\item Let $U_1,\dots,U_m$ be $m$ iid samples from the continuous $Unif(0,1)$ distribution, and let $X=\min\{U_1,\dots,U_m\}$. We know from lecture (and the textbook) that $E[X]=\frac{1}{m+1}$. Compute $Var[X]$.\\
\textbf{Solution:}
\item If we were to solve the Distinct Elements problem naively using a \code{Set}, what would the big-Oh space complexity be (in terms of $N$ and/or $n$)? If a video had $N=2$ billion views, with only $n=900$ million of them being distinct views, how much storage would we need for this one video to keep track of the distinct users? Your answer should be in bytes.\\
\textbf{Solution:}
\item Determine how much space is saved from using the MultDistElts class if YouTube wants to use $K=10,000$ hash functions compared to using the Set in part (b)? Assume for simplicity MultDistElts only stores $K = 10,000$ floats $\text{val}_i$ (each float is 8 bytes, and corresponds to a certain DistElts class). What is the multiplicative savings factor (e.g., 10x, 2x, etc)?\\
\textbf{Solution:}
\end{enumerate}
\newpage
\section*{Problem 7 (Extra Credit: Incommunicado) Solution [OPTIONAL]}
$n$ people are brought into a room. A single bit (either 0 or 1) gets pasted
on each person's forehead. They cannot see the bit on their own forehead, but
they can see the bits on everyone else's foreheads.
Without communicating and without seeing what anyone else writes, they must each write down a bit (either 0 or 1) which should
be equal to the number on their own forehead. If they all get it right, they will each earn a prize of one million dollars, but if any of them get it wrong, there will be no prizes at all.
They can agree ahead of time on a protocol, that is before the numbers are placed
on their foreheads. But once they are brought into the room, no further communication
is allowed. Come up with a protocol that guarantees they will each win a million dollars with probability 1/2.\\
\newline
\textbf{Solution:}
\newpage
\section*{Problem 8 (Extra Credit: Beyond distinct elements) Solution [OPTIONAL]}
Suppose we have a stream $x_1,\dots,x_N$, with only $n