\documentclass[letterpaper,11pt]{article}
\usepackage{algorithm}
\usepackage[noend]{algpseudocode}
\usepackage{amsmath}
\usepackage{enumitem}
\usepackage{hyperref}
\usepackage[most]{tcolorbox}
\usepackage[margin=1in]{geometry}
\renewcommand{\theenumi}{\alph{enumi}}
\renewcommand{\labelenumi}{\theenumi)}
\renewcommand{\theenumii}{\roman{enumii}}
\renewcommand{\labelenumii}{(\theenumii)}
\begin{document}
\title{{\bf Problem Set 3 Solutions} }
\author{Name: TODO}
\date{}
\maketitle
\section*{Collaborators}
TODO: List collaborators, if any.
\newpage
\section*{Problem 1 (You knew it all along!) Solution}
You are taking a multiple choice test that has 5 answer choices for each question. In answering a question on this test, the probability you know the correct answer is $0.7$. If you don't know the correct answer, you choose one (uniformly) at random. Expressed as a percentage with 2 decimal places, what is the probability that you knew the correct answer to a question, given that you answered it correctly?
Use Bayes Theorem, and give names to the relevant events, e.g. let $K$ be the event that you know the correct answer and let $C$ be the event that you answer the question correctly (whether you knew the answer or not).
\begin{tcolorbox}
TODO: Your solution here.
\end{tcolorbox}
\newpage
\section*{Problem 2 (The mysteries of independence) Solution}
Suppose that a uniformly random card is selected from a standard 52-card deck of cards. Let $E$ be the event that the card is a king, let $F$ be the event that the card is a heart, and let $G$ be the event that the card is black (that is, a spade or a club).
\begin{enumerate}
\item
\begin{enumerate}
\item Are $E$ and $F$ independent? Provide a short proof of your claim.
\begin{tcolorbox}
TODO: Your solution here.
\end{tcolorbox}
\item Are $G$ and $F$ independent? Provide a short proof of your claim.
\begin{tcolorbox}
TODO: Your solution here.
\end{tcolorbox}
\item Are $E$ and $G$ independent? Provide a short proof of your claim.
\begin{tcolorbox}
TODO: Your solution here.
\end{tcolorbox}
\end{enumerate}
\item Now assume that an additional green card (with no suit and no rank) is added to the deck and a uniformly random card is selected from this enlarged deck of 53 cards. Let $E'$ be the event that the card is a king, let $F'$ be the event that the card is a heart, and let $G'
$ be the event that the card is black (that is a spade or a club).
\begin{enumerate}
\item Are $E'$ and $F'$ independent? Provide a short proof of your claim.
\begin{tcolorbox}
TODO: Your solution here.
\end{tcolorbox}
\item Are $G'$ and $F'$ independent? Provide a short proof of your claim.
\begin{tcolorbox}
TODO: Your solution here.
\end{tcolorbox}
\item Are $E'$ and $G'$ independent? Provide a short proof of your claim.
\begin{tcolorbox}
TODO: Your solution here.
\end{tcolorbox}
\end{enumerate}
\end{enumerate}
A proof that two events $A$ and $B$ are independent typically consists of showing that $Pr(A \cap B) = Pr (A ) \cdot Pr (B)$, whereas a proof that they are not independent consists of showing that $Pr(A \cap B) \ne Pr (A ) \cdot Pr (B)$
\newpage
\section*{Problem 3 (Miscounting) Solution}
Consider the question: what is the probability of getting a \textbf{7-card} poker hand (order doesn't matter) that contains at least two 3-of-a-kind (3-of-a-kind means three cards of the same rank). For example, this would be a valid hand: ace of hearts, ace of diamonds, ace of spaces, 7 of clubs, 7 of spades, 7 of hearts and queen of clubs. (Note that a hand consisting of all 4 aces and three of the 7s is also valid.)
Here is how we might compute this:
\begin{quote} Each of the $52 \choose 7 $ hands is equally likely.
Let $E$ be the event that the hand selected contains at least two 3-of-a-kinds.
Then $$\Pr(E) = \frac{|E|}{{52 \choose 7}}$$
To compute $|E|$, apply the product rule. First pick two ranks that have a 3-of-a-kind (e.g. ace and 7 in the example above). For the lower rank of these, pick the suits of the three cards. Then for the higher rank of these, pick the suits of the three cards. Then out of the remaining $52-6 = 46$ cards, pick one. Therefore $$|E|= {13 \choose 2} \cdot {4 \choose 3}\cdot {4 \choose 3} \cdot {46 \choose 1} \quad \quad \text{ and hence }
\quad\quad \Pr(E) = \frac{{13 \choose 2}\cdot 4^2 \cdot 46}{{52 \choose 7}}.$$
\end{quote}
Explain what is wrong with this solution. If there is over-counting in $|E|$, characterize all hands that are counted more than once, and how many times each such hand is counted. If there is under-counting in $|E|$, explain which hands are not counted.
Also, give the correct answer for $\Pr(E)$.
\begin{tcolorbox}
TODO: Your solution here.
\end{tcolorbox}
\newpage
\section*{Problem 4 (Balls) Solution}
Consider an urn containing 12 balls, of which 7 are white and the rest are black. A sample of size 5 is to be drawn with
replacement. What is the conditional probability that the first and fourth balls drawn will be white given that the sample drawn contains exactly 3 white balls?
\smallskip
Note that drawing balls {\em with replacement} means that after a ball is drawn (uniformly at random from the balls in the bin) it is put back into the urn before the next independent draw.
\smallskip
Please use the following notation in your answer:
Let $W_i$ be the event that the $i^{th}$ ball drawn is white. Let $B_i$ be the event that that the $i^{th}$ ball drawn is black, and let $F$ be the event that exactly 3 white balls are drawn.
\begin{tcolorbox}
TODO: Your solution here.
\end{tcolorbox}
\newpage
\section*{Problem 5 (The path less traveled \ldots) Solution}
Alice and Bob go on a hike when they suddenly come upon a place where the trail diverges into two paths. One of these two paths is less traveled, but Alice and Bob can't tell which. Alice and Bob each select one of the paths independently and randomly. Alice selects the one more often traveled with probability $p_A$ and Bob selects the one less
often traveled with probability $p_B$. Alice and Bob decide ahead of time that if their random selection agrees, they will take the selected path (the one they agree on), and if their random selection disagrees, they will pick one of the paths at equal probability and take that one. What is the probability that they end up taking the path less traveled?
\smallskip
{\textbf Hint:} Use the law of total probability, partitioning based on whether Alice and Bob select the same path or different paths.
\begin{tcolorbox}
TODO: Your solution here.
\end{tcolorbox}
\newpage
\section*{Problem 6 (Aces) Solution}
Suppose that an ordinary deck of 52 cards (which contains 4 aces) is randomly divided into 4 hands of 13 cards each. We are interested in determining $p$, the probability that each hand has an ace. Let $E_i$ be the event that the $i$-th hand has exactly one ace. Determine $$p = \Pr
(E_1\cap E_2 \cap E_3 \cap E_4)$$ using the chain rule.
\begin{tcolorbox}
TODO: Your solution here.
\end{tcolorbox}
\newpage
\section*{Problem 7 (Are you game?) Solution}
The Octopus Game Show has its contestants compete in various dangerous and/or embarrassing tasks. Based on how whether they succeed in a week's task they are randomly chosen to go on to the next week. The Octopus Game Show randomly chooses some of those to continue on to the next week with different probabilities based on whether or not they succeeded in the immediately prior week. (Because the organizers think it is fun to watch people fail, success does not guarantee moving on, and failure doesn't guarantee being eliminated from the game.)
\smallskip
Suppose that the Octopus Game Show sets the probabilities for these selections each week based on the fraction of successful contestants in the prior week so that a random contestant will be advanced with probability 50\%.
Suppose that you also know that the probabilities of selection are such that?
\begin{itemize}
\item the probability that a uniformly random contestant is successful in the first week given that they are selected to advance to the second week is 0.95. %95\% of those who make it to the second week were successful in the first week.
\item the probability that a uniformly random contestant is successful in the first week given that they are \emph{not} selected to advance to the second week is 0.75.
%75\% of those who \emph{do not} make it to the second week were successful in the first week.
\end{itemize}
If we randomly choose a contestant uniformly from among those who started the game:
\begin{enumerate}
\item What is the probability that this contestant was successful in the first week?
Use the following notation in your solution: Let $S$ denote the event that the contestant is successful in the first week. Let
$C$ denote the event that the contestant was chosen to advance to the second week.
\begin{tcolorbox}
TODO: Your solution here.
\end{tcolorbox}
\item Expressed as a percentage with 2 decimal places, what is the probability that the Octopus Game Show selected the contestant for the second week conditioned on their being successful in the first week?
\begin{tcolorbox}
TODO: Your solution here.
\end{tcolorbox}
\item Expressed as a percentage with 2 decimal places, what is the probability that the contestant was not selected for the second week conditioned on their being unsuccessful in their first week?
\begin{tcolorbox}
TODO: Your solution here.
\end{tcolorbox}
\end{enumerate}
\newpage
\section*{Problem 8 (Naive Bayes \textbf{$[$Coding$]$}) Solution}
Use the Naive Bayes Classifier to implement a spam filter that learns
word spam probabilities from our pre-labeled training data and then
predicts the label (ham or spam) of a set of emails that it hasn't
seen before. See the slides from Section 3 for details on implementation and also Section 9.3 from \href{www.alextsun.com/files/Prob_Stat_for_CS_Book.pdf#Item.531}{the book}. To solve the task, we have set up an \href{https://edstem.org/us/courses/32027/lessons/51248/slides/286993}{edstem lesson}. In particular, write your code to implement the functions {\tt fit} and {\tt predict} in the provided file, \href{https://edstem.org/us/courses/32027/lessons/51248/slides/286993}{\tt{nb.py}}.
\smallskip
You will be able to run your code directly within edstem, and to test it, using the ``Mark'' option. This, however, will not evaluate your solution. Instead, once you're ready to submit, you can right-click the files in the directory to download them. Please upload your completed {\tt nb.py } to Gradescope under ``PSet3 [Coding]''.
\medskip
{\textbf Some notes and advice:}
\begin{itemize}
\item Read about how to avoid floating point underflow using the log-trick as described in these \href{https://courses.cs.washington.edu/courses/cse312/18sp/lectures/naive-bayes/naivebayesnotes.pdf}{notes}.
\item Make sure you understand how Laplace smoothing works.
\item Remember to remove any debug statements that you are
printing to the output.
\item \textbf{Do not directly manipulate file paths or use
hardcoded file paths.} A file path you have hardcoded into
your program that works on your
computer won't work on the computer we use to test your
program.
\item Needless to say, you should practice what you've learned
in other courses:
document your program, use good variable names, keep your code
clean and straightforward,
etc. Include comments outlining what your program does and
how. We will not spend time
trying to decipher obscure, contorted code. Your score on Gradescope is your final score, as you have unlimited attempts. \textbf{START EARLY}.
\item We will evaluate your code on data you don't have access to, in addition to the data you are given.
\item Remember, it is not expected that Naive Bayes will classify every single
test email correctly, but it should certainly do better than random chance! As this algorithm is deterministic, you should get a certain specific test accuracy around 90-95\%, which we will be testing for to ensure your algorithm is correct. Note that we will run your code on a test dataset you haven't seen, but you will know immediately if you got full score.
\end{itemize}
\begin{tcolorbox}
Nothing to write for this question.
\end{tcolorbox}
\end{document}