HWK 3 CSE473: Uncertainty (with solutions) (due Nov. 13)

1. R&N, Problem 14.3

SOLUTION:

It's good news that the disease is rare, because that keeps your overall
chance of having the disease extremely low even though the test was positive.

  T - Test is positive
  D - You have the disease

  P(D) = 0.0001
  P(T|D) = 0.99
  P(T) = 1, since we tested positive.

  P(D|T) = P(T|D)*P(D) / P(T)       using Bayes' rule
         = (0.99 * 0.0001) / 1
         = 0.000099

So the chances of having the disease are still less than 1%, even with a
positive test.

(Some of you came up with a different answer by using normalization):

  P(~D|T) = P(T|~D)*P(~D) / P(T)
  P(T)    = P(T|D)*P(D) + P(T|~D)*P(~D)

and:

  P(D|T) = P(T|D)*P(D) / ( P(T|D)*P(D) + P(T|~D)*P(~D) )

  P(~T|~D) = 0.99
  P(T|~D)  = 1 - P(~T|~D) = 0.01
  P(~D)    = 1 - P(D) = 0.9999

  P(D|T) = ( 0.99*0.0001 ) / ( 0.99*0.0001 + 0.01*0.9999 ) = 0.0098

I accepted both answers.
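For anyone who wants to check the normalized calculation numerically, here is
a minimal Python sketch (the variable names are made up for illustration):

    # Normalized Bayes' rule calculation for problem 1.
    p_d          = 0.0001     # P(D): prior probability of the disease
    p_t_given_d  = 0.99       # P(T|D): test is positive given the disease
    p_t_given_nd = 1 - 0.99   # P(T|~D) = 1 - P(~T|~D)
    p_nd         = 1 - p_d    # P(~D)

    # P(T) by total probability, then P(D|T) by Bayes' rule.
    p_t = p_t_given_d * p_d + p_t_given_nd * p_nd
    p_d_given_t = p_t_given_d * p_d / p_t

    print(round(p_d_given_t, 4))   # 0.0098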
2. You've been given the task to construct each of the models detailed below.
   In each case, would you use fuzzy logic or probability in your solution?
   Why does one make more sense than the other?

   (a) An expert system that will assist a human worker in a steel factory.
       The worker makes decisions in the refining process based on
       temperature constraints, the state of the factory schedule, and the
       amount of impurity detected in the metal. The problem is to translate
       the human expert's notions about what is "too hot" or "time consuming"
       into a model that can help schedule the factory.

   (b) A program that will help schedule traffic lights based on time of day,
       day of the week, anticipated events, etc. You can assume that traffic
       data has been collected ahead of time.

SOLUTION:

a) Fuzzy logic makes more sense, since we're dealing with human notions such
   as "too hot." It may not be possible to break these down into absolutes,
   so fuzzy logic can help. Since we're not really dealing with uncertainty
   here, probability would not be a good fit.

b) This is a clear-cut application of probability. We can build a
   probabilistic model from the traffic data (maybe a belief net...), and
   then use it to predict the uncertain outcome of future traffic scenarios.
   Fuzzy logic makes little sense here.

3. R&N, Problem 14.6

SOLUTION:

Show that P(A,B|C) = P(A|C)P(B|C) is equivalent to P(A|B,C) = P(A|C):

  1. Start with P(A,B|C) = P(A|C)P(B|C).
  2. Rewrite the left-hand side using the conditionalized version of the
     generalized product rule: P(A,B|C) = P(A|B,C)P(B|C).
  3. We now have P(A|B,C)P(B|C) = P(A|C)P(B|C).
  4. Cancel the P(B|C) terms.
  5. P(A|B,C) = P(A|C).

Show that P(A,B|C) = P(A|C)P(B|C) is equivalent to P(B|A,C) = P(B|C):

  1. Start with P(A|B,C) = P(A|C), shown above.
  2. Rewrite the left-hand side using the conditionalized version of
     Bayes' rule: P(A|B,C) = P(B|A,C)P(A|C) / P(B|C).
  3. We now have P(B|A,C)P(A|C) / P(B|C) = P(A|C).
  4. Multiply both sides by P(B|C).
  5. P(B|A,C)P(A|C) = P(B|C)P(A|C).
  6. Cancel the P(A|C) terms.
  7. P(B|A,C) = P(B|C).

4. R&N, Problem 15.1, parts a, b, c, d only. For part b, give only the table
   values for the "Starts," "StarterMotorWorking," and "Moves" nodes in the
   belief net.

SOLUTION:

Parts a and b: There was no "right answer" for how you added the new nodes
"StarterMotorWorking" and "IcyWeather." There was also no right way to choose
the probabilities. Those were freebies.

Part c: If there's no conditional independence, then the joint distribution
for 8 boolean nodes has 2^8 entries. I also accepted 2^8 - 1 because it is
almost Thanksgiving.

Part d: The answer to part d depends on how you drew your net. It's the total
number of values in the belief net tables (1 value for a node with no
parents, 2 values for a node with 1 parent, 4 values for a node with two
parents...). Sum them up.

5. Consider the following belief net:

        A     B
         \   /
          v v
           C
          / \
         v   v
        E     F

   P(A) = 0.8            P(B) = 0.2
   P(C|A,B) = 0.6        P(E|C) = 0.9
   P(C|A,~B) = 0.5       P(F|C) = 0.1
   P(C|~A,B) = 0.4
   P(C|~A,~B) = 0.1

   (a) What is P(C)?
   (b) What is P(C|A)?
   (c) What is P(A|C)?
   (d) What is P(C|~A)?
   (e) What is P(~C|A)?
   (f) Is the set of probabilities given above complete? (Can any compound
       probability be calculated?) If not, what probabilities are missing?

SOLUTION:

a) P(C) = P(C|A,B)P(A)P(B) + P(C|A,~B)P(A)P(~B)
        + P(C|~A,B)P(~A)P(B) + P(C|~A,~B)P(~A)P(~B)
        = (0.6*0.8*0.2) + (0.5*0.8*0.8) + (0.4*0.2*0.2) + (0.1*0.2*0.8)
        = 0.448

b) P(C|A) = P(C|A,B)P(B) + P(C|A,~B)P(~B)
          = (0.6*0.2) + (0.5*0.8)
          = 0.52

c) P(A|C) = P(C|A)P(A) / P(C) = 0.52*0.8 / 0.448 = 0.928

d) P(C|~A) = P(C|~A,B)P(B) + P(C|~A,~B)P(~B)
           = (0.4*0.2) + (0.1*0.8)
           = 0.16

e) P(~C|A) = 1 - P(C|A) = 1 - 0.52 = 0.48

f) No. You still need P(E|~C) and P(F|~C) in order to be able to derive them
   all.
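The arithmetic in problem 5 is easy to double-check by summing over the
parents A and B. A minimal Python sketch (names invented for illustration):

    # Checking the problem-5 answers by enumerating over A and B.
    p_a, p_b = 0.8, 0.2
    p_c_given = {(True, True): 0.6, (True, False): 0.5,
                 (False, True): 0.4, (False, False): 0.1}   # P(C | A, B)

    def p(value, prob_true):
        # Probability that a boolean parent takes the given value.
        return prob_true if value else 1 - prob_true

    # (a) P(C) = sum over a, b of P(C|a,b) P(a) P(b)
    p_c = sum(p_c_given[(a, b)] * p(a, p_a) * p(b, p_b)
              for a in (True, False) for b in (True, False))

    # (b) P(C|A) = sum over b of P(C|A,b) P(b)
    p_c_given_a = sum(p_c_given[(True, b)] * p(b, p_b)
                      for b in (True, False))

    # (c) P(A|C) by Bayes' rule
    p_a_given_c = p_c_given_a * p_a / p_c

    print(round(p_c, 3), round(p_c_given_a, 3), round(p_a_given_c, 3))
    # 0.448 0.52 0.929   (0.928 above is the same value, truncated)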
6. Consider the following belief network:

      E     B
     / \   /
    v   v v
    R    A
         |
         v
         C

   The function d-sep() will return "yes" if two nodes are d-separated,
   considering the given information (if there is any). d-sep() will return
   "no" otherwise. Determine the return values of d-sep() for each case:

   (a) d-sep( R, B )        {no information given}
   (b) d-sep( R, B | A )
   (c) d-sep( R, B | A, E )

SOLUTION:

The information that you needed to solve this problem is on page 445 of R&N.
The figure there gives you the three cases.

a) Yes, by case 3. The only path from R to B goes through the collider at A
   (R <- E -> A <- B), and it is blocked as long as A is not given. If you
   had a node such as A as evidence, R and B would become dependent; since no
   evidence is given, R and B are independent and d-sep() returns yes.

b) No, by case 3. Observing the collider A activates the path R <- E -> A <- B
   (E is not observed), so R and B are no longer d-separated.

c) Yes, by case 2. E is observed and is a parent of both R and A, so the path
   is blocked at E, and we still say yes even though d-sep(R,B|A) is no.

7. For the following multiply connected belief net,

         A
        / \
       v   v
       B   C
        \ /
         v
         D

   with the conditional probabilities:

   P(A) = 0.6
   P(B|A) = 0.2        P(B|~A) = 0.6
   P(C|A) = 0.7        P(C|~A) = 0.1
   P(D|B,C) = 0.85     P(D|~B,C) = 0.8
   P(D|B,~C) = 0.9     P(D|~B,~C) = 0.1

   Suppose you wanted to compute P(B|A) using stochastic simulation. Compare
   the number of samples you would need to reach the same level of
   approximation with rejection sampling vs. likelihood weighting. What if
   your task was to compute P(D|B,C)?

SOLUTION:

I accepted all solutions that displayed an understanding of how likelihood
weighting can reach the same approximation as rejection sampling while using
fewer samples overall. As was pointed out to me, it makes little sense to
stochastically simulate the particular values in the question. Stochastic
simulation is meant for belief nets that are vastly more complicated, which
are unfortunately inappropriate for CSE473 homework questions. Simple
examples may be contrived, but they can still display the differences between
the two algorithms (in this case rejection sampling would still need many
more samples to compute the same approximation).

A good solution to this question:

P(B|A): With rejection sampling, you have to drop the samples that do not
comply with the evidence {A}. For P(B|A), you would reject around 40 out of
every 100 samples, namely the ones that end up with ~A (since P(~A) = 0.4).
With likelihood weighting, you can keep all 100 samples, weighting each one
by the likelihood of the evidence given its sampled parents.

P(D|B,C): The gains from using likelihood weighting are much higher here.
Note that B and C share the unobserved parent A, so they are not independent;
summing over A,

  P(B,C) = P(B|A)P(C|A)P(A) + P(B|~A)P(C|~A)P(~A)
         = (0.2*0.7*0.6) + (0.6*0.1*0.4)
         = 0.108

...so for every 100 samples you'd expect to throw away roughly 89 of them
with rejection sampling. (A short simulation sketch appears at the end of
this handout, after problem 8.)

8. R&N, Problem 16.2

SOLUTION:

Expected monetary value of a lottery ticket: Most of you came up with an
actual value by making an assumption about the utility of $1,000,000. The
problem tells you not to do that, but I'm pretty sure that warning is meant
for the second part of the question (when you're asked to hypothesize about
when it's rational to buy a ticket at all). Otherwise, I don't know how
you're supposed to derive a "monetary value" for the ticket. I accepted
several answers, but the most common one was:

  EMV = (1/50 * 10) + (1/2,000,000 * 1,000,000) = $0.70

...so we're assuming that U($1) is $1, and U($1,000,000) is 1,000,000. It was
all right to account for the dollar lost to buy the ticket, too.

When is it rational to buy a lottery ticket? Let's take a look at the utility
equation again:

  1/50 * 10 * U($1) + 1/2,000,000 * U($1,000,000)

The answer depends on the utility of winning the $1,000,000, since U($1) (the
dollar you pay for the ticket) is always going to be higher than the expected
utility of the small prize, 1/5 * U($1). So if we value the $1,000,000 at
more than its actual dollar worth (perhaps we borrowed money from an
*unforgiving* source), it could be rational to buy the ticket. And that's
R&N's answer for the last part: poor people just have a weighted utility
function that assigns more value to U($1,000,000) than we might expect
otherwise...

Some of you used some of the more specific utility equations from chapter 16,
which was great. The minimal logic above was enough to get full credit,
though.
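To make the last point concrete, here is a small Python sketch of the
expected-value comparison for problem 8. The 5x jackpot utility is a purely
made-up example of the "unforgiving source" scenario:

    # Expected value of the ticket under a given utility function u.
    def expected_value(u):
        # 1/50 chance of $10, 1/2,000,000 chance of $1,000,000.
        return (1 / 50) * u(10) + (1 / 2_000_000) * u(1_000_000)

    # Monetary terms, U($x) = x: the ticket is "worth" $0.70, so paying $1
    # (utility 1) is irrational.
    print(round(expected_value(lambda x: x), 2))        # 0.7

    # A hypothetical utility that values the jackpot at five times its face
    # value; now the expected utility exceeds the cost of the ticket.
    print(round(expected_value(
        lambda x: 5 * x if x >= 1_000_000 else x), 2))  # 2.7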
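Finally, here is the simulation sketch promised in problem 7. It is only a
rough illustration (the helper names are invented for this handout, and the
evidence {B, C} is hard-coded); the point is that rejection sampling throws
away roughly 89% of its samples for this query, while likelihood weighting
keeps every sample. For P(B|A) the same comparison holds, with only about 40%
of the samples rejected.

    import random

    # CPTs for the problem-7 net (A -> B, A -> C, {B, C} -> D).
    P_A = 0.6
    P_B = {True: 0.2, False: 0.6}     # P(B | a)
    P_C = {True: 0.7, False: 0.1}     # P(C | a)
    P_D = {(True, True): 0.85, (False, True): 0.8,
           (True, False): 0.9, (False, False): 0.1}   # P(D | b, c)

    def prior_sample():
        # One complete sample (a, b, c, d) drawn from the prior.
        a = random.random() < P_A
        b = random.random() < P_B[a]
        c = random.random() < P_C[a]
        d = random.random() < P_D[(b, c)]
        return a, b, c, d

    def rejection_p_d_given_bc(n):
        # Estimate P(D | B, C): keep only samples where B and C are true.
        kept = [d for _, b, c, d in (prior_sample() for _ in range(n))
                if b and c]
        return sum(kept) / len(kept), len(kept)

    def weighted_p_d_given_bc(n):
        # Likelihood weighting: fix B = C = true and weight each sample by
        # P(B=true | a) * P(C=true | a); every sample is used.
        num = den = 0.0
        for _ in range(n):
            a = random.random() < P_A
            w = P_B[a] * P_C[a]
            d = random.random() < P_D[(True, True)]
            num += w * d
            den += w
        return num / den, n

    est_r, kept = rejection_p_d_given_bc(10_000)
    est_w, used = weighted_p_d_given_bc(10_000)
    print(f"rejection sampling:   {est_r:.3f} from {kept} usable samples")
    print(f"likelihood weighting: {est_w:.3f} from {used} usable samples")
    # Expect kept to be around 1080 of 10000 (P(B,C) = 0.108), while
    # likelihood weighting uses all 10000; both estimates approach 0.85.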