HWK 3 CSE473: Uncertainty (with solutions) (due Nov. 13)

1. R&N, Problem 14.3

SOLUTION:

It's good news that the disease is rare, because that keeps your overall
chance of having the disease extremely low even though the test was positive.

  T - Test is positive
  D - You have the disease

  P(D) = 0.0001
  P(T|D) = 0.99
  P(T) = 1, since we tested positive.

  P(D|T) = P(T|D)*P(D) / P(T)       using Bayes' rule
         = (0.99 * 0.0001) / 1
         = 0.000099

So the chances of having the disease are still less than 1%, even with a
positive test.

(Some of you came up with a different answer by using normalization):

  P(~D|T) = P(T|~D)*P(~D) / P(T)
  P(T)    = P(T|D)*P(D) + P(T|~D)*P(~D)

and:

  P(D|T) = P(T|D)*P(D) / ( P(T|D)*P(D) + P(T|~D)*P(~D) )

  P(~T|~D) = 0.99
  P(T|~D)  = 1 - P(~T|~D) = 0.01
  P(~D)    = 1 - P(D) = 0.9999

  P(D|T) = ( 0.99*0.0001 ) / ( 0.99*0.0001 + 0.01*0.9999 ) = 0.0098

I accepted both answers.
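For anyone who wants to check the normalized calculation numerically, here is
a minimal Python sketch (the variable names are made up for illustration):

    # Normalized Bayes' rule calculation for problem 1.
    p_d          = 0.0001     # P(D): prior probability of the disease
    p_t_given_d  = 0.99       # P(T|D): test is positive given the disease
    p_t_given_nd = 1 - 0.99   # P(T|~D) = 1 - P(~T|~D)
    p_nd         = 1 - p_d    # P(~D)

    # P(T) by total probability, then P(D|T) by Bayes' rule.
    p_t = p_t_given_d * p_d + p_t_given_nd * p_nd
    p_d_given_t = p_t_given_d * p_d / p_t

    print(round(p_d_given_t, 4))   # 0.0098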
2. You've been given the task to construct each of the models detailed below.
   In each case, would you use fuzzy logic or probability in your solution?
   Why does one make more sense than the other?

   (a) An expert system that will assist a human worker in a steel factory.
       The worker makes decisions in the refining process based on
       temperature constraints, the state of the factory schedule, and the
       amount of impurity detected in the metal. The problem is to translate
       the human expert's notions about what is "too hot" or "time consuming"
       into a model that can help schedule the factory.

   (b) A program that will help schedule traffic lights based on time of day,
       day of the week, anticipated events, etc. You can assume that traffic
       data has been collected ahead of time.

SOLUTION:

a) Fuzzy logic makes more sense, since we're dealing with human notions such
   as "too hot." It may not be possible to break these down into absolutes,
   so fuzzy logic can help. Since we're not really dealing with uncertainty
   here, probability would not be a good fit.

b) This is a clear-cut application of probability. We can build a
   probabilistic model from the traffic data (maybe a belief net...), and
   then use it to predict the uncertain outcome of future traffic scenarios.
   Fuzzy logic makes little sense here.

3. R&N, Problem 14.6

SOLUTION:

Show that P(A,B|C) = P(A|C)P(B|C) is equivalent to P(A|B,C) = P(A|C):

  1. Start with P(A,B|C) = P(A|C)P(B|C).
  2. Rewrite the left-hand side using the conditionalized version of the
     generalized product rule: P(A,B|C) = P(A|B,C)P(B|C).
  3. We now have P(A|B,C)P(B|C) = P(A|C)P(B|C).
  4. Cancel the P(B|C) terms.
  5. P(A|B,C) = P(A|C).

Show that P(A,B|C) = P(A|C)P(B|C) is equivalent to P(B|A,C) = P(B|C):

  1. Start with P(A|B,C) = P(A|C), shown above.
  2. Rewrite the left-hand side using the conditionalized version of
     Bayes' rule: P(A|B,C) = P(B|A,C)P(A|C) / P(B|C).
  3. We now have P(B|A,C)P(A|C) / P(B|C) = P(A|C).
  4. Multiply both sides by P(B|C).
  5. P(B|A,C)P(A|C) = P(B|C)P(A|C).
  6. Cancel the P(A|C) terms.
  7. P(B|A,C) = P(B|C).

4. R&N, Problem 15.1, parts a, b, c, d only. For part b, give only the table
   values for the "Starts," "StarterMotorWorking," and "Moves" nodes in the
   belief net.

SOLUTION:

Parts a and b: There was no "right answer" for how you added the new nodes
"StarterMotorWorking" and "IcyWeather." There was also no right way to choose
the probabilities. Those were freebies.

Part c: If there's no conditional independence, then the joint distribution
for 8 boolean nodes has 2^8 entries. I also accepted 2^8 - 1 because it is
almost Thanksgiving.

Part d: The answer to part d depends on how you drew your net. It's the total
number of values in the belief net tables (1 value for a node with no
parents, 2 values for a node with 1 parent, 4 values for a node with two
parents...). Sum them up.

5. Consider the following belief net:

        A     B
         \   /
          v v
           C
          / \
         v   v
        E     F

   P(A) = 0.8            P(B) = 0.2
   P(C|A,B) = 0.6        P(E|C) = 0.9
   P(C|A,~B) = 0.5       P(F|C) = 0.1
   P(C|~A,B) = 0.4
   P(C|~A,~B) = 0.1

   (a) What is P(C)?
   (b) What is P(C|A)?
   (c) What is P(A|C)?
   (d) What is P(C|~A)?
   (e) What is P(~C|A)?
   (f) Is the set of probabilities given above complete? (Can any compound
       probability be calculated?) If not, what probabilities are missing?

SOLUTION:

a) P(C) = P(C|A,B)P(A)P(B) + P(C|A,~B)P(A)P(~B)
        + P(C|~A,B)P(~A)P(B) + P(C|~A,~B)P(~A)P(~B)
        = (0.6*0.8*0.2) + (0.5*0.8*0.8) + (0.4*0.2*0.2) + (0.1*0.2*0.8)
        = 0.448

b) P(C|A) = P(C|A,B)P(B) + P(C|A,~B)P(~B)
          = (0.6*0.2) + (0.5*0.8)
          = 0.52

c) P(A|C) = P(C|A)P(A) / P(C) = 0.52*0.8 / 0.448 = 0.928

d) P(C|~A) = P(C|~A,B)P(B) + P(C|~A,~B)P(~B)
           = (0.4*0.2) + (0.1*0.8)
           = 0.16

e) P(~C|A) = 1 - P(C|A) = 1 - 0.52 = 0.48

f) No. You still need P(E|~C) and P(F|~C) in order to be able to derive them
   all.
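The arithmetic in problem 5 is easy to double-check by summing over the
parents A and B. A minimal Python sketch (names invented for illustration):

    # Checking the problem-5 answers by enumerating over A and B.
    p_a, p_b = 0.8, 0.2
    p_c_given = {(True, True): 0.6, (True, False): 0.5,
                 (False, True): 0.4, (False, False): 0.1}   # P(C | A, B)

    def p(value, prob_true):
        # Probability that a boolean parent takes the given value.
        return prob_true if value else 1 - prob_true

    # (a) P(C) = sum over a, b of P(C|a,b) P(a) P(b)
    p_c = sum(p_c_given[(a, b)] * p(a, p_a) * p(b, p_b)
              for a in (True, False) for b in (True, False))

    # (b) P(C|A) = sum over b of P(C|A,b) P(b)
    p_c_given_a = sum(p_c_given[(True, b)] * p(b, p_b)
                      for b in (True, False))

    # (c) P(A|C) by Bayes' rule
    p_a_given_c = p_c_given_a * p_a / p_c

    print(round(p_c, 3), round(p_c_given_a, 3), round(p_a_given_c, 3))
    # 0.448 0.52 0.929   (0.928 above is the same value, truncated)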
6. Consider the following belief network:

      E     B
     / \   /
    v   v v
    R    A
         |
         v
         C

   The function d-sep() will return "yes" if two nodes are d-separated,
   considering the given information (if there is any). d-sep() will return
   "no" otherwise. Determine the return values of d-sep() for each case:

   (a) d-sep( R, B )        {no information given}
   (b) d-sep( R, B | A )
   (c) d-sep( R, B | A, E )

SOLUTION:

The information that you needed to solve this problem is on page 445 of R&N.
The figure there gives you the three cases.

a) Yes, by case 3. The only path from R to B goes through the collider at A
   (R <- E -> A <- B), and it is blocked as long as A is not given. If you
   had a node such as A as evidence, R and B would become dependent; since no
   evidence is given, R and B are independent and d-sep() returns yes.

b) No, by case 3. Observing the collider A activates the path R <- E -> A <- B
   (E is not observed), so R and B are no longer d-separated.

c) Yes, by case 2. E is observed and is a parent of both R and A, so the path
   is blocked at E, and we still say yes even though d-sep(R,B|A) is no.

7. For the following multiply connected belief net,

         A
        / \
       v   v
       B   C
        \ /
         v
         D

   with the conditional probabilities:

   P(A) = 0.6
   P(B|A) = 0.2        P(B|~A) = 0.6
   P(C|A) = 0.7        P(C|~A) = 0.1
   P(D|B,C) = 0.85     P(D|~B,C) = 0.8
   P(D|B,~C) = 0.9     P(D|~B,~C) = 0.1

   Suppose you wanted to compute P(B|A) using stochastic simulation. Compare
   the number of samples you would need to reach the same level of
   approximation with rejection sampling vs. likelihood weighting. What if
   your task was to compute P(D|B,C)?

SOLUTION:

I accepted all solutions that displayed an understanding of how likelihood
weighting can reach the same approximation as rejection sampling while using
fewer samples overall. As was pointed out to me, it makes little sense to
stochastically simulate the particular values in the question. Stochastic
simulation is meant for belief nets that are vastly more complicated, which
are unfortunately inappropriate for CSE473 homework questions. Simple
examples may be contrived, but they can still display the differences between
the two algorithms (in this case rejection sampling would still need many
more samples to compute the same approximation).

A good solution to this question:

P(B|A): With rejection sampling, you have to drop the samples that do not
comply with the evidence {A}. For P(B|A), you would reject around 40 out of
every 100 samples, namely the ones that end up with ~A (since P(~A) = 0.4).
With likelihood weighting, you can keep all 100 samples, weighting each one
by the likelihood of the evidence given its sampled parents.

P(D|B,C): The gains from using likelihood weighting are much higher here.
Note that B and C share the unobserved parent A, so they are not independent;
summing over A,

  P(B,C) = P(B|A)P(C|A)P(A) + P(B|~A)P(C|~A)P(~A)
         = (0.2*0.7*0.6) + (0.6*0.1*0.4)
         = 0.108

...so for every 100 samples you'd expect to throw away roughly 89 of them
with rejection sampling. (A short simulation sketch appears at the end of
this handout, after problem 8.)

8. R&N, Problem 16.2

SOLUTION:

Expected monetary value of a lottery ticket: Most of you came up with an
actual value by making an assumption about the utility of $1,000,000. The
problem tells you not to do that, but I'm pretty sure that warning is meant
for the second part of the question (when you're asked to hypothesize about
when it's rational to buy a ticket at all). Otherwise, I don't know how
you're supposed to derive a "monetary value" for the ticket. I accepted
several answers, but the most common one was:

  EMV = (1/50 * 10) + (1/2,000,000 * 1,000,000) = $0.70

...so we're assuming that U($1) is $1, and U($1,000,000) is 1,000,000. It was
all right to account for the dollar lost to buy the ticket, too.

When is it rational to buy a lottery ticket? Let's take a look at the utility
equation again:

  1/50 * 10 * U($1) + 1/2,000,000 * U($1,000,000)

The answer depends on the utility of winning the $1,000,000, since U($1) (the
dollar you pay for the ticket) is always going to be higher than the expected
utility of the small prize, 1/5 * U($1). So if we value the $1,000,000 at
more than its actual dollar worth (perhaps we borrowed money from an
*unforgiving* source), it could be rational to buy the ticket. And that's
R&N's answer for the last part: poor people just have a weighted utility
function that assigns more value to U($1,000,000) than we might expect
otherwise...

Some of you used some of the more specific utility equations from chapter 16,
which was great. The minimal logic above was enough to get full credit,
though.
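To make the last point concrete, here is a small Python sketch of the
expected-value comparison for problem 8. The 5x jackpot utility is a purely
made-up example of the "unforgiving source" scenario:

    # Expected value of the ticket under a given utility function u.
    def expected_value(u):
        # 1/50 chance of $10, 1/2,000,000 chance of $1,000,000.
        return (1 / 50) * u(10) + (1 / 2_000_000) * u(1_000_000)

    # Monetary terms, U($x) = x: the ticket is "worth" $0.70, so paying $1
    # (utility 1) is irrational.
    print(round(expected_value(lambda x: x), 2))        # 0.7

    # A hypothetical utility that values the jackpot at five times its face
    # value; now the expected utility exceeds the cost of the ticket.
    print(round(expected_value(
        lambda x: 5 * x if x >= 1_000_000 else x), 2))  # 2.7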
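Finally, here is the simulation sketch promised in problem 7. It is only a
rough illustration (the helper names are invented for this handout, and the
evidence {B, C} is hard-coded); the point is that rejection sampling throws
away roughly 89% of its samples for this query, while likelihood weighting
keeps every sample. For P(B|A) the same comparison holds, with only about 40%
of the samples rejected.

    import random

    # CPTs for the problem-7 net (A -> B, A -> C, {B, C} -> D).
    P_A = 0.6
    P_B = {True: 0.2, False: 0.6}     # P(B | a)
    P_C = {True: 0.7, False: 0.1}     # P(C | a)
    P_D = {(True, True): 0.85, (False, True): 0.8,
           (True, False): 0.9, (False, False): 0.1}   # P(D | b, c)

    def prior_sample():
        # One complete sample (a, b, c, d) drawn from the prior.
        a = random.random() < P_A
        b = random.random() < P_B[a]
        c = random.random() < P_C[a]
        d = random.random() < P_D[(b, c)]
        return a, b, c, d

    def rejection_p_d_given_bc(n):
        # Estimate P(D | B, C): keep only samples where B and C are true.
        kept = [d for _, b, c, d in (prior_sample() for _ in range(n))
                if b and c]
        return sum(kept) / len(kept), len(kept)

    def weighted_p_d_given_bc(n):
        # Likelihood weighting: fix B = C = true and weight each sample by
        # P(B=true | a) * P(C=true | a); every sample is used.
        num = den = 0.0
        for _ in range(n):
            a = random.random() < P_A
            w = P_B[a] * P_C[a]
            d = random.random() < P_D[(True, True)]
            num += w * d
            den += w
        return num / den, n

    est_r, kept = rejection_p_d_given_bc(10_000)
    est_w, used = weighted_p_d_given_bc(10_000)
    print(f"rejection sampling:   {est_r:.3f} from {kept} usable samples")
    print(f"likelihood weighting: {est_w:.3f} from {used} usable samples")
    # Expect kept to be around 1080 of 10000 (P(B,C) = 0.108), while
    # likelihood weighting uses all 10000; both estimates approach 0.85.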