# CSEP 546 - Data Mining - Spring 2003 - Homework 2

## Due Date: 14 May 2003 6:30pm

Turn-in procedure: Please email your solutions to parag@cs.washington.edu before class on 14 May. Word, PostScript, PDF, HTML, or plain text formats are all acceptable.

Please use the subject "CSEP546: HW2 Submission", and include your name and student ID in the body of the message.

You can also submit a hard copy of your work at the beginning of class on 14 May.

Note: All homework is to be done individually.

1. Mitchell - 10.5

2. Mitchell - 10.6

3. Consider a blood alcohol test to detect drunken drivers. The test returns positive 99% of the time if the driver is drunk, and it returns positive 1.5% of the time if the driver is sober.
   1. If 99 out of 100 drivers are sober, what is the probability that a driver who tests positive is drunk?
   2. Suppose that on Saturday nights only 9 out of 10 drivers are sober. What is the probability that a driver who tests positive on Saturday night is drunk?
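
Both parts apply Bayes' rule to the same test with a different prior. As a minimal sketch of that computation, the helper below (a hypothetical name, `posterior_given_positive`, not part of the assignment) computes P(condition | positive) from a prior and the two test rates. The numbers in the usage lines are illustrative assumptions and are deliberately not the ones in the problem.

```python
def posterior_given_positive(prior, true_positive_rate, false_positive_rate):
    """Return P(condition | positive test) using Bayes' rule.

    prior               -- P(condition)
    true_positive_rate  -- P(positive | condition)
    false_positive_rate -- P(positive | no condition)
    """
    # Total probability of a positive test, summed over both cases.
    evidence = (true_positive_rate * prior
                + false_positive_rate * (1.0 - prior))
    if evidence == 0.0:
        raise ValueError("a positive test has probability zero")
    return true_positive_rate * prior / evidence


# Illustrative values only (assumed, not taken from the problem).
print(f"{posterior_given_positive(0.5, 0.9, 0.1):.3f}")  # prior of 0.5
print(f"{posterior_given_positive(0.1, 0.9, 0.1):.3f}")  # prior of 0.1
```

With the same test, lowering the prior from 0.5 to 0.1 lowers the posterior, which is the effect the two parts of the problem contrast.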

4. Consider the following Bayesian network, in which the variables A, B, C, and D are Boolean:

   1. Is C independent of A?
   2. Is C independent of B?
   3. Is C independent of A given B?
   4. Is C independent of D?
   5. Is C independent of D given B?

5. Mitchell - 4.1

6. Mitchell - 4.2

7. What is the result of applying crossover to the strings 00110 and 10111, with crossover mask 11100?
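
A mask-based crossover can be sketched as below. This assumes the convention that the first offspring takes a bit from the first parent wherever the mask is 1 and from the second parent wherever it is 0, with the second offspring taking the complementary bits; check this against the definition used in the course text. The function name `crossover` and the example strings are illustrative and are not the ones in the problem.

```python
def crossover(parent_a, parent_b, mask):
    """Apply a crossover mask to two equal-length bit strings.

    Assumed convention: mask bit '1' means the first offspring copies
    parent_a at that position; mask bit '0' means it copies parent_b.
    The second offspring receives the opposite choice at every position.
    """
    if not (len(parent_a) == len(parent_b) == len(mask)):
        raise ValueError("parents and mask must have the same length")

    child_a = "".join(
        a if m == "1" else b for a, b, m in zip(parent_a, parent_b, mask)
    )
    child_b = "".join(
        b if m == "1" else a for a, b, m in zip(parent_a, parent_b, mask)
    )
    return child_a, child_b


# Illustrative single-point crossover: the mask keeps the first five bits.
children = crossover("11101001000", "00001010101", "11111000000")
print(children)
```

A mask with a single run of 1s followed by 0s gives single-point crossover; other mask shapes give two-point or uniform crossover with the same function.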

8. Which ensemble method is more likely to improve the accuracy of the naive Bayes classifier: bagging or boosting? Why?