Due Date: 16 April 2003, 6:30pm (the third week of class)
Turn-in procedure: Email your homework file(s) to parag@cs.washington.edu
before class on 16 April. Any of the Word, PostScript, PDF, HTML, or
plain-text formats is fine.
Please use the subject "CSEP546: HW1 Submission", and include your name and
student ID in the body of the message. You can also submit a hard copy of
your work at the beginning of class on 16 April.
Notes: All homework is to be done individually. H&K refers to the Data Mining
text by Han and Kamber.
- H&K - 2.3 a), b), c)
- H&K - 2.4
- H&K - 2.12 a)
- Suppose there are two continuous attributes X and Y, the
  instance space is the square {(X,Y): 0 < X < 4; 0 < Y < 4},
  and examples are uniformly distributed within it. Suppose
  that the class is positive if X > 4 - Y, and negative otherwise.
  (A small simulation sketch for checking error rates numerically
  appears after this problem list.)
- Draw a decision tree with seven internal nodes that approximates
this concept as closely as possible. (The only tests allowed in
nodes are of the form "X < A ?" or "Y < B ?", for any constants
A and B.)
- What is the error rate of this decision tree?
- What is the error rate of the decision tree(s) with 15 internal
nodes that best approximate(s) this concept? Why?
- If there are three equally probable classes, what is the highest
  possible information gain of a Boolean attribute? Why? (The entropy
  and information-gain definitions are restated after this list for
  reference.)
- This question is about reduced-error pruning of decision trees.
  Suppose the validation-set error rate of the subtree rooted at node
  N is greater than the error rate obtained by converting N into a
  leaf. Is it possible that, in the optimal pruned tree, N is not a
  leaf? Why? (A sketch of the reduced-error pruning procedure appears
  after this list.)
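
For the continuous-attribute question above, the following is a minimal
Monte Carlo sketch in Python (not required for the assignment) for
sanity-checking the error rate of a candidate tree against the X > 4 - Y
concept. The function candidate_tree is a hypothetical placeholder using a
single allowed test; substitute the tests of the tree you actually draw.

    import random

    def true_label(x, y):
        # Target concept: positive iff X > 4 - Y.
        return x > 4 - y

    def candidate_tree(x, y):
        # Hypothetical one-test tree ("X < 2 ?"): predict negative on the
        # left branch and positive on the right. Replace with your own tree,
        # using only tests of the form "X < A ?" or "Y < B ?".
        if x < 2:
            return False
        return True

    def estimate_error(n=200000):
        # Draw points uniformly from the square (0,4) x (0,4) and count
        # disagreements between the candidate tree and the true concept.
        errors = 0
        for _ in range(n):
            x = random.uniform(0, 4)
            y = random.uniform(0, 4)
            if candidate_tree(x, y) != true_label(x, y):
                errors += 1
        return errors / n

    print(estimate_error())  # approximate error rate of the candidate tree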
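
For the information-gain question, these are the standard definitions (a
reminder only, using base-2 logarithms; S_v denotes the subset of examples
for which the Boolean attribute A takes value v):

    H(S) = -\sum_{i=1}^{k} p_i \log_2 p_i
    \qquad
    \mathrm{Gain}(S, A) = H(S) - \sum_{v \in \{\text{true},\,\text{false}\}} \frac{|S_v|}{|S|}\, H(S_v)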
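
For the pruning question, here is a rough Python sketch of the greedy
reduced-error pruning step the question refers to: a node is converted into
a leaf whenever that does not increase error on the validation examples that
reach it. The Node class and its fields are assumptions made for
illustration, not the course's or the textbook's code.

    class Node:
        # Minimal decision-tree node (an assumed structure). Internal nodes
        # test one attribute against a threshold; leaves hold a label.
        def __init__(self, attr=None, thresh=None, left=None, right=None,
                     label=None, majority_label=None):
            self.attr, self.thresh = attr, thresh
            self.left, self.right = left, right
            self.label = label                    # set only at leaves
            self.majority_label = majority_label  # majority class of training
                                                  # examples reaching this node

        def is_leaf(self):
            return self.label is not None

        def predict(self, x):
            if self.is_leaf():
                return self.label
            branch = self.left if x[self.attr] < self.thresh else self.right
            return branch.predict(x)

    def prune(node, val_examples):
        # Bottom-up reduced-error pruning: prune the children first, each on
        # the validation examples that reach it, then compare the subtree's
        # validation error against the error of node as a majority-class leaf.
        if node.is_leaf() or not val_examples:
            return
        left_ex = [(x, y) for x, y in val_examples if x[node.attr] < node.thresh]
        right_ex = [(x, y) for x, y in val_examples if x[node.attr] >= node.thresh]
        prune(node.left, left_ex)
        prune(node.right, right_ex)
        subtree_errors = sum(node.predict(x) != y for x, y in val_examples)
        leaf_errors = sum(node.majority_label != y for _, y in val_examples)
        if leaf_errors <= subtree_errors:
            node.label = node.majority_label  # convert N into a leaf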