CSEP 546 - Data Mining - Autumn 2004 - Homework 3

Due Date: 7 Dec 2004 6:30pm

Turn-in procedure: Please email parag@cs.washington.edu before class on Dec 7. Any of the Word, PostScript, PDF, HTML, or plain-text formats is fine.

Please use the subject "CSEP546: HW3 Submission", and include your name and student ID in the body of the message.

You can also submit a hard copy of your work at the beginning of class on Dec 7.

Note: All homework is to be done individually.

  1. Let the instance space be the rectangle {(x,y): 0 < x < 40, 0 < y < 30}, and let + and - represent positive and negative training examples, respectively. Suppose you are given the following training set:

     [Figure: labeled training examples in the rectangle (not reproduced here)]

    1. Draw the Voronoi diagram corresponding to this training set, using Euclidean distance.

    2. If this Voronoi diagram represents the true concept, and examples are drawn with uniform probability from the rectangle {(x,y): 0 < x < 40, 0 < y < 30}, what is the probability of drawing a positive example? (A numerical sanity check is sketched after this problem.)
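    One way to sanity-check part 2 is a Monte Carlo estimate: since examples are drawn uniformly from the rectangle, the probability of a positive example equals the fraction of the rectangle's area covered by the Voronoi cells of positive training points. The Python sketch below estimates this with a 1-nearest-neighbor lookup. The training points in it are placeholders, not the assignment's actual training set, so substitute the real points from the figure before relying on the number.

        import random

        # Placeholder training set -- replace with the actual points from
        # the assignment's figure. Format: ((x, y), label).
        train = [((10.0, 20.0), '+'), ((30.0, 10.0), '-'), ((20.0, 25.0), '+')]

        def nn_label(px, py):
            """Label of the nearest training example under Euclidean
            distance (squared distance gives the same ordering)."""
            return min(train,
                       key=lambda t: (t[0][0] - px) ** 2 + (t[0][1] - py) ** 2)[1]

        def estimate_positive_probability(n_samples=100000, seed=0):
            """Fraction of uniform samples from {(x,y): 0<x<40, 0<y<30}
            whose nearest training point is positive; with many samples
            this converges to the area ratio that part 2 asks for in
            closed form."""
            rng = random.Random(seed)
            hits = sum(1 for _ in range(n_samples)
                       if nn_label(rng.uniform(0, 40), rng.uniform(0, 30)) == '+')
            return hits / n_samples

        print(estimate_positive_probability())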

  2. Suppose instances are of the form (x, y, z) (i.e., they have three features), and you have a training set composed of positive examples at (0, 1, 0), (1, 0, 0), (0, 1, 1), and (1, 0, 1), and negative examples at (0, 0, 0), (1, 1, 0), (0, 0, 1), and (1, 1, 1).

    1. Which features would the forward selection algorithm select for a nearest-neighbor classifier, using Hamming distance and leave-one-out accuracy as the evaluation measure?

    2. Which features would the backward elimination algorithm select? (A sketch for checking both parts appears after this problem.)
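    Because there are only three features, every subset can be scored directly, which makes it easy to trace what either greedy search does. The Python sketch below computes leave-one-out 1-NN accuracy under Hamming distance for each feature subset, the evaluation measure both forward selection and backward elimination consult at each step. One caveat: the problem does not specify how 1-NN breaks distance ties, and the selected features can depend on the rule; the sketch breaks ties in favor of the earliest example in data order, which is an assumption.

        from itertools import combinations

        # Training set from the problem: instances (x, y, z) with labels.
        data = [
            ((0, 1, 0), '+'), ((1, 0, 0), '+'), ((0, 1, 1), '+'), ((1, 0, 1), '+'),
            ((0, 0, 0), '-'), ((1, 1, 0), '-'), ((0, 0, 1), '-'), ((1, 1, 1), '-'),
        ]

        def hamming(a, b, feats):
            """Hamming distance restricted to the feature indices in feats."""
            return sum(a[i] != b[i] for i in feats)

        def loo_accuracy(feats):
            """Leave-one-out accuracy of 1-NN on `data` using only `feats`.
            Ties go to the earliest remaining example (an assumption)."""
            if not feats:
                return 0.0
            correct = 0
            for i, (x, label) in enumerate(data):
                rest = data[:i] + data[i + 1:]
                nearest = min(rest, key=lambda t: hamming(t[0], x, feats))
                correct += nearest[1] == label
            return correct / len(data)

        for k in range(1, 4):
            for feats in combinations(range(3), k):
                print(feats, loo_accuracy(feats))

    Forward selection starts from the empty set and greedily adds the feature whose addition most improves the score; backward elimination starts from all three features and greedily removes the feature whose removal hurts the score least. The subset scores printed above are enough to trace either search by hand.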

  3. A decision stump is a decision tree with only one internal node. Which algorithm has higher bias: a decision stump learner or a perceptron? And which one has higher variance? Justify your answers. (Assume that all attributes are either Boolean or numeric.)
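    The variance half of this question can also be probed empirically. The Python sketch below is a rough illustration only: the synthetic data-generating process, the use of scikit-learn's DecisionTreeClassifier (depth 1) and Perceptron, and all parameter settings are arbitrary choices, not part of the problem. It fits each learner on bootstrap resamples of one dataset and reports the average per-point variance of its predictions, a crude proxy for the variance term of the error.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.linear_model import Perceptron

        rng = np.random.default_rng(0)

        # Synthetic linearly separable data with label noise (an arbitrary choice).
        X = rng.uniform(-1, 1, size=(200, 2))
        y = (X @ np.array([1.0, -2.0]) > 0).astype(int)
        flip = rng.random(200) < 0.1
        y[flip] = 1 - y[flip]

        X_test = rng.uniform(-1, 1, size=(500, 2))

        def prediction_variance(make_model, n_boot=50):
            """Average per-point variance of predictions over bootstrap
            resamples: an empirical proxy for the variance of the learner."""
            preds = []
            for _ in range(n_boot):
                idx = rng.integers(0, len(X), size=len(X))
                model = make_model().fit(X[idx], y[idx])
                preds.append(model.predict(X_test))
            return np.mean(np.var(np.array(preds), axis=0))

        stump_var = prediction_variance(lambda: DecisionTreeClassifier(max_depth=1))
        perc_var = prediction_variance(lambda: Perceptron(max_iter=1000, tol=1e-3))
        print("stump prediction variance:     ", stump_var)
        print("perceptron prediction variance:", perc_var)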

  4. Mitchell 7.5.

  5. Han & Kamber 6.1.