Due Date: 4 June 2003 6:30pm
Turn-in procedure: Please email your solutions to parag@cs.washington.edu
before class on June 4. Word, Postscript, PDF, HTML, and plain-text
formats are all fine.
Please use the subject line "CSEP546: HW3 Submission", and include your
name and student ID in the body of the message.
You can also submit a hard copy of your work at the beginning of
class on June 4.
Note: All homework is to be done individually.
- Suppose the instance space is the square 0 < X < 8, 0 < Y < 8,
and you have a training set composed of positive examples at
(2, 2), (2, 4), (2, 6), (6, 2), (6, 4), and (6, 6), and negative
examples at (4, 2), (4, 4) and (4, 6).
- Draw the Voronoi diagram of this training set, using Euclidean
distance.
- Suppose you measure the error rate of the 3-nearest-neighbor
algorithm on this training set using leave-one-out cross-validation
(i.e., you take out each example in turn and predict its class
using the other examples). What would the measured error rate be?
(A small setup sketch for checking your answer appears after this
group of questions.)
- Suppose there is no noise in this domain. What can you say
about the error rate of the nearest-neighbor algorithm in this
domain, in the limit of an infinite training set?
- Suppose you apply the 3-nearest-neighbor algorithm with backward
elimination. Which features would be eliminated (X, Y, both,
or neither)? Why?
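For the nearest-neighbor questions above, here is a minimal Python sketch
that encodes the training set from the problem statement and runs
3-nearest-neighbor leave-one-out cross-validation, which you can use to
check your by-hand answers. The function names and the use of a plain
majority vote are illustrative assumptions, not something the assignment
prescribes.

    from math import dist  # Euclidean distance (Python 3.8+)

    # Training set from the problem statement: +1 = positive, -1 = negative.
    train = [((2, 2), +1), ((2, 4), +1), ((2, 6), +1),
             ((6, 2), +1), ((6, 4), +1), ((6, 6), +1),
             ((4, 2), -1), ((4, 4), -1), ((4, 6), -1)]

    def knn_predict(query, examples, k=3):
        # Majority vote among the k nearest examples (k is odd, so no ties).
        nearest = sorted(examples, key=lambda ex: dist(query, ex[0]))[:k]
        return +1 if sum(label for _, label in nearest) > 0 else -1

    # Leave-one-out cross-validation: predict each example from the other eight.
    errors = sum(knn_predict(x, [e for e in train if e[0] != x]) != y
                 for x, y in train)
    print("3-NN leave-one-out error rate:", errors / len(train))

The same list of points can also be passed to a computational-geometry
package (e.g., scipy.spatial.Voronoi) if you want to double-check your
hand-drawn Voronoi diagram.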
- Consider a rule induction algorithm that stops after finding
m rules. Does the bias of this algorithm increase or decrease with m?
And the variance? Why?
- Consider learning a perceptron on nine-dimensional linearly
separable data. How many training examples do you need to
guarantee with 99% confidence that the learned perceptron has
true error of at most 10%?
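In case a pointer helps: one commonly used sample-complexity bound for
hypothesis spaces of finite VC dimension (due to Blumer et al., and given
in Mitchell's Machine Learning) is

    m >= (1/epsilon) * (4*log2(2/delta) + 8*VC(H)*log2(13/epsilon)),

where epsilon is the allowed true error, delta is the allowed probability
of failure, and VC(H) is the VC dimension of the hypothesis space. Whether
this is the particular bound intended here depends on the course materials.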
- What is the VC dimension of the set of intervals over the real
numbers? (i.e., each hypothesis is an interval of the form a < x < b
for real numbers a and b, and it classifies a real number x as positive
if and only if x lies in that interval.)
- Why must any subset of a frequent itemset also be frequent?
- Han & Kamber, Exercise 6.4.