Please submit both code and writeup online by 6:30pm PST on Thursday, May 31, 2012. Please provide all code (sufficiently commented) so that we can run it ourselves. Submit your writeup as a PDF, limited to four pages with reasonable fonts and margins.
1.0 We will be using your ID3 learner (normal and bagged) on the promoter dataset from previous assignments. If you suspect your implementation was faulty, you are free to use an existing implementation of these algorithms (e.g. the J48 learner in the Weka Java libraries). Do not use pruning for this problem.
1.1 Modify your ID3 to accept a parameter specifying the maximum depth, beyond which no further splitting is allowed. Considering the ID3 learner alone, do you expect its bias to increase or decrease with this parameter? What about its variance? Why? Verify your conclusions by actually measuring bias and variance for different values of maximum depth. At a minimum, try maximum depths of 1, 2, and 3.
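For concreteness, here is a minimal, self-contained sketch of one way to thread a depth cutoff through a recursive ID3 learner. The data representation (each example as a pair of an attribute-value dictionary and a label) and all names are illustrative assumptions, not a prescribed interface.

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def info_gain(examples, attr):
        by_value = {}
        for x, y in examples:
            by_value.setdefault(x[attr], []).append(y)
        remainder = sum(len(s) / len(examples) * entropy(s) for s in by_value.values())
        return entropy([y for _, y in examples]) - remainder

    def majority(labels):
        return Counter(labels).most_common(1)[0][0]

    def build_tree(examples, attributes, max_depth, depth=0):
        labels = [y for _, y in examples]
        # Stop splitting when the node is pure, attributes run out,
        # or the maximum depth is reached; return a majority-class leaf.
        if len(set(labels)) == 1 or not attributes or depth >= max_depth:
            return majority(labels)
        attr = max(attributes, key=lambda a: info_gain(examples, a))
        node = {'attr': attr, 'default': majority(labels), 'children': {}}
        by_value = {}
        for x, y in examples:
            by_value.setdefault(x[attr], []).append((x, y))
        for value, subset in by_value.items():
            node['children'][value] = build_tree(
                subset, [a for a in attributes if a != attr], max_depth, depth + 1)
        return node

    def predict(tree, x):
        # Internal nodes are dicts; leaves are class labels.
        while isinstance(tree, dict):
            tree = tree['children'].get(x[tree['attr']], tree['default'])
        return tree

In this sketch, prediction on an attribute value never seen during training falls back to the node's stored majority class.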
1.2 How does the accuracy obtained by ID3 with bagging change as the maximum depth varies? Why? Measure bias and variance in each case. What does this say about your hypothesis from 1.1?
1.3 To learn more about the bias-variance decomposition, see Pedro's ICML paper; Pedro also has a reference implementation. For this homework, generate at least 10 bootstrap samples (30 are recommended if you can afford it) from the original training set and use them to compute the bias and variance.
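One common estimation procedure for 0-1 loss, ignoring the noise term, is: train one model per bootstrap sample, take the majority vote across models as the "main" prediction for each test point, count bias where the main prediction disagrees with the true label, and count variance as the average disagreement of individual models with the main prediction. The sketch below assumes this setup; `learn` and `predict` are placeholders for your own training and prediction functions.

    import random
    from collections import Counter

    def bootstrap_sample(train):
        # Draw |train| examples uniformly with replacement.
        return [random.choice(train) for _ in range(len(train))]

    def bias_variance(train, test, learn, predict, n_samples=30):
        # learn(train_set) -> model and predict(model, x) -> label are
        # placeholders for your own ID3 (or bagged ID3) functions.
        models = [learn(bootstrap_sample(train)) for _ in range(n_samples)]
        bias = variance = 0.0
        for x, y in test:
            preds = [predict(m, x) for m in models]
            main = Counter(preds).most_common(1)[0][0]   # "main" (majority) prediction
            bias += int(main != y)                       # main prediction wrong -> bias
            variance += sum(p != main for p in preds) / n_samples
        return bias / len(test), variance / len(test)

Averaging these quantities over the test set gives the estimates asked for in 1.1 and 1.2.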
2.0 Consider learning a perceptron on nine-dimensional linearly separable data. How many training examples do you need to guarantee with 99% confidence that the learned perceptron has true error of at most 10%?
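As a reminder (one possible starting point, not the only route), Mitchell Chapter 7 gives a sufficient sample size in terms of the VC dimension of the hypothesis space, with error tolerance \(\epsilon\) and confidence parameter \(\delta\):

\[ m \ge \frac{1}{\epsilon}\left(4\log_2\frac{2}{\delta} + 8\,\mathrm{VC}(H)\log_2\frac{13}{\epsilon}\right) \]

Here \(\epsilon = 0.1\) and \(\delta = 0.01\), and recall that the VC dimension of a perceptron over \(d\) real-valued inputs is \(d+1\).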
3.0 Mitchell 7.5a (just part a): Consider the space of instances X corresponding to all points in the x,y plane. Give the VC dimension of the following hypothesis space. [You also need to justify your answer. You don’t have to give a formal proof, but you must present the key ideas from which the reader could construct a formal proof.]
(a) \(H_r = \) the set of all rectangles in the x,y plane. That is, \(H_r=\{((a<x<b)\wedge(c<y<d))\mid a,b,c,d \in \mathbb{R}\}\).
4.0 Download LIBSVM, currently the most widely used SVM implementation. Peruse the documentation to understand how to use it.
4.1 Download the new promoters dataset, which is the same as in Homework 3 except that it is now in LIBSVM format.
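For reference, LIBSVM input files contain one example per line: a label followed by space-separated index:value pairs, with feature indices in increasing order. The two lines below illustrate the format only; they are not actual promoter examples.

    +1 3:1 17:1 42:1
    -1 5:1 17:1 40:1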
4.2 Run LIBSVM to classify promoters. Try each of the different kernels (-t 0 through 3) and use default parameters for everything else. How does the accuracy vary with the choice of kernel?
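If you prefer scripting over the svm-train/svm-predict command-line tools, a sketch using the Python interface that ships with LIBSVM might look like the following. The file names are placeholders for the actual promoter train/test splits; -t selects the kernel (0 = linear, 1 = polynomial, 2 = RBF, 3 = sigmoid).

    from svmutil import svm_read_problem, svm_train, svm_predict

    # Placeholder file names; substitute the actual promoter splits.
    y_train, x_train = svm_read_problem('promoters.train')
    y_test, x_test = svm_read_problem('promoters.test')

    for t in range(4):                       # -t 0..3: linear, poly, RBF, sigmoid
        model = svm_train(y_train, x_train, '-t %d' % t)
        labels, acc, vals = svm_predict(y_test, x_test, model)
        print('kernel %d: accuracy %.2f%%' % (t, acc[0]))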
4.3 Modify your Naive Bayes classifier from Homework 3 to apply to this dataset. How does its accuracy compare to what can be obtained using LIBSVM? Why?
4.4 How do these accuracies compare with that of the bagged decision tree from Homework 3? Explain the difference (or lack thereof).
4.5 Vary the number of features used in the classifiers (always choosing the highest-information-gain features available). How does the relative accuracy of the decision tree and the SVM vary? How do you explain this?
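A sketch of one way to rank binary features by information gain and keep only the top k is given below. The dense 0/1 feature representation assumed here is illustrative; adapt it to however you load the data.

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(xs, ys, j):
        # Information gain of binary feature j with respect to the labels.
        base = entropy(ys)
        for v in (0, 1):
            subset = [y for x, y in zip(xs, ys) if x[j] == v]
            if subset:
                base -= len(subset) / len(ys) * entropy(subset)
        return base

    def top_k_features(xs, ys, k):
        # Return the indices of the k features with the highest information gain.
        gains = [(information_gain(xs, ys, j), j) for j in range(len(xs[0]))]
        return [j for _, j in sorted(gains, reverse=True)[:k]]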