CSE 546 - Machine Learning - Winter 2010

CSE 546: Machine Learning (Winter 2010)

Assignment 4: Model Ensembles, Learning Theory, and SVM

Due Date: Thursday, Mar 4, 2010 copy of report in class and submit code online (the 9th week)

Model ensembles.
- 1.1. A decision stump is a decision tree of depth one (i.e., it branches on one attribute and makes decisions). Which algorithm has higher bias: a decision stump or a perceptron? And which one has higher variance? (Assume that all attributes are either Boolean or numeric.) Which ensemble method would you use to improve the accuracy of decision stumps - bagging or boosting? Justify your answers.
- 1.2. Implement the bagging and the discrete AdaBoosting algorithm. You should make your code as modular as possible. Namely, your main module of bagging and AdaBoosting should treat the base learner as a blackbox and communicate with it via a generic interface that inputs weighted examples and outputs a classifier, which then can classify any instances. Create a wrapper for ID3 which takes a training set of weighted examples and use sampling with replacement to generate a new training set of the same size, according to the weight of each example; the wrapper then passes on this set to ID3 to obtain a classifier. Plug this wrapper and your ID3 implementation into both the bagging and the AdaBoost algorithm. Run experiments using the dataset from Homework 1 and answer the following questions. Use information gain as the evaluation criterion for all experiments. For bagging you can try 9, 17, or 33 number of samplings, and for boosting you can run either 10, 20, or 30 rounds. (The higher the better.)
  - Does boosting help ID3 more when there's pruning or when there isn't? Why? Measure bias and variance in each case. What does this say about your hypothesis? (Use 99% confidence level for pruning.)
  - From now on, do not use pruning. Modify your ID3 to accept a parameter of maximum depth, beyond which no further splitting is allowed. Consider the ID3 learner alone, speculate whether its bias increase or decrease with this parameter? What about its variance? Why? Verify your conclusion by actually measuring bias and variance for different numbers of maximum depth. At the minimum, try maximum depths of 1, 5, and 20.
  - How does the accuracy obtained by ID3 with bagging compare with that by ID3 with boosting, when the maximum depth varies? Why? Measure bias and variance in each case. What does this say about your hypothesis?
  - How does the number of boosting rounds affect your test accuracy? Try with the maximum depth being 1 and being 20. Measure bias and variance in each case. What does this say about what you observe?
  - To learn more about bias-variance decomposition, see Pedro's ICML paper. Pedro also has a reference implementation. In this homework, generate at least 10 bootstrap samples (30 is recommended if you could afford it) from the original training set and use them when you compute the bias and variance.
Learning theory.
- 2.1. Mitchell 7.3
- 2.2. Mitchell 7.4
Support vector machines.
- Download LIBSVM, currently the most widely used SVM implementation. Peruse the documentations to understand how to use it.
- Download the new promoters dataset, which is the same as in Homework 1 except that it is now in LIBSVM format.
- Run LIBSVM to classify promoters. Try different kernels (0-3) and use default parameters for everything else. How does it vary with different choice of kernel?
- Modify your perceptron classifier from homework 3 to apply to this dataset. How does the accuracy compare to what can be obtained using LIBSVM? Why?
- Vary the number of features used in the classifier (always choosing the highest-information-gain ones available). How does the relative accuracy of decision tree and SVM vary? How do you explain this?
Turn in the following:
- Your code for Problem 1. Submit your files here. The due date is 12 noon, March 4, 2010. Your code should contain appropriate comments to facilitate understanding.
- A report with your answers to the homework problems. For the programming task, in addition to answering the questions, you should describe in high-level how your code works.

CSE 546: Machine Learning (Winter 2010)

Assignment 4: Model Ensembles, Learning Theory, and SVM

Due Date: Thursday, Mar 4, 2010 copy of report in class and submit code online (the 9th week)

Good luck, and have fun!