Assignment  #8 Machine Learning - Decision Trees

Due: December 6 (Two weeks!)

Readings for this homework:

1. Complete the "Additional Exercises" for 4 solvers other than C4.5 for the "Classification with WEKA" exercise begun in class on Nov 11.  Write a short (1 paragraph) description of how each solver works, using information you find in the help files, the textbook, or on the web.  Create a table comparing the solvers' performance (including C4.5).

2. Suppose you want to use decision tree learning to learn an unknown Boolean function over k-digit binary numbers.  You first task is to decide what the features are that can appear in the nodes of the tree.  You consider two alternatives:

(a) How many possible features are there for choice 1?

(b) How many possible features are there for choice 2?

(c) Give an example of a simple target function which to learn exactly would require an exponentially larger tree using feature set 1 than using feature set 2, or provide an argument why none could exist.  (By "exponential" we mean that if the tree using feature set 2 has m nodes, then the smallest equivalent tree using feature set 1 has (at least) cm nodes, for some constant c.)

(d) Give an example of a simple target function which to learn exactly would require an exponentially larger tree using feature set 2 than using feature set 1, or provide an argument why none could exist.

*** SPOILERS *** (read only if you get stuck...)

Readings for next week: