CSE 473 Introduction to Artificial Intelligence
Autumn 2001
Due Dec. 5, 1:30pm
1. (20 points) Write a decision-tree learning algorithm using the evaluation function we discussed in class (or a better one if you can come up with one). Train your algorithm on (a subset of) the data set in mushroom-data.txt (available on the course web site), and display the output function in disjunctive normal form.
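A minimal Python sketch of ID3-style tree induction follows. It assumes the evaluation function is information gain, which is a common choice, though the one discussed in class may differ. It also assumes each example is a pair (attrs, label), where attrs is a tuple of attribute values and label is a class such as "e" or "p". The names entropy, info_gain, build_tree, to_dnf, and format_dnf are hypothetical. Each root-to-leaf path that ends in the target class becomes one conjunction, and the disjunction of those conjunctions is the DNF form of the learned function.

```python
import math
from collections import Counter

# Assumed representation (hypothetical): an example is (attrs, label), where
# attrs is a tuple of attribute values and label is the class, e.g. "e" or "p".
# A tree is either a leaf label or a pair (attr_index, {value: subtree}).


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(examples, a):
    """Expected reduction in entropy from splitting examples on attribute a."""
    remainder = 0.0
    for v in set(x[a] for x, _ in examples):
        subset = [y for x, y in examples if x[a] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy([y for _, y in examples]) - remainder


def build_tree(examples, attrs):
    """Grow a tree greedily, splitting on the highest-gain attribute."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or no attributes remain.
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: info_gain(examples, a))
    rest = [a for a in attrs if a != best]
    branches = {}
    for v in sorted(set(x[best] for x, _ in examples)):
        subset = [(x, y) for x, y in examples if x[best] == v]
        branches[v] = build_tree(subset, rest)
    return (best, branches)


def to_dnf(tree, target, path=()):
    """One conjunction of (attr, value) tests per leaf that predicts target."""
    if not isinstance(tree, tuple):
        return [list(path)] if tree == target else []
    attr, branches = tree
    terms = []
    for v, subtree in branches.items():
        terms += to_dnf(subtree, target, path + ((attr, v),))
    return terms


def format_dnf(terms, names):
    """Render terms as (a=v AND ...) OR (...); an empty term means TRUE."""
    clauses = [" AND ".join(f"{names[a]}={v}" for a, v in t) or "TRUE" for t in terms]
    return " OR ".join(f"({c})" for c in clauses) or "FALSE"
```

For example, once training examples are loaded, `to_dnf(build_tree(train, list(range(len(train[0][0])))), "p")` lists the conditions under which the tree predicts "poisonous." Passing that result to format_dnf, together with the attribute names from mushroom-names.txt, prints it in readable form.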
Now test your tree on a different subset of the data, and answer the following questions:
· What is your accuracy on the test set?
· What is your accuracy on the training set?
· What do you do with “missing attributes” (those labeled “?”)?
· Under what conditions does your algorithm stop “growing” the tree?
· How would you improve the performance of the algorithm?
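One possible approach to the missing-attribute and stopping questions is sketched below in Python. It assumes examples are (attrs, label) pairs and that a tree is either a class label (leaf) or (attr_index, {value: subtree}). All function names are hypothetical, and so are the thresholds in should_stop. For training data, the missing-value policy replaces "?" with the most common value of that attribute among examples of the same class. At test time the class is unknown, so the overall most common value is used instead. Other policies are equally reasonable. For instance, "?" can be treated as an ordinary value, or an example can be split fractionally across branches as C4.5 does.

```python
from collections import Counter

MISSING = "?"  # marker used for unknown attribute values in the data

# Assumed tree representation (hypothetical): a leaf is a class label; an
# internal node is (attr_index, {value: subtree}).


def impute_by_class(examples):
    """Training data: replace '?' with the most common value of that
    attribute among training examples with the same class label."""
    n_attrs = len(examples[0][0])
    fill = {}
    for label in set(y for _, y in examples):
        for a in range(n_attrs):
            seen = [x[a] for x, y in examples if y == label and x[a] != MISSING]
            fill[(label, a)] = Counter(seen).most_common(1)[0][0] if seen else MISSING
    return [
        (tuple(fill[(y, a)] if v == MISSING else v for a, v in enumerate(x)), y)
        for x, y in examples
    ]


def attribute_modes(examples):
    """Most common known value of each attribute, ignoring class (for test time)."""
    modes = []
    for a in range(len(examples[0][0])):
        seen = [x[a] for x, _ in examples if x[a] != MISSING]
        modes.append(Counter(seen).most_common(1)[0][0] if seen else MISSING)
    return modes


def classify(tree, x, modes, default):
    """Follow the tree, substituting the mode for '?'; fall back to default
    when a value never appeared at this node during training."""
    while isinstance(tree, tuple):
        attr, branches = tree
        v = modes[attr] if x[attr] == MISSING else x[attr]
        if v not in branches:
            return default
        tree = branches[v]
    return tree


def accuracy(tree, examples, modes, default):
    """Fraction of (attrs, label) examples the tree classifies correctly."""
    correct = sum(classify(tree, x, modes, default) == y for x, y in examples)
    return correct / len(examples)


def should_stop(labels, n_attrs_left, best_gain, min_examples=5, min_gain=0.01):
    """Pre-pruning check; the two thresholds are illustrative, not tuned."""
    return (
        len(set(labels)) <= 1
        or n_attrs_left == 0
        or len(labels) < min_examples
        or best_gain < min_gain
    )
```

To use should_stop, call it at the top of a recursive tree builder with three inputs: the node's labels, the number of unused attributes, and the best available gain. When it returns true, return the majority label as a leaf. Calling accuracy on the training set and on the test set answers the first two questions directly.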
Note: the data set is very large. You will want to pick a small subset (at most 100 instances) to train on.
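A minimal Python sketch of loading the data and drawing a random training subset of at most 100 instances follows. The test set is taken from the remaining instances, so it is disjoint from the training set. The sketch assumes each line of mushroom-data.txt is a comma-separated record with the class label in the first field, as in the UCI Machine Learning Repository copy; confirm this against mushroom-names.txt. The function names are hypothetical. A fixed seed makes the split reproducible.

```python
import random


def parse_line(line):
    """Parse one comma-separated record into (attrs, label).

    Assumes the class label is the first field; returns None for blank lines."""
    fields = line.strip().split(",")
    if len(fields) < 2:
        return None
    return tuple(fields[1:]), fields[0]


def load_examples(path):
    """Read every non-blank record from the data file."""
    with open(path) as f:
        return [ex for ex in map(parse_line, f) if ex is not None]


def train_test_split(examples, n_train=100, n_test=None, seed=0):
    """Shuffle a copy of the data, take n_train examples for training, and use
    the disjoint remainder (optionally capped at n_test) for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:] if n_test is None else shuffled[n_train:n_train + n_test]
    return train, test
```

For example: `train, test = train_test_split(load_examples("mushroom-data.txt"), n_train=100, n_test=1000)`.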
You will not be graded on how well your algorithm performs, so you don't need to spend a lot of time “tuning” it. I do expect you to pick a reasonable stopping criterion and a clever policy for handling missing attributes. You can discuss these issues among yourselves, but your implementations should be completely independent.
Information about the mushroom data set (mushroom-names.txt)
The mushroom data set (mushroom-data.txt)