CSE 473 Introduction to Artificial Intelligence

Autumn 2001

Problem Set 6

Due Dec. 5, 1:30pm

1. (20 points) Write a decision-tree learning algorithm using the evaluation function we discussed in class (or a better one if you can come up with one). Train your algorithm on (a subset of) the data set in mushroom-data.txt (available on the course web site), and display the output function in disjunctive normal form.
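
As a point of reference only, here is a minimal sketch of two pieces mentioned above. It assumes the evaluation function is information gain (the handout does not name the one discussed in class), that each instance is a list of values with the class label in column 0, and that a tree is either a leaf label or a pair (attribute name, {value: subtree}). The names entropy, information_gain, tree_to_dnf, and format_dnf, the label "e", and the toy attribute names are all illustrative assumptions, not part of the assignment.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(rows, attr, label=0):
    """Entropy reduction obtained by splitting rows on column index attr.

    Assumption: rows[i][label] holds the class of instance i.
    """
    base = entropy([r[label] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

def tree_to_dnf(tree, positive="e", path=()):
    """Collect one conjunction per root-to-leaf path ending in the positive class.

    Assumption: a leaf is a label string; an internal node is a tuple
    (attribute_name, {value: subtree}).
    """
    if not isinstance(tree, tuple):
        return [path] if tree == positive else []
    attr, branches = tree
    terms = []
    for value, subtree in branches.items():
        terms += tree_to_dnf(subtree, positive, path + ((attr, value),))
    return terms

def format_dnf(terms):
    """Render the conjunctions as a single OR of ANDs."""
    return " OR ".join(
        "(" + " AND ".join(f"{a}={v}" for a, v in term) + ")" for term in terms
    )

# Toy tree with made-up attribute names, for illustration only.
toy_tree = ("odor", {"n": ("spore", {"g": "p", "k": "e"}), "f": "p", "a": "e"})
print(format_dnf(tree_to_dnf(toy_tree)))
```

Each path from the root to a positive leaf becomes one conjunction, and the disjunction of those conjunctions is the tree's output function in DNF.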

 

Now test your tree on a different subset of the data, and answer the following questions:

 

·        What is your accuracy on the test set?

·        What is your accuracy on the training set?

·        What do you do with “missing attributes” (those labeled “?”)?

·        Under what conditions does your algorithm stop “growing” the tree?

·        How would you improve the performance of the algorithm?
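
To make the accuracy and missing-attribute questions concrete, the following sketch shows one simple baseline, not a recommended answer: replace each “?” with the most common known value in its column, and measure accuracy as the fraction of instances classified correctly. The function names, the label-in-column-0 layout, and the predict callable are assumptions made for illustration.

```python
from collections import Counter

def fill_missing(rows, missing="?"):
    """Return a copy of rows with each missing value replaced by the
    most common known value in the same column (a simple baseline)."""
    filled = [list(r) for r in rows]
    for col in range(len(filled[0])):
        known = [r[col] for r in filled if r[col] != missing]
        if not known:
            continue  # every value in this column is missing; leave it alone
        mode = Counter(known).most_common(1)[0][0]
        for r in filled:
            if r[col] == missing:
                r[col] = mode
    return filled

def accuracy(predict, rows, label=0):
    """Fraction of rows whose predicted class equals the true class.

    Assumption: predict is any function mapping a row to a class label,
    and the true class sits in column index label.
    """
    correct = sum(1 for r in rows if predict(r) == r[label])
    return correct / len(rows)
```

Calling accuracy once with the training rows and once with the test rows gives the two numbers asked for above. Other policies for “?” (treating it as its own value, or splitting an instance fractionally across branches) are equally plausible choices.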

 

Note: the data set is very large.  You will want to pick a small subset (at most 100 instances) to train.
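
One way to draw a small training subset and a disjoint test subset is sketched below. It assumes mushroom-data.txt holds one comma-separated instance per line, which should be checked against mushroom-names.txt; load_rows and split_sample are hypothetical helper names.

```python
import random

def load_rows(path):
    """Read one instance per line. Assumption: values are comma-separated."""
    with open(path) as f:
        return [line.strip().split(",") for line in f if line.strip()]

def split_sample(rows, n_train=100, n_test=100, seed=0):
    """Pick disjoint random training and test subsets of the given sizes."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    chosen = rng.sample(range(len(rows)), n_train + n_test)
    train = [rows[i] for i in chosen[:n_train]]
    test = [rows[i] for i in chosen[n_train:]]
    return train, test
```

Sampling indices without replacement guarantees that no instance appears in both subsets, so test accuracy is measured on data the tree has not seen.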

 

You will not be graded on how well your algorithm does, so you don't need to spend a lot of time “tuning it.”  I do expect you to pick a reasonable stopping criterion, and a clever policy for handling missing attributes.  You can discuss these issues among yourselves, but your implementations should be completely independent.

 

Information about the mushroom data set (mushroom-names.txt)

The mushroom data set (mushroom-data.txt)
