Due Date: 11 April 2001 6:30pm (The third week of class)
Please email your solutions to firstname.lastname@example.org before
class on 11 April. I would strongly prefer not to receive Word documents. PostScript,
PDF, HTML, or plain text is much easier for me to read on my Linux machine. Please email me if
you don't know how to generate any of these formats from Word. Compressing PostScript files (with either WinZip/PKZip
or gzip) is a really good idea.
Please use the subject "CSE592: HW1 Submission", and in the body of the message include
your name and student ID.
All homework is to be done individually. H&K refers to the Data Mining text by
Han and Kamber; Mitchell refers to the Machine Learning text by Mitchell. The questions
from Mitchell are included below because some people have had trouble getting the book.
- H&K - 2.4
- H&K - 2.6 Note: By "Design a data warehouse" we mean: "Select a schema (e.g., star or
snowflake) and design the fact table and dimension tables (i.e., choose
the dimensions and measures to include, etc.). Justify your choices,
discussing the relevant issues and alternatives."
- H&K - 2.7(b)
- Mitchell - 3.1 : Give decision trees to represent the following Boolean functions:
- (a) A AND !B
- (b) A OR [B AND C]
- (c) A XOR B
- (d) [A AND B] OR [C AND D]
- Mitchell - 3.2 : Consider the following set of training examples:
| Instance | Classification | A1 | A2 |
|----------|----------------|----|----|
| 1 | + | T | T |
| 2 | + | T | T |
| 3 | - | T | F |
| 4 | + | F | F |
| 5 | - | F | T |
| 6 | - | F | T |
- (a) What is the entropy of this collection of training examples with respect to
the target function classification?
- (b) What is the information gain of A2 relative to these training examples?
- Why is the validation set used for pruning a decision tree usually smaller
than the one used to grow the tree?
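For checking hand computations in Mitchell 3.2 (not as a substitute for showing your work), the standard definitions Entropy(S) = -Σ p_i log2 p_i and Gain(S, A) = Entropy(S) - Σ_v (|S_v|/|S|) Entropy(S_v) can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function names, the one-dict-per-example representation, and the toy `weather` data are hypothetical illustrations, not part of the assignment, and the toy data is deliberately not the table above.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a collection of class labels: -sum_i p_i * log2(p_i)."""
    n = len(labels)
    counts = Counter(labels)  # only labels that actually occur, so p_i > 0
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(examples, attribute, target):
    """Expected reduction in entropy of `target` after splitting on `attribute`.

    `examples` is a list of dicts mapping attribute names to values
    (a hypothetical representation chosen for this sketch).
    """
    n = len(examples)
    gain = entropy([ex[target] for ex in examples])
    for value in {ex[attribute] for ex in examples}:
        # S_v: the target labels of the examples whose attribute equals `value`
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain


# Hypothetical toy data: "Outlook" separates the classes perfectly,
# while "Windy" carries no information about "Play".
weather = [
    {"Outlook": "Sunny", "Windy": True, "Play": "No"},
    {"Outlook": "Sunny", "Windy": False, "Play": "No"},
    {"Outlook": "Rain", "Windy": True, "Play": "Yes"},
    {"Outlook": "Rain", "Windy": False, "Play": "Yes"},
]

print(round(entropy(["+"] * 9 + ["-"] * 5), 3))      # 0.94
print(information_gain(weather, "Outlook", "Play"))  # 1.0
print(information_gain(weather, "Windy", "Play"))    # 0.0
```

A perfectly separating attribute recovers the full entropy of the collection as gain, while an attribute whose every value has the same class mix as the whole set yields zero gain.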