Due Date: 19 October 2003 6:30pm (The third week
Turn-in procedure: Email your
homework file(s) to email@example.com
before class on 19 October. Any of the Word, Postscript, PDF, HTML, or Plain Text
formats should be fine.
Please use the subject "CSEP546: HW1 Submission", and in the text
part of the message include your name and student id.
You can also submit a hardcopy of your work at the beginning of
the class on October 19th.
Notes: All homeworks are to be done
individually. H&K refers to the Data Mining text by Han and Kamber.
- H&K - 2.4
- H&K - 2.6
Note: By "Design a data warehouse" we mean: "Select a schema (e.g,
star, snowflake) and design the fact table and dimension tables (i.e.,
choose the dimensions and measures to include, etc.). Justify your
choices, discussing the relevant issues and alternatives."
- H&K - 2.7 b)
- Suppose that a learning algorithm is trying to find a
consistent hypothesis when the (Boolean) classes of examples are
actually completely random (i.e., the class of an example is 0 or 1
with equal probability). If there are n Boolean attributes and the
training set is composed of m examples drawn uniformly without
replacement from the set of 2^n possible examples, what is the
probability of finding a contradiction (i.e., the same example
with different classes) in the data?
Note: By "without replacement" we mean that the
same example with the same class cannot appear more than once in the
data, but it could appear once with class 0 and once with class 1.
- H&K - 7.6 a) b)
- Consider the following decision tree, where W, X and Y are the attributes
tested at the nodes (all Boolean), A, B, C, D are the classes predicted at the
leaves, and the left and right branches of each node correspond respectively
to the values True and False. Each leaf is labeled with its accuracy on the
validation set, each internal node is labeled with the accuracy on the validation
set it would have if was a leaf, and each branch is labeled with the fraction
of examples from the parent node that follow that branch. Which subtrees would