Homework #3

News of your success at Scotland Yard has spread rapidly. Now a team of cancer specialists would like to enlist your aid in deciding whether to operate on a number of their patients. All of their patients have tumors, but the specialists are unsure whether the tumors are malignant or benign. If a tumor is malignant, the team must operate, but if it is benign there is no need for a risky operation. The specialists have kept detailed records of past cases of patients with tumors and whether those tumors turned out to be malignant. They want you to use this past information to make predictions about the possible malignancy of the new patients' tumors. Much is riding on these predictions, since your great-aunt Myrtle is one of the new patients.

The data the specialists have collected is summarized in the table below. There are three attributes of each tumor that the cancer specialists can determine without an operation: 1) is the tumor large? 2) is the tumor hard? 3) is the tumor symmetrical? These attributes are abbreviated L, H, and S respectively, and their truth values are given in the table. For the collected data, the specialists also know whether the tumor turned out to be malignant; this fact is in the last column, labeled M.

Data For Past Patients (Training data)

      L  H  S  M
      ----------
1)    T  F  F  T  
2)    F  F  F  T
3)    F  T  T  T
4)    F  F  T  F
5)    T  F  T  F


Data for the New Patients (Test data)

      L  H  S  M
      ----------
6)    F  F  F  ?   
7)    T  T  T  ?
8)    F  F  T  ?

Problem #1:

Your first attempt at this problem is to grow a decision tree. Your splitting criterion is Information Gain. The stopping criterion is zero entropy (split a node unless it has zero entropy).
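
As a reminder, the information gain of asking about attribute A at a node with example set S is Gain(S, A) = Entropy(S) - sum over values v of (|S_v|/|S|) * Entropy(S_v). Below is a minimal Python sketch of that computation; the (attribute-dict, label) representation of an example is my own assumption for illustration, not part of the assignment.

      import math

      def entropy(pos, neg):
          # Entropy of a boolean-labeled sample with `pos` positive and
          # `neg` negative examples.
          total = pos + neg
          h = 0.0
          for count in (pos, neg):
              if count:
                  p = count / total
                  h -= p * math.log2(p)
          return h

      def information_gain(examples, attribute):
          # examples: list of (attributes, label) pairs, where attributes is
          # a dict such as {'L': True, 'H': False, 'S': False} and label is
          # the boolean M value. Returns Entropy(S) minus the weighted
          # entropies of the two subsets produced by splitting on `attribute`.
          def counts(rows):
              pos = sum(1 for _, label in rows if label)
              return pos, len(rows) - pos
          gain = entropy(*counts(examples))
          for value in (True, False):
              subset = [(x, lab) for x, lab in examples if x[attribute] == value]
              if subset:
                  gain -= len(subset) / len(examples) * entropy(*counts(subset))
          return gain

Growing the tree then amounts to repeatedly splitting on the attribute with the largest gain until every node has zero entropy.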

a) Give the decision tree that is grown on the training data. Please represent this tree in text in the following way: call the root of the tree Root. For every other node, give a string of L's and R's specifying how to reach that node by traversing left or right from the root. For each node where a question about an attribute is asked, give the abbreviation for that attribute; assume that if the attribute is true we go to the left child, and if false to the right child. Also give, to two decimal places, the information gain of asking that question. For the leaf nodes, give the predicted classification.

For example, here is a sample text representation of a tree with 5 nodes. The root's left child is a leaf; the right child is an internal node whose two children are leaves:

      Root:   H?   .52
      L:      M=True
      R:      L?   .63
      RL:     M=False     
      RR:     M=True
      
b) How does your decision tree classify the tumors of the new patients (examples 6-8)?

Problem #2:

Much to your chagrin, Aunt Myrtle decides to sit in on the cse592 PAC learning lecture and has some ideas about a reformulation of this learning problem. She assumes that the past and current patients are both drawn from the same probability distribution. She also gets the cancer specialists to come up with a set of 10,000 likely hypotheses. The specialists are certain that one hypothesis in this set will predict the malignancy of a tumor perfectly, but they don't know which one it is. Myrtle is a gambling woman: she would be happy with an algorithm that, with probability at least 1 - .001, finds a hypothesis whose probability of error is at most .01 (that is, delta = .001 and epsilon = .01). Aunt Myrtle's algorithm is simply to choose randomly among the specialists' hypotheses that are completely consistent with the training data. (Note: the data for this problem contains more attributes than the data in problem #1, so the 10,000 hypotheses are all distinct.)
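
For concreteness, here is a minimal Python sketch of Myrtle's procedure, under my own assumption (not stated in the problem) that each hypothesis is a function mapping a patient's attributes to a predicted label:

      import random

      def myrtle_learn(hypotheses, training_data):
          # Discard every hypothesis that misclassifies any training example,
          # then pick uniformly at random among the survivors. Since the
          # specialists guarantee a perfect hypothesis is in the set, at
          # least one survivor always exists.
          consistent = [h for h in hypotheses
                        if all(h(x) == label for x, label in training_data)]
          return random.choice(consistent)

Part a) below asks how large training_data must be before this random choice meets the epsilon and delta requirements.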

Please answer the following questions about this technique:

a) How many training examples (data on past patients) must be found in order to guarantee Aunt Myrtle's bounds? Show your work.

b) Can you give a tight upper bound on the probability that Myrtle's algorithm will misclassify a new tumor chosen according to the underlying probability distribution, given that the learning algorithm satisfies Myrtle's specified epsilon and delta requirements? If so, give the bound. If not, explain.

c) Assume the cancer specialists are wrong and their set of hypotheses does not contain a perfect hypothesis. If Myrtle's algorithm finds a hypothesis consistent with the amount of training data given in your answer to part a), will the required epsilon and delta bounds still hold? Explain.

Problem #3:

Aunt Myrtle is also interested in VC dimensions. Please humor a woman in dire straits and answer the following questions for her.

a) Aunt Myrtle believes that the VC dimension of the hypothesis set in problem #2 is <= log 10,000 (log base 2). Can you show this by proving that, for any set H of binary-classifying hypotheses, VC(H) <= log |H|? If so, do it. If not, give a counterexample.

b) Aunt Myrtle is also curious about the VC dimension of the hypothesis set in Problem #1, where each instance has only 3 attributes and the set of hypotheses is all possible decision trees that test these 3 attributes. Give the VC dimension of this set of all decision trees that predict malignancy given the three boolean attributes.
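
As a reminder of the definition behind both questions: H shatters a set of instances if every possible labeling of those instances is produced by some hypothesis in H, and VC(H) is the size of the largest shattered set. Here is a minimal Python sketch of the shattering check, again treating hypotheses as functions (my representation, not the problem's):

      from itertools import product

      def shatters(hypotheses, instances):
          # H shatters the instances if each of the 2^n possible labelings
          # is realized by at least one hypothesis in H.
          realized = {tuple(h(x) for x in instances) for h in hypotheses}
          return all(labeling in realized
                     for labeling in product((True, False), repeat=len(instances)))

VC(H) is then the largest n for which shatters(hypotheses, instances) holds for some choice of n instances.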


Jared Saia