CSE 473 - Introduction to Artificial Intelligence - Autumn 2003

Prof. Henry Kautz

Assignment 9 - Neural Networks and Decision Trees

Due at the start of class on Wednesday, December 10th.

Problem 1: Expressivity of Neural Networks (5 points)

Suppose you have a 3-layer neural network with K input units, H hidden-layer units, and K output units.  The units use a simple threshold activation function, so the output of each unit is 0 or 1.  You further know that at most one input is set to 1 at any time.  Provide a (good) lower bound on the number of hidden units H that would be required for the network to be able to represent the identity function (input 1 maps to output 1, input 2 maps to output 2, etc.).  You do not need to actually specify the network to solve this problem: think about how many hidden units are needed to encode all the possible inputs.
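
To make the setup concrete, here is a minimal Python sketch of a single threshold unit of the kind the problem assumes; the particular weights, threshold value, and function name are illustrative only, not part of the assignment.

     def threshold_unit(inputs, weights, threshold=0.5):
         # Step activation: output 1 iff the weighted sum of the
         # inputs reaches the threshold, otherwise output 0.
         weighted_sum = sum(w * x for w, x in zip(weights, inputs))
         return 1 if weighted_sum >= threshold else 0

     # A unit wired to fire only when the second of three inputs is on:
     print(threshold_unit([0, 1, 0], [0.0, 1.0, 0.0]))  # -> 1
     print(threshold_unit([1, 0, 0], [0.0, 1.0, 0.0]))  # -> 0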

Problem 2: Learning in Neural Networks (5 points)

Imagine a new type of neuron where the output is defined by the following formula:

 

What is the weight update rule for each weight Wj in the single-unit case?  Use the general rule for a sum-of-squared-errors function (equation 20.2) derived on pages 741-742 of the textbook, but note the following typo in the book: the third line from the bottom of page 741 should read:

 

It will be useful to recall the chain rule:

     d/dx f(g(x)) = f'(g(x)) * g'(x)

Finally, to avoid confusion: at the bottom of page 743 a formula for a gradient is given that differs from the gradient derived earlier.  This is because it uses a different error function, the log-likelihood error defined just above it on the same page.
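
Because the new neuron's output formula did not reproduce in this copy, the sketch below shows only the textbook's generic single-unit gradient-descent update, Wj <- Wj + alpha * Err * g'(in) * xj, with a sigmoid standing in for the activation g; to answer the problem you would substitute the new neuron's output function and its derivative.

     import math

     def g(z):
         # Stand-in activation: a sigmoid, NOT the new neuron from the
         # handout; replace g and g_prime with the handout's formula.
         return 1.0 / (1.0 + math.exp(-z))

     def g_prime(z):
         return g(z) * (1.0 - g(z))

     def update(weights, x, y, alpha=0.1):
         # One gradient-descent step for a single unit under the
         # sum-of-squared-errors function E = (1/2) * (y - g(in))**2.
         in_ = sum(w * xj for w, xj in zip(weights, x))
         err = y - g(in_)
         # By the chain rule, dE/dWj = -err * g'(in) * xj, so we step
         # in the direction opposite the gradient.
         return [w + alpha * err * g_prime(in_) * xj
                 for w, xj in zip(weights, x)]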

Problem 3:  Learning to Recommend CDs Using Decision Trees (10 points)

Consider the problem of recommending music CDs to a person based on their history of purchases.  The system should recommend the CDs that the person is most likely to buy.  Assume you are using the following attributes to describe the examples:

     TYPE         possible values:     Rap, Jazz, HeavyMetal
     PRICE        possible values:     Cheap, Expensive

(Since every possible attribute value starts with a different letter, as shorthand we'll just use that initial letter, e.g., 'J' for Jazz.)

The output decision is binary-valued, so we'll use '+' and '-' as our concept labels, indicating a "buy" recommendation or not, respectively.

Here is our TRAINING set:

     TYPE = R    PRICE = C    CATEGORY = +
     TYPE = R    PRICE = E    CATEGORY = -
     TYPE = H    PRICE = C    CATEGORY = +
     TYPE = J    PRICE = C    CATEGORY = +
     TYPE = R    PRICE = E    CATEGORY = +
     TYPE = H    PRICE = E    CATEGORY = +
     TYPE = J    PRICE = E    CATEGORY = -
     TYPE = J    PRICE = C    CATEGORY = -
     TYPE = H    PRICE = E    CATEGORY = +
     TYPE = J    PRICE = E    CATEGORY = +
     TYPE = R    PRICE = E    CATEGORY = -
     TYPE = J    PRICE = C    CATEGORY = +
     TYPE = R    PRICE = E    CATEGORY = +

And our TEST set:

     TYPE = J    PRICE = C    CATEGORY = +
     TYPE = J    PRICE = E    CATEGORY = -
     TYPE = R    PRICE = C    CATEGORY = -
     TYPE = R    PRICE = E    CATEGORY = -
     TYPE = H    PRICE = E    CATEGORY = +

Apply the decision tree algorithm given in R&N Fig. 18.5 (page 658) to the TRAINING set, using the MAX-GAIN heuristic to choose the best attribute at each step.  Show all your work, including the final decision tree.

If multiple attributes tie for the best, choose the one whose name comes earliest in alphabetical order (e.g., AGE before PRICE before TYPE).  If there is a tie in computing Majority-Value, choose '-'.

Some useful logarithmic formulas here are:

     log2(x) = log10(x) / log10(2) = ln(x) / ln(2)
     log2(a/b) = log2(a) - log2(b)
     log2(1) = 0, and by convention 0 * log2(0) = 0
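
As a sanity check on your hand calculations, here is a small Python sketch of entropy and information gain over the TRAINING set above; the data encoding and function names are my own, and running it is no substitute for showing your work.

     import math

     # TRAINING set from above, encoded as (TYPE, PRICE, CATEGORY).
     train = [('R','C','+'), ('R','E','-'), ('H','C','+'), ('J','C','+'),
              ('R','E','+'), ('H','E','+'), ('J','E','-'), ('J','C','-'),
              ('H','E','+'), ('J','E','+'), ('R','E','-'), ('J','C','+'),
              ('R','E','+')]

     def entropy(examples):
         # Entropy, in bits, of the +/- labels in a set of examples.
         n = len(examples)
         if n == 0:
             return 0.0
         p = sum(1 for e in examples if e[2] == '+') / n
         return sum(-q * math.log2(q) for q in (p, 1.0 - p) if q > 0)

     def gain(examples, attr):
         # Information gain from splitting on attribute index attr
         # (0 = TYPE, 1 = PRICE).
         g = entropy(examples)
         for v in sorted({e[attr] for e in examples}):
             subset = [e for e in examples if e[attr] == v]
             g -= len(subset) / len(examples) * entropy(subset)
         return g

     print(gain(train, 0), gain(train, 1))  # gain for TYPE, then PRICE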

The optional final exam is Friday, December 12.  Important information:

  • The exam is from 8:30 am until 10:20 am.
  • You should bring a calculator that can calculate logarithms, as well as the usual pens and pencils.
  • You can bring whatever notes you have taken during the course.