**This homework is optional – it contains extra credit problems.**

**[Extra credit points will be added separately to the course total, so you will not be penalized if you skip extra credit problems.]**

**Please turn in your solutions to these extra credit problems by midnight on the last day of classes (Friday, June 7, 2013).**

Extra Credit Submission Procedure:

Create a Zip file called "528-extracredit-*lastname*-*firstname*" containing the following:

(1) A document with a write-up specifying the extra credit problem you are attempting, with your answers to any questions asked in the problem, as well as any figures, plots, or graphs supporting your answers,

(2) Your Matlab program files,

(3) Any other supporting material needed to understand/run your solutions in Matlab.

Upload your Zip file to this dropbox by 11:59pm on Friday, June 7, 2013.

1. **Unsupervised Learning (20 points)**: Write Matlab code to implement Oja's Hebb rule (Equation 8.16 in the Dayan & Abbott textbook) for a single linear neuron (as in Equation 8.2) receiving as input the 2D data provided in c10p1.mat, but with the **mean of the data subtracted from each data point**. Use “load -ascii c10p1.mat” and type “c10p1” to see the 100 (x, y) data points. You may plot them using “scatter(c10p1(:,1), c10p1(:,2))”. Compute and subtract the mean (x, y) value from each (x, y) point. Display the points again to verify that the data cloud is now centered around 0. Implement a discrete-time version (like Equation 8.7) of the Oja rule with alpha = 1. Start with a random **w** vector and update it according to **w**(t+1) = **w**(t) + delta · d**w**/dt, where delta is a small positive constant (e.g., delta = 0.01) and d**w**/dt is given by the Oja rule (assume tau_w = 1). In each update iteration, feed in a data point **u** = (x, y) from c10p1. If you’ve reached the last data point in c10p1, go back to the first one and repeat. Keep updating **w** until the change in **w**, given by norm(**w**(t+1) − **w**(t)), is negligible (i.e., below an arbitrarily small positive threshold), indicating that **w** has converged. (A minimal code sketch appears after part (b) below.)

a. To illustrate the learning process, print out figures displaying the current weight vector **w** and the input data scatterplot on the same graph, for different time points during the learning process.

b. Compute the principal eigenvector (i.e., the one with the largest eigenvalue) of the zero-mean input correlation matrix (this will be of size 2 × 2). Use the Matlab function “eig” to compute its eigenvectors and eigenvalues. Verify that the learned weight vector **w** is proportional to the principal eigenvector of the input correlation matrix (read Sections 8.2 and 8.3).
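For concreteness, here is a minimal Matlab sketch of one way to structure the update loop and the eigenvector check. The step size delta, the convergence threshold, and all variable names are my own choices; only the commands quoted in the problem statement come from the assignment itself.

```matlab
% Hedged sketch for Problem 1 -- not the required solution.
load -ascii c10p1.mat                    % loads the 100 x 2 matrix c10p1
X = c10p1 - repmat(mean(c10p1), size(c10p1,1), 1);   % subtract the mean
scatter(X(:,1), X(:,2));                 % verify the cloud is centered at 0

delta = 0.01;                            % small positive step size
alpha = 1;                               % Oja rule parameter
w = randn(2,1);                          % random initial weight vector
change = inf;  i = 0;
while change > 1e-6                      % arbitrary small threshold
    i = mod(i, size(X,1)) + 1;           % cycle through the data points
    u = X(i,:)';                         % current input u = (x, y)
    v = w' * u;                          % linear neuron output (Eq. 8.2)
    dw = v*u - alpha*v^2*w;              % Oja rule with tau_w = 1 (Eq. 8.16)
    w_new = w + delta*dw;
    change = norm(w_new - w);
    w = w_new;
    % (for part (a), redraw the scatterplot plus w at intervals here,
    % e.g., using hold on and plot)
end

% Part (b): compare w to the principal eigenvector of the
% zero-mean input correlation matrix Q.
Q = X' * X / size(X,1);                  % 2 x 2 correlation matrix
[V, D] = eig(Q);
[~, k] = max(diag(D));                   % index of the largest eigenvalue
disp([w / norm(w), V(:,k)])              % should agree up to sign
```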

2. **Supervised Learning (20 points)**: In class, we discussed neural networks that use either a threshold or sigmoid activation function. Consider networks whose neurons have *linear activation functions*, i.e., each neuron’s output is given by *g(a) = ba + c*, where *a* is the weighted sum of inputs to the neuron, and *b* and *c* are two fixed real numbers.

a. Suppose you have a single neuron with a linear activation function *g* as above, input **x** = [*x*_{1},…,*x*_{n}], and weights *w*_{1},…,*w*_{n}. Write down an expression for the output of the neuron, and an expression for the squared error between this output and a desired output *d*.

b. Write down the weight update rule for the neuron based on gradient descent on the above error function.

c. Now consider a network of linear neurons with one hidden layer of *m* units, *n* input units, and one output unit. For a given set of weights *w*_{kj} from input unit *j* to hidden unit *k*, and weights *W*_{k} from hidden unit *k* to the output unit, show that the network’s output can be computed by an equivalent network with no hidden layer, and give the weights of that equivalent network (a numerical sanity check is sketched after part (d) below).

d. Given your result in (c), what can you conclude about the computational power of *N*-hidden-layer linear networks for *N* = 1, 2, 3, …? Explain your answer.
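As a hedged illustration of part (c), the following Matlab snippet, with dimensions, constants, and weights all chosen arbitrarily by me, checks numerically that a one-hidden-layer linear network computes the same function as a suitably weighted network with no hidden layer:

```matlab
% Hedged numerical sketch: a 1-hidden-layer linear network collapses
% to a single affine map. All sizes and values here are my choices.
n = 4;  m = 3;                 % n inputs, m hidden units
b = 2;  c = 0.5;               % fixed constants in g(a) = b*a + c
w = randn(m, n);               % hidden-layer weights w_kj
W = randn(1, m);               % output weights W_k
x = randn(n, 1);               % an arbitrary input vector

h = b*(w*x) + c;               % hidden-layer outputs g(w*x)
y = b*(W*h) + c;               % network output

% Equivalent no-hidden-layer network: y = w_eq*x + c_eq
w_eq = b^2 * (W*w);            % direct input-to-output weights
c_eq = b*(W*ones(m,1))*c + c;  % accumulated constant term
y_direct = w_eq*x + c_eq;

disp([y, y_direct])            % the two values should agree
```

The final disp should print two identical numbers (up to floating-point rounding).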

3. **Reinforcement Learning (20 points)**:

Implement actor-critic learning (Equations 9.24 and 9.25 in the Dayan & Abbott textbook) in Matlab for the maze of Figure 9.7, with learning rate epsilon = 0.5 for both actor and critic, and beta = 1 for the critic. Starting from zero weights for both the actor and critic, plot learning curves as in Figures 9.8 and 9.9. Next, start from a policy in which the agent is biased to go left at both B and C, with initial probability 0.99 (and with 0.5 initial probability at A). How does this affect learning at A?

(Note: You will need to sample from a discrete distribution to get an action for each location. You can write your own code for this or use code available online such as this (with n = 1); a hand-rolled alternative is included in the sketch below.)
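The sketch below is one possible skeleton, not the assignment’s reference solution: the maze transitions and reward values encode my reading of Figure 9.7, the three-term actor update is my reading of Equation 9.25, and sample_discrete is a hand-rolled stand-in for the linked sampling code. Verify all of these against the textbook.

```matlab
function maze_sketch
% Hedged actor-critic skeleton for the Figure 9.7 maze.
epsilon = 0.5;                        % learning rate (actor and critic)
beta    = 1;                          % softmax parameter
v = zeros(3,1);                       % critic values for states A, B, C
Q = zeros(3,2);                       % actor weights: rows A,B,C; cols left,right
% For the biased start, set Q(2,1) = Q(3,1) = log(0.99/0.01) so the
% softmax gives P(left) = 0.99 at B and C (with beta = 1).

for trial = 1:100
    u = 1;                            % each trial starts at A
    while u > 0
        p = exp(beta*Q(u,:));  p = p/sum(p);   % softmax action probabilities
        a = sample_discrete(p);                % 1 = left, 2 = right
        [unext, r] = take_action(u, a);
        if unext > 0, vnext = v(unext); else vnext = 0; end
        delta = r + vnext - v(u);              % TD error
        v(u) = v(u) + epsilon*delta;           % critic update
        for ap = 1:2                           % actor update (Eq. 9.25 form)
            Q(u,ap) = Q(u,ap) + epsilon*delta*((ap == a) - p(ap));
        end
        u = unext;
    end
end
disp(v), disp(Q)
end

function a = sample_discrete(p)
% Draw one index from the discrete distribution p (entries sum to 1).
a = find(rand < cumsum(p), 1);
end

function [unext, r] = take_action(u, a)
% States: 1 = A, 2 = B, 3 = C; unext = 0 means the maze was exited.
% Reward placement is my assumption from Figure 9.7 -- check it.
unext = 0;  r = 0;
if u == 1
    unext = 1 + a;                    % A: left -> B, right -> C
elseif u == 2
    if a == 2, r = 5; end             % B: right exit rewarded (assumed 5)
else
    if a == 1, r = 2; end             % C: left exit rewarded (assumed 2)
end
end
```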