This homework is optional – it contains extra credit problems.
[Extra credit points will be added separately to the course total,
so you will not be penalized if you skip extra credit problems.]
Please turn in your solutions to these extra credit problems by
midnight on the last day of classes (Friday, June 7, 2013).
Extra Credit Submission Procedure:
Create a Zip file called "528-extracredit-lastname-firstname" containing the following:
(1) Document with write-up specifying the extra credit problem you are attempting, with your
answers to any questions asked in the problem, as well as any figures, plots, or graphs
supporting your answers,
(2) Your Matlab program files,
(3) Any other supporting material needed to understand/run your solutions in Matlab.
Upload your Zip file to this dropbox.
1. Unsupervised Learning (20 points): Write Matlab code to implement Oja’s Hebb
rule (Equation 8.16 in the Dayan & Abbott textbook) for a single linear neuron
(as in Equation 8.2) receiving as input the 2D data provided in c10p1.mat
but with the mean of the data subtracted from each data point. Use "load -ascii
c10p1.mat" and type "c10p1" to see the 100 (x,y) data points. You may plot them using
“scatter(c10p1(:,1),c10p1(:,2))”. Compute and subtract the mean (x,y) value from each
(x,y) point. Display the points again to verify that the data cloud is now centered around
0. Implement a discrete-time version (like Equation 8.7) of the Oja rule with a = 1.
Start with a random w vector and update it according to w(t+1) = w(t) + delta*dw/dt,
where delta is a small positive constant (e.g., delta = 0.01) and dw/dt is given by the Oja
rule (assume tau_w = 1). In each update iteration, feed in a data point u = (x,y) from
c10p1. If you’ve reached the last data point in c10p1, go back to the first one and
repeat. Keep updating w until the change in w, given by norm(w(t+1) - w(t)), is negligible
(i.e., below an arbitrary small positive threshold), indicating that w has converged.
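As a rough sketch of the update loop described above (the learning rate, the convergence threshold, and all variable names here are illustrative choices, not prescribed by the problem):

```matlab
% Oja's rule on the zero-mean c10p1 data -- illustrative sketch
load -ascii c10p1.mat
U = c10p1 - repmat(mean(c10p1), size(c10p1,1), 1);  % subtract the mean (x,y)

delta = 0.01;        % small positive learning-rate constant
alpha = 1;           % Oja rule parameter a = 1
tol   = 1e-6;        % arbitrary small convergence threshold
w     = randn(2,1);  % random initial weight vector

i = 0;
while true
    i = mod(i, size(U,1)) + 1;      % cycle through the data points
    u    = U(i,:)';                 % current input u = (x,y)
    v    = w' * u;                  % linear neuron output (Equation 8.2)
    dw   = v*u - alpha*v^2*w;       % Oja rule (Equation 8.16, tau_w = 1)
    wnew = w + delta*dw;
    % note: a single small step can trigger this test before true
    % convergence; a stricter check could average over a full pass
    if norm(wnew - w) < tol, w = wnew; break; end
    w = wnew;
    % (for part (a), plot w over the data scatterplot at selected iterations)
end
```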
a. To illustrate the learning process, print out figures displaying the current weight vector
w and the input data scatterplot on the same graph, for different time points during the
learning process.
b. Compute the principal eigenvector (i.e., the one with largest eigenvalue) of the zero-
mean input correlation matrix (this will be of size 2 x 2). Use the Matlab function "eig"
to compute its eigenvectors and eigenvalues. Verify that the learned weight vector w
is proportional to the principal eigenvector of the input correlation matrix (read
Sections 8.2 and 8.3).
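A minimal check for part (b), assuming the zero-mean data points are the rows of a matrix U:

```matlab
% Principal eigenvector of the zero-mean input correlation matrix -- sketch
Q = (U' * U) / size(U,1);   % 2x2 correlation matrix of the zero-mean data
[V, D] = eig(Q);            % columns of V are eigenvectors
[~, k] = max(diag(D));      % index of the largest eigenvalue
e1 = V(:, k);               % principal eigenvector
% w should be proportional to e1: compare w/norm(w) with e1 (up to sign)
```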
2. Supervised Learning (20 points): In class, we discussed neural networks that use either a threshold or sigmoid activation function. Consider networks whose neurons have linear activation functions, i.e., each neuron’s output is given by g(a) = ba+c, where a is the weighted sum of inputs to the neuron, and b and c are two fixed real numbers.
a. Suppose you have a single neuron with a linear activation function g as above and input x = [x1,…,xn]T and weights W = [W1,…,Wn]T. Write down the squared error function for this input if the true output is y.
b. Write down the weight update rule for the neuron based on gradient descent on the above error function.
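For reference, a sketch of the standard derivation behind (a) and (b), with a = w^T x (the learning-rate symbol eta is an assumption, not notation from the problem):

```latex
% Part (a): squared error for input x with true output y
E = \tfrac{1}{2}\bigl(y - g(a)\bigr)^2
  = \tfrac{1}{2}\bigl(y - (b\,w^{T}x + c)\bigr)^2
% Part (b): gradient descent, using dg/da = b
\frac{\partial E}{\partial w_i} = -\bigl(y - g(a)\bigr)\,b\,x_i
\quad\Rightarrow\quad
w_i \leftarrow w_i + \eta\,\bigl(y - g(a)\bigr)\,b\,x_i
```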
c. Now consider a network of linear neurons with one hidden layer of m units, n input units, and one output unit. For a given set of weights wkj in the input-hidden layer and Wj in the hidden-output layer, write down the equation for the output unit as a function of wkj, Wj, and input x (you can write your answer in vector-matrix form or using summations). Show that there is a single-layer linear network with no hidden units that computes the same function.
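The algebra behind the equivalence in (c) can be sketched as follows, writing h_j for the output of hidden unit j:

```latex
h_j = g\Bigl(\sum_k w_{kj}\,x_k\Bigr) = b\sum_k w_{kj}\,x_k + c, \qquad
v = g\Bigl(\sum_j W_j h_j\Bigr)
  = b\sum_j W_j\Bigl(b\sum_k w_{kj}\,x_k + c\Bigr) + c
  = \sum_k \Bigl(b^{2}\sum_j W_j\,w_{kj}\Bigr)x_k
    \;+\; \Bigl(bc\sum_j W_j + c\Bigr)
```

The output is thus an affine function of x, so a single-layer network of the same linear-neuron family, with suitably chosen weights, computes the identical function.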
d. Given your result in (c), what can you conclude about the computational power of N-hidden-layers linear networks for N = 1, 2, 3, …? Explain your answer.
3. Reinforcement Learning (20 points)
Implement actor-critic learning (Equations 9.24 and 9.25 in the Dayan & Abbott textbook)
in Matlab for the maze of Figure 9.7, with learning rate epsilon = 0.5 for both actor and critic, and
beta = 1 for the critic. Starting from zero weights for both the actor and critic, plot learning
curves as in Figures 9.8 and 9.9. Next, start from a policy in which the agent is biased to
go left at both B and C, with initial probability 0.99 (and with 0.5 initial probability at A).
How does this affect learning at A?
(Note: You will need to sample from a discrete distribution to get an action for each location.
You can write your own code for this or use code available online such as this (with n = 1).)
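A possible code skeleton is sketched below. The state/action encoding, the exit rewards in R, and the trial count are assumptions that should be checked against Figure 9.7 and Equations 9.24-9.25 in the textbook; only the TD-error and update structure is standard actor-critic:

```matlab
% Actor-critic on the maze of figure 9.7 -- illustrative skeleton.
% States: 1 = A, 2 = B, 3 = C; actions: 1 = left, 2 = right.
% Exit rewards in R are placeholders -- verify against figure 9.7.
R = [0 0; 5 0; 0 2];       % R(s,a): reward on leaving state s via action a
eps_rate = 0.5;            % learning rate for both actor and critic
beta = 1;                  % beta parameter (used here in the softmax)

w = zeros(3,1);            % critic weights: value estimate v(s)
m = zeros(3,2);            % actor weights: action values m(a; s)
% (for the biased start, initialize m(2,:) and m(3,:) so that the
% softmax gives probability 0.99 of going left at B and C)

ntrials = 500;
for trial = 1:ntrials
    s = 1;                                   % each trial starts at A
    while true
        p = exp(beta*m(s,:)) / sum(exp(beta*m(s,:)));  % softmax policy
        a = 1 + (rand > p(1));               % sample action from {left, right}
        if s == 1
            snext = 1 + a;                   % A -> B (left) or C (right)
            r = 0;  vnext = w(snext);
        else
            snext = 0;                       % B and C lead to maze exits
            r = R(s,a);  vnext = 0;          % terminal: no successor value
        end
        delta = r + vnext - w(s);            % temporal-difference error
        w(s) = w(s) + eps_rate * delta;      % critic update
        for ap = 1:2                         % actor update, (delta_{a,a'} - P) factor
            m(s,ap) = m(s,ap) + eps_rate * delta * ((ap == a) - p(ap));
        end
        if snext == 0, break; end
        s = snext;
    end
    % (record delta or reward per trial here to plot the learning curves)
end
```

The line `a = 1 + (rand > p(1))` is one way to sample from a two-outcome discrete distribution; for more than two actions, compare `rand` against the cumulative sum of the probabilities.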