**This homework is optional – it contains extra credit problems.**

**[Extra credit points will be added separately to the course total, so you will not be penalized if you skip extra credit problems.]**

**Please turn in your solutions to these extra credit problems by midnight on the last day of classes (Friday, June 7, 2013).**

Extra Credit Submission Procedure:

Create a Zip file called "528-extracredit-*lastname*-*firstname*" containing the following:

(1) A document with a write-up specifying the extra credit problem you are attempting, with your answers to any questions asked in the problem, as well as any figures, plots, or graphs supporting your answers,

(2) Your Matlab program files,

(3) Any other supporting material needed to understand/run your solutions in Matlab.

Upload your Zip file to this dropbox by 11:59pm on Friday, June 7, 2013.

1. **Unsupervised Learning (20 points)**: Write Matlab code to implement Oja's Hebb rule (Equation 8.16 in the Dayan & Abbott textbook) for a single linear neuron (as in Equation 8.2) receiving as input the 2D data provided in c10p1.mat, but with the **mean of the data subtracted from each data point**. Use “load -ascii c10p1.mat” and type “c10p1” to see the 100 (x, y) data points. You may plot them using “scatter(c10p1(:,1), c10p1(:,2))”. Compute and subtract the mean (x, y) value from each (x, y) point. Display the points again to verify that the data cloud is now centered around 0. Implement a discrete-time version (like Equation 8.7) of the Oja rule with alpha = 1. Start with a random **w** vector and update it according to **w**(t+1) = **w**(t) + delta · d**w**/dt, where delta is a small positive constant (e.g., delta = 0.01) and d**w**/dt is given by the Oja rule (assume tau_w = 1). In each update iteration, feed in a data point **u** = (x, y) from c10p1. If you’ve reached the last data point in c10p1, go back to the first one and repeat. Keep updating **w** until the change in **w**, given by norm(**w**(t+1) − **w**(t)), is negligible (i.e., below an arbitrarily small positive threshold), indicating that **w** has converged. (A minimal code sketch appears after part (b) below.)

a. To illustrate the learning process, print out figures displaying the current weight vector **w** and the input data scatterplot on the same graph, for different time points during the learning process.

b. Compute the principal eigenvector (i.e., the one with the largest eigenvalue) of the zero-mean input correlation matrix (this will be of size 2 × 2). Use the Matlab function “eig” to compute its eigenvectors and eigenvalues. Verify that the learned weight vector **w** is proportional to the principal eigenvector of the input correlation matrix (read Sections 8.2 and 8.3).
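For concreteness, here is a minimal Matlab sketch of one way to structure the update loop and the eigenvector check. The step size delta, the convergence threshold, and all variable names are my own choices; only the commands quoted in the problem statement come from the assignment itself.

```matlab
% Hedged sketch for Problem 1 -- not the required solution.
load -ascii c10p1.mat                    % loads the 100 x 2 matrix c10p1
X = c10p1 - repmat(mean(c10p1), size(c10p1,1), 1);   % subtract the mean
scatter(X(:,1), X(:,2));                 % verify the cloud is centered at 0

delta = 0.01;                            % small positive step size
alpha = 1;                               % Oja rule parameter
w = randn(2,1);                          % random initial weight vector
change = inf;  i = 0;
while change > 1e-6                      % arbitrary small threshold
    i = mod(i, size(X,1)) + 1;           % cycle through the data points
    u = X(i,:)';                         % current input u = (x, y)
    v = w' * u;                          % linear neuron output (Eq. 8.2)
    dw = v*u - alpha*v^2*w;              % Oja rule with tau_w = 1 (Eq. 8.16)
    w_new = w + delta*dw;
    change = norm(w_new - w);
    w = w_new;
    % (for part (a), redraw the scatterplot plus w at intervals here,
    % e.g., using hold on and plot)
end

% Part (b): compare w to the principal eigenvector of the
% zero-mean input correlation matrix Q.
Q = X' * X / size(X,1);                  % 2 x 2 correlation matrix
[V, D] = eig(Q);
[~, k] = max(diag(D));                   % index of the largest eigenvalue
disp([w / norm(w), V(:,k)])              % should agree up to sign
```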

2. **Supervised Learning (20 points)**: In class, we discussed neural networks that use either a threshold or sigmoid activation function. Consider networks whose neurons have *linear activation functions*, i.e., each neuron’s output is given by *g(a) = ba + c*, where *a* is the weighted sum of inputs to the neuron, and *b* and *c* are two fixed real numbers.

a. Suppose you have a single neuron with a linear activation function *g* as above, input **x** = [*x*_{1},…,*x*_{n}], and weights *w*_{1},…,*w*_{n}. Write down an expression for the output of the neuron, and an expression for the squared error between this output and a desired output *d*.

b. Write down the weight update rule for the neuron based on gradient descent on the above error function.

c. Now consider a network of linear neurons with one hidden layer of *m* units, *n* input units, and one output unit. For a given set of weights *w*_{kj} from input unit *j* to hidden unit *k*, and weights *W*_{k} from hidden unit *k* to the output unit, show that the network’s output can be computed by an equivalent network with no hidden layer, and give the weights of that equivalent network (a numerical sanity check is sketched after part (d) below).

d. Given your result in (c), what can you conclude about the computational power of *N*-hidden-layer linear networks for *N* = 1, 2, 3, …? Explain your answer.
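As a hedged illustration of part (c), the following Matlab snippet, with dimensions, constants, and weights all chosen arbitrarily by me, checks numerically that a one-hidden-layer linear network computes the same function as a suitably weighted network with no hidden layer:

```matlab
% Hedged numerical sketch: a 1-hidden-layer linear network collapses
% to a single affine map. All sizes and values here are my choices.
n = 4;  m = 3;                 % n inputs, m hidden units
b = 2;  c = 0.5;               % fixed constants in g(a) = b*a + c
w = randn(m, n);               % hidden-layer weights w_kj
W = randn(1, m);               % output weights W_k
x = randn(n, 1);               % an arbitrary input vector

h = b*(w*x) + c;               % hidden-layer outputs g(w*x)
y = b*(W*h) + c;               % network output

% Equivalent no-hidden-layer network: y = w_eq*x + c_eq
w_eq = b^2 * (W*w);            % direct input-to-output weights
c_eq = b*(W*ones(m,1))*c + c;  % accumulated constant term
y_direct = w_eq*x + c_eq;

disp([y, y_direct])            % the two values should agree
```

The final disp should print two identical numbers (up to floating-point rounding).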

3. **Reinforcement Learning (20 points)**:

Implement actor-critic learning (Equations 9.24 and 9.25 in the Dayan & Abbott textbook) in Matlab for the maze of Figure 9.7, with learning rate epsilon = 0.5 for both actor and critic, and beta = 1 for the critic. Starting from zero weights for both the actor and critic, plot learning curves as in Figures 9.8 and 9.9. Next, start from a policy in which the agent is biased to go left at both B and C, with initial probability 0.99 (and with 0.5 initial probability at A). How does this affect learning at A?

(Note: You will need to sample from a discrete distribution to get an action for each location. You can write your own code for this or use code available online such as this (with n = 1); a hand-rolled alternative is included in the sketch below.)
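The sketch below is one possible skeleton, not the assignment’s reference solution: the maze transitions and reward values encode my reading of Figure 9.7, the three-term actor update is my reading of Equation 9.25, and sample_discrete is a hand-rolled stand-in for the linked sampling code. Verify all of these against the textbook.

```matlab
function maze_sketch
% Hedged actor-critic skeleton for the Figure 9.7 maze.
epsilon = 0.5;                        % learning rate (actor and critic)
beta    = 1;                          % softmax parameter
v = zeros(3,1);                       % critic values for states A, B, C
Q = zeros(3,2);                       % actor weights: rows A,B,C; cols left,right
% For the biased start, set Q(2,1) = Q(3,1) = log(0.99/0.01) so the
% softmax gives P(left) = 0.99 at B and C (with beta = 1).

for trial = 1:100
    u = 1;                            % each trial starts at A
    while u > 0
        p = exp(beta*Q(u,:));  p = p/sum(p);   % softmax action probabilities
        a = sample_discrete(p);                % 1 = left, 2 = right
        [unext, r] = take_action(u, a);
        if unext > 0, vnext = v(unext); else vnext = 0; end
        delta = r + vnext - v(u);              % TD error
        v(u) = v(u) + epsilon*delta;           % critic update
        for ap = 1:2                           % actor update (Eq. 9.25 form)
            Q(u,ap) = Q(u,ap) + epsilon*delta*((ap == a) - p(ap));
        end
        u = unext;
    end
end
disp(v), disp(Q)
end

function a = sample_discrete(p)
% Draw one index from the discrete distribution p (entries sum to 1).
a = find(rand < cumsum(p), 1);
end

function [unext, r] = take_action(u, a)
% States: 1 = A, 2 = B, 3 = C; unext = 0 means the maze was exited.
% Reward placement is my assumption from Figure 9.7 -- check it.
unext = 0;  r = 0;
if u == 1
    unext = 1 + a;                    % A: left -> B, right -> C
elseif u == 2
    if a == 2, r = 5; end             % B: right exit rewarded (assumed 5)
else
    if a == 1, r = 2; end             % C: left exit rewarded (assumed 2)
end
end
```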