CSE 446 - Winter 2014 - Assignment 2

CSE 446: Machine Learning (Winter 2014)
Assignment 2: Collaborative Filtering and Bayesian Networks

Please submit both code and writeup online by 9:30am PST on Friday, February 14, 2014. Please provide all code (sufficiently commented) so that we can run it ourselves. Submit your writeup as a PDF and limit it to four pages with reasonable fonts and margins.

Problem 1: Collaborative Filtering on Netflix Ratings

1.0 Read the paper "Empirical Analysis of Predictive Algorithms for Collaborative Filtering". You need to read at least through Section 2.1, and are encouraged to read further if you have time.

1.1 The dataset we will be using is a subset of the movie ratings data from the Netflix Prize. You can download it here. It contains a training set, a test set, a movies file, a dataset description file, and a README file. The training and test sets are both subsets of the Netflix training data. You will use the ratings provided in the training set to predict those in the test set. You will compare your predictions with the actual ratings provided in the test set. The evaluation metrics you need to measure are the Mean Absolute Error and the Root Mean Squared Error. The dataset description file further describes the dataset, and will help you get started. The README file is from the original set of Netflix files, and has been included to comply with the terms of use for this data.
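The two evaluation metrics can be sketched as follows. This is a minimal illustration on made-up numbers, not on the Netflix data; the function names and the toy lists are assumptions chosen for the example.

```python
import math


def mean_absolute_error(predicted, actual):
    """Average of |prediction - true rating| over all test ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def root_mean_squared_error(predicted, actual):
    """Square root of the average squared prediction error."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Toy values (hypothetical, for illustration only).
toy_predicted = [3.5, 4.0, 2.0]
toy_actual = [4, 4, 1]
toy_mae = mean_absolute_error(toy_predicted, toy_actual)
toy_rmse = root_mean_squared_error(toy_predicted, toy_actual)
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of prediction quality.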

1.2 Implement the collaborative filtering algorithm described in Section 2.1 of the paper (Equations 1 and 2; ignore Section 2.1.2) for making the predictions.
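As a rough orientation, a memory-based predictor of this kind can be sketched as a mean-centred weighted sum over other users, with Pearson correlation as the weight. The sketch below reflects one reading of that scheme and should be checked against the paper's equations. The data layout (a dict of dicts), the function names, and the toy ratings are all assumptions for illustration; falling back to the user's mean when no weights are available is also an assumption.

```python
import math


def mean_rating(ratings, user):
    """Mean of all ratings the user has given."""
    votes = ratings[user]
    return sum(votes.values()) / len(votes)


def pearson_weight(ratings, a, i):
    """Pearson correlation between users a and i over co-rated movies."""
    common = set(ratings[a]) & set(ratings[i])
    if not common:
        return 0.0
    mean_a = mean_rating(ratings, a)
    mean_i = mean_rating(ratings, i)
    num = sum((ratings[a][j] - mean_a) * (ratings[i][j] - mean_i) for j in common)
    var_a = sum((ratings[a][j] - mean_a) ** 2 for j in common)
    var_i = sum((ratings[i][j] - mean_i) ** 2 for j in common)
    den = math.sqrt(var_a * var_i)
    return num / den if den > 0 else 0.0


def predict(ratings, a, movie):
    """Predicted rating of `movie` for user `a`."""
    mean_a = mean_rating(ratings, a)
    weighted_sum = 0.0
    norm = 0.0  # normalizer: sum of absolute weights
    for i in ratings:
        if i == a or movie not in ratings[i]:
            continue
        w = pearson_weight(ratings, a, i)
        weighted_sum += w * (ratings[i][movie] - mean_rating(ratings, i))
        norm += abs(w)
    if norm == 0:
        return mean_a  # assumption: fall back to the user's own mean
    return mean_a + weighted_sum / norm


# Hypothetical toy ratings: user -> {movie: rating}.
toy_ratings = {
    "u1": {"m1": 5, "m2": 3, "m3": 4},
    "u2": {"m1": 4, "m2": 2, "m3": 3, "m4": 5},
    "u3": {"m1": 1, "m2": 5, "m4": 2},
}
toy_prediction = predict(toy_ratings, "u1", "m4")
```

On the real dataset, recomputing user means and weights inside the loop as done here would be far too slow; caching per-user means and pairwise weights is likely necessary.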

Extra credit: Add yourself as a new user to the training set. To do this, you will need to create a new user ID for yourself. Select some movies that you have seen among those in the training set, and add your ratings for them to the training set. Extend your system to output predictions for the movies you haven't rated, ranked in order of decreasing predicted rating. Do you agree with the predictions of your system? Check out some of the top-ranked movies that you haven't seen (but only after you have finished work on the project).
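The ranking step could look like the sketch below. It assumes the predictions are already available as a dict from movie ID to predicted rating; the function name and the toy values are hypothetical.

```python
def rank_unrated(predicted, already_rated):
    """Return (movie, predicted rating) pairs for unrated movies, best first."""
    candidates = [(m, r) for m, r in predicted.items() if m not in already_rated]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Hypothetical predictions for three movies, one of which is already rated.
toy_ranked = rank_unrated({"m1": 3.2, "m2": 4.7, "m3": 4.1}, {"m3"})
```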

Problem writeup:

A high-level description of how your code works.
The accuracy you obtain.
If all your accuracies are low, tell us what you have tried to improve the accuracies and what you suspect is failing.
Regardless of whether your accuracies are good or bad, what are the shortcomings of the collaborative filtering algorithm and how might you augment it?

Problem 2: Bayesian Networks

2.1 You are on a night hike in an Ecuadorian cloud forest. A fig falls on your head, and you immediately point your camera at the trees and snap a burst of shots.

Back at camp, you find a photo that appears to be of the elusive Olinguito. You’re not sure; it could be a more common Olingo (there are nine times as many common Olingos as Olinguitos). The key distinguishing feature of Olinguitos is that they are much furrier than Olingos. In the photo, you observe a very furry creature, but this could also be due to the camera being out of focus.

Suppose:

There is a 40% chance of your camera being out of focus for any given shot.
An Olingo appears furry in 90% of unfocused shots.
An Olinguito does not look furry in 10% of unfocused shots.
If the camera is focused, you correctly observe furry Olinguitos or non-furry Olingos.

2.1.1. What is the probability that your photo is of an Olinguito?

2.1.2. You find other photos of the same animal from the burst. All four of your photos show a furry creature. What is the probability that the animal in these pictures was an Olinguito?

Hint: draw a Bayesian network.
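The general pattern behind this kind of question is to marginalize out a hidden nuisance variable for each observation, then apply Bayes' rule. A minimal sketch follows, using deliberately different, hypothetical numbers rather than the ones in the problem. It assumes the observations are conditionally independent given the hypothesis; the function names are illustrative.

```python
def marginal_likelihood(p_nuisance, lik_if_nuisance, lik_if_not_nuisance):
    """P(obs | hypothesis), summing over a binary hidden nuisance variable."""
    return p_nuisance * lik_if_nuisance + (1 - p_nuisance) * lik_if_not_nuisance


def posterior(prior, lik_h, lik_not_h, n_observations=1):
    """P(hypothesis | n identical observations) by Bayes' rule.

    Assumes the observations are conditionally independent given the
    hypothesis, so the per-observation likelihoods multiply.
    """
    num = prior * lik_h ** n_observations
    den = num + (1 - prior) * lik_not_h ** n_observations
    return num / den


# Hypothetical numbers, unrelated to the assignment's values.
toy_lik = marginal_likelihood(0.5, 0.6, 1.0)
toy_post_one = posterior(0.2, 0.9, 0.3)
toy_post_two = posterior(0.2, 0.9, 0.3, n_observations=2)
```

Each additional consistent observation multiplies the likelihood ratio once more, so the posterior shifts further toward whichever hypothesis explains the observations better.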

2.2 Consider the following Bayesian network:

Is D independent of E?
Is A independent of B given C?
Is E independent of B given C?
Is A independent of B given D?
Is E independent of D given B?

Justify your answers.
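Independence questions of this form can be checked mechanically with d-separation. The sketch below uses one standard test: restrict the graph to the ancestors of the queried nodes, moralize it (connect parents that share a child), drop edge directions, delete the conditioning nodes, and see whether the two nodes are still connected. The example graph and its node names are hypothetical and unrelated to the network in this problem.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(parents, x, y, given=()):
    """True if x and y are d-separated given the set `given`.

    `parents` maps each node to a list of its parents. Assumes x and y
    are not themselves in `given`.
    """
    given = set(given)
    keep = ancestors(parents, {x, y} | given)
    # Build the moral graph of the ancestral subgraph (undirected).
    neighbours = {n: set() for n in keep}
    for child in keep:
        pars = list(parents.get(child, ()))
        for p in pars:
            neighbours[child].add(p)
            neighbours[p].add(child)
        for p, q in combinations(pars, 2):
            neighbours[p].add(q)
            neighbours[q].add(p)
    # Search from x, never stepping onto a conditioning node.
    stack, seen = [x], {x}
    while stack:
        node = stack.pop()
        if node == y:
            return False
        for n in neighbours[node]:
            if n not in seen and n not in given:
                seen.add(n)
                stack.append(n)
    return True


# Hypothetical network: Rain -> WetGrass <- Sprinkler, WetGrass -> Slippery.
toy_parents = {"WetGrass": ["Rain", "Sprinkler"], "Slippery": ["WetGrass"]}
```

In this toy graph, `Rain` and `Sprinkler` are independent with nothing observed, but become dependent once `WetGrass` or its descendant `Slippery` is observed, which is the usual "explaining away" pattern at a common child.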

2.3 Consider a Bayesian network with four Boolean nodes A, B, C, and D, where A is the parent of B, and B is the parent of C and D. Suppose you have a training set composed of the following examples in the form (A,B,C,D), with "?" indicating a missing value: (0,1,1,1), (1,1,0,0), (1,0,0,0), (1,0,1,1), (0,1,1,0), (1,?,0,1). Show the first iteration of the EM algorithm (initial parameters, E-step, M-step), assuming the parameters are initialized ignoring missing values.
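For orientation, the general shape of the two steps for this network structure can be written as below, without plugging in numbers. The symbols q (the soft weight for B = 1 in the incomplete example) and N(...) (counts over the fully observed examples) are notation introduced here for illustration, not part of the assignment.

```latex
% E-step: posterior over the missing B in the example (A=1, B=?, C=0, D=1),
% using the factorization P(A) P(B|A) P(C|B) P(D|B).
q = P(B{=}1 \mid A{=}1, C{=}0, D{=}1)
  = \frac{P(B{=}1 \mid A{=}1)\, P(C{=}0 \mid B{=}1)\, P(D{=}1 \mid B{=}1)}
         {\sum_{b \in \{0,1\}} P(B{=}b \mid A{=}1)\, P(C{=}0 \mid B{=}b)\, P(D{=}1 \mid B{=}b)}

% M-step: re-estimate each parameter from expected counts, where the
% incomplete example counts as B=1 with weight q and B=0 with weight 1-q.
% For instance:
P(B{=}1 \mid A{=}1) = \frac{N(A{=}1, B{=}1) + q}{N(A{=}1) + 1},
\qquad
P(D{=}1 \mid B{=}1) = \frac{N(B{=}1, D{=}1) + q}{N(B{=}1) + q}
```

Because A is observed in every example, the estimate of P(A) does not depend on q.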
