Due Date: Wed, May 23, 2007, at the start of class.
- Read the paper "Empirical Analysis of Predictive Algorithms for Collaborative Filtering." You need to read through the end of Section 2.1, and you are encouraged to read further if you have time.
- The dataset we will be using is a subset of the movie ratings data from the Netflix Prize. You can download it here (link deactivated).
It contains a training set, a test set, a movies file, a dataset description file, and a README file. The training and test sets are both subsets of the Netflix training data. You will use the ratings in the training set to predict those in the test set, and then compare your predictions with the actual ratings provided in the test set. You need to measure two evaluation metrics: the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE). The dataset description file describes the data in more detail and will help you get started. The README file comes from the original set of Netflix files and has been included to comply with the terms of use for this data.
- Implement the collaborative filtering algorithm described in Section 2.1 of the paper (Equations 1 and 2; ignore Section 2.1.2) to make the predictions. You may program in C, C++, Java, or C#. If you would like to use another language, ask Bhushan first.
- Try to improve the basic algorithm you implemented, for example by using one or more of the enhancements described in the paper, or enhancements of your own design.
- (10% Extra Credit) Add yourself as a new user to the training set. To do this, you will need to create a new user ID for yourself. Select some movies in the training set that you have seen, and add your ratings for them to the training set. Extend your system to output predictions for the movies you haven't rated, ranked in order of decreasing predicted rating. Do you agree with your system's predictions? Check out some of the top-ranked movies that you haven't seen (but only after you have finished work on the project).
- Turn in the following:
  - Your code, with reasonable documentation for it (i.e., enough for us to understand how it is organized and how to use it). Please place this documentation in a file named README.
  - A report of at most 3 pages (letter size, 1-inch margins, 12pt font) describing the results you obtained with this algorithm, any modifications you tried, and how they affected the results. You can also include the results you got after adding yourself as a new user.
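The two evaluation metrics the assignment asks for are straightforward once each test rating is paired with its prediction: with N test ratings, actual rating r_k and prediction p_k, MAE = (1/N) * sum over k of |p_k - r_k|, and RMSE = sqrt((1/N) * sum over k of (p_k - r_k)^2). A minimal Java sketch, assuming the predictions and actual ratings are already lined up in two parallel arrays; the names ErrorMetrics, mae, and rmse are hypothetical.

```java
/** Hypothetical helper for the two evaluation metrics (MAE and RMSE). */
final class ErrorMetrics {
    private ErrorMetrics() {}

    /** Mean Absolute Error: average of |predicted - actual|. */
    static double mae(double[] predicted, double[] actual) {
        checkSameLength(predicted, actual);
        double sum = 0.0;
        for (int k = 0; k < predicted.length; k++) {
            sum += Math.abs(predicted[k] - actual[k]);
        }
        return sum / predicted.length;
    }

    /** Root Mean Squared Error: square root of the average squared error. */
    static double rmse(double[] predicted, double[] actual) {
        checkSameLength(predicted, actual);
        double sum = 0.0;
        for (int k = 0; k < predicted.length; k++) {
            double diff = predicted[k] - actual[k];
            sum += diff * diff;
        }
        return Math.sqrt(sum / predicted.length);
    }

    // Both metrics need one prediction per actual rating, and at least one pair.
    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }
}
```

For example, predictions {3.5, 4.0, 2.0} against actual ratings {4.0, 4.0, 1.0} give MAE = 0.5 and RMSE = sqrt(1.25/3), about 0.65. Because RMSE squares each error, it penalizes a few large misses more heavily than MAE does, which is one reason to report both.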
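For the basic algorithm, the following is a rough Java sketch of a correlation-based, memory-based predictor, following our reading of Equations 1 and 2. The predicted rating of active user a on movie j is a's mean rating plus a normalized, weighted sum of the other users' deviations from their own means. The weight w(a, i) is the Pearson correlation over the movies both users rated, and the normalizing factor kappa makes the absolute weights sum to 1. The in-memory layout (user ID -> movie ID -> rating) and all names are assumptions for illustration; check the details against the paper before relying on them.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical memory-based predictor with Pearson-correlation weights.
 * Assumes every user has at least one rating and that the active user
 * appears in the training set.
 */
final class CorrelationPredictor {
    // userId -> (movieId -> rating): assumed in-memory layout of the training set
    private final Map<Integer, Map<Integer, Double>> votes;
    // userId -> mean rating over all movies that user rated
    private final Map<Integer, Double> meanVote = new HashMap<>();

    CorrelationPredictor(Map<Integer, Map<Integer, Double>> votes) {
        this.votes = votes;
        for (Map.Entry<Integer, Map<Integer, Double>> e : votes.entrySet()) {
            double sum = 0.0;
            for (double v : e.getValue().values()) {
                sum += v;
            }
            meanVote.put(e.getKey(), sum / e.getValue().size());
        }
    }

    /** Pearson correlation w(a, i) over movies rated by both users; 0 if undefined. */
    double weight(int a, int i) {
        Map<Integer, Double> va = votes.get(a);
        Map<Integer, Double> vi = votes.get(i);
        double ma = meanVote.get(a);
        double mi = meanVote.get(i);
        double num = 0.0, sumSqA = 0.0, sumSqI = 0.0;
        for (Map.Entry<Integer, Double> e : va.entrySet()) {
            Double other = vi.get(e.getKey());
            if (other == null) {
                continue; // skip movies that user i did not rate
            }
            double x = e.getValue() - ma;
            double y = other - mi;
            num += x * y;
            sumSqA += x * x;
            sumSqI += y * y;
        }
        return (sumSqA == 0.0 || sumSqI == 0.0) ? 0.0 : num / Math.sqrt(sumSqA * sumSqI);
    }

    /** Predicted rating: mean(a) + kappa * sum_i w(a, i) * (v_ij - mean(i)). */
    double predict(int a, int j) {
        double weightedSum = 0.0;
        double absWeightSum = 0.0;
        for (Map.Entry<Integer, Map<Integer, Double>> e : votes.entrySet()) {
            int i = e.getKey();
            Double vij = e.getValue().get(j);
            if (i == a || vij == null) {
                continue; // only other users who rated movie j contribute
            }
            double w = weight(a, i);
            weightedSum += w * (vij - meanVote.get(i));
            absWeightSum += Math.abs(w);
        }
        double base = meanVote.get(a);
        // kappa = 1 / sum |w|; fall back to the user's mean if no neighbour has a usable weight
        return absWeightSum == 0.0 ? base : base + weightedSum / absWeightSum;
    }
}
```

On a toy training set where user 1 rated movies 10, 20, 30 as 5, 3, 4 and user 2 rated movies 10, 20, 30, 40 as 4, 2, 3, 5, predict(1, 40) returns 5.5. This also shows that raw predictions can fall outside the 1 to 5 star range. Recomputing weight(a, i) for every prediction is slow on the full training set, so caching the weights for each active user is a natural first optimization.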
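For the improvement step, one of the enhancements the paper discusses is case amplification. As we understand it, this transforms each weight as w' = w * |w|^(rho - 1) with rho > 1, so that highly correlated users count for relatively more. A simple enhancement of your own design would be clamping each prediction to the valid rating range (Netflix ratings are 1 to 5 stars). Both are sketched below in Java as hypothetical helpers. The class name Enhancements and the parameter rho are illustrative, and a good value of rho would have to be found experimentally.

```java
/** Hypothetical post-processing tweaks for a correlation-based predictor. */
final class Enhancements {
    private Enhancements() {}

    /**
     * Case amplification: w' = w * |w|^(rho - 1), so |w'| = |w|^rho.
     * With rho > 1, weights near +1 or -1 change little while small weights
     * shrink toward 0. The sign of w is preserved.
     */
    static double amplify(double w, double rho) {
        return w * Math.pow(Math.abs(w), rho - 1.0);
    }

    /** Clamps a prediction into [min, max], e.g. [1, 5] for star ratings. */
    static double clamp(double prediction, double min, double max) {
        return Math.max(min, Math.min(max, prediction));
    }
}
```

To use amplify, apply it to each weight before adding it to both the weighted sum and the normalizing sum of absolute weights. Choosing rho on a held-out part of the training set, rather than on the test set, keeps the reported MAE and RMSE honest.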
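For the extra-credit ranking step, once your ratings under your new user ID are in the training set, the remaining work is to predict every movie you have not rated and sort those movies by decreasing prediction. The following Java sketch assumes your predictor can be wrapped as a function from movie ID to predicted rating for your user ID. Recommender and rankUnrated are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper that ranks the movies a user has not rated yet. */
final class Recommender {
    private Recommender() {}

    /**
     * Returns the IDs in allMovies that are not keys of rated, sorted by
     * decreasing predicted rating; ties are broken by increasing movie ID.
     */
    static List<Integer> rankUnrated(Set<Integer> allMovies,
                                     Map<Integer, Double> rated,
                                     ToDoubleFunction<Integer> predictor) {
        List<Integer> unrated = new ArrayList<>();
        Map<Integer, Double> score = new HashMap<>();
        for (int movie : allMovies) {
            if (!rated.containsKey(movie)) {
                unrated.add(movie);
                score.put(movie, predictor.applyAsDouble(movie)); // predict once per movie
            }
        }
        unrated.sort((x, y) -> {
            int byScore = Double.compare(score.get(y), score.get(x)); // higher prediction first
            return byScore != 0 ? byScore : Integer.compare(x, y);
        });
        return unrated;
    }
}
```

Breaking ties by movie ID keeps the output deterministic. Printing the top entries next to the corresponding entries from the movies file gives a list you can check against your own taste.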
Turn-in procedure:
Please email a zip file with your code and documentation to Bhushan before class on May 23. Turn in a hard copy of your report if you can; otherwise, just include it in the zip file.
- We may ask you to give a demo of your system and/or to discuss it with us orally.
Good luck, and have fun!