CSE 473 - AI - Spring 2002 - Project 2:
Collaborative Filtering
- Due Date: Fri, June 7, 2002, before 9:30 AM (the last week of class)
- Form a group of two, and email Nan (annli@cs.washington.edu) with the names of your group members.
- Read the paper "Empirical Analysis of Predictive Algorithms for Collaborative Filtering".
- Implement the collaborative filtering algorithm described in Section 2.1 of the paper, calculating the "weight" based on "correlation" (Section 2.1.1). You may program in C/C++, Java, Perl, LISP, or Scheme. If you'd like to use another language, ask Nan first. You will have to write the input routines, but the files are very simple: tab-delimited with fixed columns.
- Read the EachMovie documentation. NOTE: Ignore the details of the schema and the instructions on getting the data. Then follow the instructions in the Data Sets section below to download the cleaned and simplified EachMovie training and test databases. Details of the files are in the included README file.
- Apply your collaborative filtering system to the EachMovie data and
evaluate it using the method described in Section 3 of the paper.
In particular, use the S_a measure described in Section 3.1, and the
"All but 1" protocol described in Section 3.3. Learn on the training
database provided, and test on the test database provided.
- Try to improve the basic collaborative filtering algorithm you implemented,
for example using one or more of the enhancements described in the paper,
or an enhancement or enhancements of your own design. Apply the improved
algorithm(s) to the EachMovie data and compare it/them with the original
one, using the methodology described above.
- For each member of the group, create your own vector of movie preferences
using movies in the database that you've seen, and run it through
the collaborative filtering system. Check out the ten movies that
the system recommends most highly. Do you agree with its predictions?
What do you think went right or wrong in the system's recommendation
process in your case?
- You should turn in the following:
- Your code, and reasonable documentation for it (i.e., enough for us
to understand how it works and how to use it). Please place this documentation
in a file named README.
- A report of at most 4 pages (letter size, 1in margins, 12pt font)
describing the improvements you tried and why, the S_a scores you
obtained with the various versions, and your answers to the questions
in item 7 (your personal movie recommendations) for each group member. Also make clear what each group member did in the project.
- Turn-in procedure:
- Please email an archive (either .zip or .tar.gz) with your code and documentation to annli@cs.washington.edu before class on Fri, June 7. Please use the subject "CSE473: Project 2 Submission", and in the body of the message include
both names and student IDs.
- Please turn in a hard copy of your group's report in class on Fri, June 7.
- We may ask you to do a demo of your system and/or an oral discussion.
- Grading: This project accounts for 25% of the grade in the course. The basic algorithm and the enhancements each account for 10%. The remaining 5% is for your write-up, so make sure it clearly demonstrates the exciting bells and whistles in your project.
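To make the basic algorithm and its evaluation concrete, here is a minimal sketch of correlation-weighted prediction and an "All but 1" average absolute deviation score. It is an illustration only, not a reference solution: it is written in Python for brevity (not one of the languages listed above), and every name in it (mean_vote, correlation_weight, predict, all_but_one_score, the toy users and movies) is hypothetical. It assumes votes are held in memory as one dictionary per user, that an undefined correlation counts as weight 0, and that the normalizing factor is taken over the training users who rated the target movie. Check each of these choices against the paper before relying on them.

```python
import math
import random

def mean_vote(votes):
    """Average of one user's votes (dict: movie id -> rating)."""
    return sum(votes.values()) / len(votes)

def correlation_weight(votes_a, votes_i):
    """Pearson correlation computed over the movies both users rated."""
    common = votes_a.keys() & votes_i.keys()
    if not common:
        return 0.0
    mean_a, mean_i = mean_vote(votes_a), mean_vote(votes_i)
    num = sum((votes_a[j] - mean_a) * (votes_i[j] - mean_i) for j in common)
    den_a = sum((votes_a[j] - mean_a) ** 2 for j in common)
    den_i = sum((votes_i[j] - mean_i) ** 2 for j in common)
    if den_a == 0 or den_i == 0:
        return 0.0  # assumption: undefined correlation is treated as weight 0
    return num / math.sqrt(den_a * den_i)

def predict(votes_a, movie, training):
    """Active user's mean vote plus the normalized, weighted deviations
    of the training users who rated the movie."""
    mean_a = mean_vote(votes_a)
    weighted_sum = 0.0
    norm = 0.0  # sum of |weight|, used as the normalizing factor
    for votes_i in training.values():
        if movie not in votes_i:
            continue
        w = correlation_weight(votes_a, votes_i)
        weighted_sum += w * (votes_i[movie] - mean_vote(votes_i))
        norm += abs(w)
    if norm == 0:
        return mean_a  # assumption: fall back to the user's own mean
    return mean_a + weighted_sum / norm

def all_but_one_score(test, training, rng):
    """Withhold one randomly chosen vote per test user, predict it from the
    user's remaining votes, and average the absolute errors."""
    errors = []
    for votes in test.values():
        if len(votes) < 2:
            continue  # nothing left to predict from
        held_out = rng.choice(sorted(votes))
        observed = {j: v for j, v in votes.items() if j != held_out}
        prediction = predict(observed, held_out, training)
        errors.append(abs(prediction - votes[held_out]))
    return sum(errors) / len(errors)

# Toy data, made up for illustration only (not EachMovie data).
training = {
    "u1": {"m1": 5, "m2": 3, "m3": 4},
    "u2": {"m1": 1, "m2": 5, "m3": 2},
}
test = {"t1": {"m1": 4, "m2": 2, "m3": 3}}

if __name__ == "__main__":
    print(predict({"m1": 4, "m2": 2}, "m3", training))
    print(all_but_one_score(test, training, random.Random(0)))
```

Passing an explicit random.Random instance keeps the withheld votes reproducible, which makes it easier to compare the basic algorithm and an enhanced version on exactly the same held-out votes.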
Data Sets
The data is provided by Compaq Computer Corporation. To gain access to the data sets, you need to send Nan (annli@cs.washington.edu) an email with the following form filled out. The form basically says that you will use the data only for educational purposes related to the class, and that you will not distribute the data to anyone. For the fastest response, please use the exact subject "CSE473: Data Set Access Request". I will then respond with a URL that you can use to download the data. (Here is an easily downloadable plain text version.)
Copyright © Compaq Computer Corporation 1997-2001.
The preference data set was compiled by Compaq Computer Corporation using our collaborative filtering technology. Compaq is making the data set available for use under the
terms that apply to this Compaq web site (see Legal) including the following terms:
1. All information is provided "AS IS". Compaq makes no warranties or representations with respect to the completeness or accuracy of the information or otherwise. COMPAQ
DISCLAIMS ALL WARRANTIES WITH REGARD TO THE INFORMATION, INCLUDING ANY IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
2. In no event shall Compaq be liable for damages, and in particular Compaq shall not be liable for special, indirect, consequential, or incidental damages, or damages for lost profits,
loss of revenue, or loss of use, arising out of or related to the information or the use or dissemination thereof, whether such damages arise in contract, negligence, tort, under statute,
in equity, at law or otherwise.
3. The user may use the information only for research purposes which are non-commercial and non-revenue bearing. Any published research results or other publications
resulting from use of the information shall credit Compaq Computer Corporation as the provider of the data. The user agrees to provide Compaq with a copy of any such
publication using any of the contact names provided at this web site. The user may make copies of the data set as needed for internal use only for the preceding purposes. All such
copies shall duplicate Compaq's copyright notice and this notice.
Please reply with the following information to agree to the above legal agreement.
E-mail:
Name:
Company / University:
Phone:
Address:
Good luck, and have fun!