CSE 473 - AI - Spring 2002 - Project 2:

Collaborative Filtering

Contents

  1. Due Date: Fri, June 7, 2002, before 9:30 AM (the last week of class).

  2. Form a group of two, and email Nan (annli@cs.washington.edu) to say who is in your group.

  3. Read the paper "Empirical Analysis of Predictive Algorithms for Collaborative Filtering".

  4. Implement the collaborative filtering algorithm described in Section 2.1 of the paper, calculating the "weights" from the "correlation" (Section 2.1.1). You may program in C/C++, Java, Perl, LISP, or Scheme; if you would like to use another language, ask Nan first. You will have to write the input routines, but the files are simple tab-delimited files with fixed columns.

  5. Read the EachMovie documentation. Note: ignore the details of the schema and the instructions on getting the data. Then follow the instructions under Data Sets below to download the cleaned and simplified EachMovie training and test databases. Details of the files are in the included README file.

  6. Apply your collaborative filtering system to the EachMovie data and evaluate it using the method described in Section 3 of the paper. In particular, use the S_a measure described in Section 3.1 and the "All but 1" protocol described in Section 3.3. Train on the provided training database, and test on the provided test database.

  7. Try to improve the basic collaborative filtering algorithm you implemented, for example by using one or more of the enhancements described in the paper, or one or more enhancements of your own design. Apply the improved algorithm(s) to the EachMovie data and compare the results with those of the original algorithm, using the methodology described above.

  8. Each member of the group should create a vector of movie preferences from movies in the database that he or she has seen, and run it through the collaborative filtering system. Look at the ten movies that the system recommends most highly. Do you agree with its predictions? What do you think went right or wrong in the system's recommendation process in your case?
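For steps 4 and 6, here is a minimal Python sketch of correlation-weighted prediction and an All-but-1 evaluation with the average absolute deviation S_a, as we read Sections 2.1 and 3.1 of the paper. It is one possible reading, not a reference solution. Assumptions to check before relying on it: the column order in parse_votes (user, movie, vote per tab-separated line) is hypothetical, so consult the README; only training users who rated the target movie contribute to the sum, and the normalizer kappa is taken as 1 / sum |w| over those users; the withheld vote is picked by a seeded random generator; test users with fewer than two votes are skipped. All function names are illustrative.

```python
import math
import random
from typing import Dict, Iterable

# user id -> {movie id -> vote}
Votes = Dict[int, Dict[int, float]]


def parse_votes(lines: Iterable[str]) -> Votes:
    """Read tab-delimited lines. HYPOTHETICAL column order: user, movie, vote."""
    votes: Votes = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        user, movie, vote = line.rstrip("\n").split("\t")[:3]
        votes.setdefault(int(user), {})[int(movie)] = float(vote)
    return votes


def mean_vote(votes: Dict[int, float]) -> float:
    """Mean vote of one user over every movie that user rated."""
    return sum(votes.values()) / len(votes)


def correlation(a: Dict[int, float], i: Dict[int, float]) -> float:
    """Pearson correlation w(a, i), summed over movies rated by both users."""
    common = a.keys() & i.keys()
    mean_a, mean_i = mean_vote(a), mean_vote(i)
    num = sum((a[j] - mean_a) * (i[j] - mean_i) for j in common)
    var_a = sum((a[j] - mean_a) ** 2 for j in common)
    var_i = sum((i[j] - mean_i) ** 2 for j in common)
    if var_a == 0 or var_i == 0:
        return 0.0  # no overlap or no variation: treat as uncorrelated
    return num / math.sqrt(var_a * var_i)


def predict(active: Dict[int, float], train: Votes, movie: int) -> float:
    """p_aj = mean_a + kappa * sum_i w(a, i) * (v_ij - mean_i)."""
    mean_a = mean_vote(active)
    weighted = total_abs = 0.0
    for other in train.values():
        if movie not in other:
            continue  # assumption: only users who rated the movie contribute
        w = correlation(active, other)
        weighted += w * (other[movie] - mean_vote(other))
        total_abs += abs(w)
    if total_abs == 0:
        return mean_a  # no usable neighbours: fall back to the user's mean
    return mean_a + weighted / total_abs  # kappa = 1 / sum |w|


def all_but_one_score(train: Votes, test: Votes, seed: int = 0) -> float:
    """Average over test users of S_a, with one randomly withheld vote each."""
    rng = random.Random(seed)
    deviations = []
    for votes in test.values():
        if len(votes) < 2:
            continue  # need at least one observed vote after withholding
        held_out = rng.choice(sorted(votes))
        observed = {j: v for j, v in votes.items() if j != held_out}
        p = predict(observed, train, held_out)
        deviations.append(abs(p - votes[held_out]))  # S_a with one prediction
    if not deviations:
        raise ValueError("no test user has at least two votes")
    return sum(deviations) / len(deviations)
```

This version recomputes each training user's mean inside every call, which keeps it short but will be slow on the full EachMovie data; caching the means (and, if needed, the weights for each active user) is the obvious first optimization.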
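Step 7 leaves the choice of enhancement open. As one example, here is a sketch of the paper's "case amplification" extension as we understand it: each correlation weight keeps its sign, but its magnitude is raised to a power rho, which shrinks weak weights much more than strong ones. The paper reports a typical value of rho = 2.5; treat that as a starting point to tune. The (weight, neighbour vote, neighbour mean) tuples and the function names are illustrative assumptions, not taken from the paper. With rho = 1.0 the prediction reduces to the basic weighted sum from Section 2.1.

```python
import math
from typing import Iterable, Tuple


def amplify(w: float, rho: float = 2.5) -> float:
    """Case amplification: keep the sign of w, raise |w| to the power rho."""
    return math.copysign(abs(w) ** rho, w)


def predict_from_neighbours(
    mean_a: float,
    neighbours: Iterable[Tuple[float, float, float]],
    rho: float = 1.0,
) -> float:
    """Weighted-sum prediction over (weight, neighbour vote, neighbour mean)."""
    weighted = total_abs = 0.0
    for w, vote, mean_i in neighbours:
        w = amplify(w, rho)  # rho = 1.0 leaves the weight unchanged
        weighted += w * (vote - mean_i)
        total_abs += abs(w)
    if total_abs == 0:
        return mean_a  # no usable neighbours: fall back to the user's mean
    return mean_a + weighted / total_abs
```

For instance, with neighbours [(0.9, 5.0, 3.0), (0.3, 1.0, 3.0)] and mean_a = 3.0, the prediction moves from 4.0 at rho = 1.0 to 4.6 at rho = 2.0, because the strongly correlated neighbour dominates. Since only rho changes, both variants can be compared directly under the same All-but-1 protocol.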
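For step 8, once a predictor exists, recommending movies amounts to predicting a vote for every movie the user has not rated and keeping the highest-scoring ones. A minimal sketch, assuming predict is any function from a movie id to a predicted vote (for example, a closure over your step-4 predictor with your own preference vector as the active user); top_n and its parameters are hypothetical names.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Tuple


def top_n(
    predict: Callable[[int], float],
    rated: Dict[int, float],
    all_movies: Iterable[int],
    n: int = 10,
) -> List[Tuple[int, float]]:
    """Return the n unrated movies with the highest predicted votes."""
    # Score only movies the user has not already rated.
    scored = [(movie, predict(movie)) for movie in all_movies if movie not in rated]
    # nlargest keeps the n best (movie, score) pairs, best first.
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])
```

When judging the list, keep in mind that a movie rated by only one or two training users may receive an extreme prediction, since with a single neighbour the normalized weighted sum is just that neighbour's full deviation from his or her mean.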

Turn-in/Grading

Data Sets

The data is provided by Compaq Computer Corporation. To gain access to the data sets, send Nan (annli@cs.washington.edu) an email with the form below filled out. The form says that you will use the data only for educational purposes related to the class, and that you will not distribute the data to anyone. For the fastest response, use the exact subject "CSE473: Data Set Access Request". I will then reply with a URL that you can use to download the data.
Copyright © Compaq Computer Corporation 1997-2001.

     The preference data set was compiled by Compaq Computer Corporation using our collaborative filtering technology. Compaq is making the data set available for use under the terms that apply to this Compaq web site (see Legal) including the following terms:

     1. All information is provided "AS IS". Compaq makes no warranties or representations with respect to the completeness or accuracy of the information or otherwise. COMPAQ DISCLAIMS ALL WARRANTIES WITH REGARD TO THE INFORMATION, INCLUDING ANY IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.

     2. In no event shall Compaq be liable for damages, and in particular Compaq shall not be liable for special, indirect, consequential, or incidental damages, or damages for lost profits, loss of revenue, or loss of use, arising out of or related to the information or the use or dissemination thereof, whether such damages arise in contract, negligence, tort, under statute, in equity, at law or otherwise.

     3. The user may use the information only for research purposes which are non-commercial and non-revenue bearing. Any published research results or other publications resulting from use of the information shall credit Compaq Equipment Corporation as the provider of the data. The user agrees to provide Compaq with a copy of any such publication using any of the contact names provided at this web site. The user may make copies of the data set as needed for internal use only for the preceding purposes. All such copies shall duplicate Compaq's copyright notice and this notice.

To agree to the legal terms above, please reply with the following information:

E-mail: 
Name: 
Company / University: 
Phone:
Address:

Good luck, and have fun!