I am interested in the Netflix Challenge. I am not very sure about the format of the dataset, or what has been done previously, but here are my thoughts.

Let's say we want to predict how much Mary likes the movie "Mission: Impossible 3", and assume the dataset includes Mary's ratings for other movies she has watched (say, The Bourne Ultimatum, War of the Worlds, and Ratatouille). Intuitively, the more similar Mission: Impossible 3 is to the movies Mary rated highly, the more likely she is to like it (and vice versa). However, defining similarity between two movies is not a straightforward task. Some possible metrics are as follows:

1. If the two movies share the same subset of categories, they are similar.

The categories of the four movies are as follows:

Mission: Impossible 3 - Action/Adventure/Thriller
The Bourne Ultimatum - Action/Adventure/Drama/Mystery/Thriller
War of the Worlds - Adventure/Drama/Sci-Fi/Thriller
Ratatouille - Animation/Comedy/Family

Since The Bourne Ultimatum shares the most categories with Mission: Impossible 3 (all three of them), it is the most similar to MI3. War of the Worlds shares only the categories "Adventure" and "Thriller" with Mission: Impossible 3, so it is less similar to MI3 than Bourne is. Finally, Ratatouille doesn't share any category with Mission: Impossible 3, so we would say it is the least similar to MI3.

2. If the two movies share the same subset of main actors/directors, they are similar.

Tom Cruise plays the lead in both Mission: Impossible 3 and War of the Worlds, so by this metric, War of the Worlds is more similar to MI3 than the other two movies are.

There may be other kinds of metrics we can use to define similarity between two movies, depending on what is available in the dataset.
For example, if the dates when Mary rented those movies are available, we can do some more fine-tuning: say, if Mary hated The Bourne Identity (2002) but liked The Bourne Ultimatum (2007), we could conclude that Mary's taste has changed, and that she is likely to like Mission: Impossible 3 (2006).

I am thinking about borrowing the model from PageRank to calculate the similarity. I haven't figured out the exact details of which heuristics to use, but the idea is that we can weight the predicted rating for Mission: Impossible 3 by the categories, actors, or directors it shares with the movies Mary previously rated. One major difference from PageRank is that we have two things to keep track of: the similarity and the ratings. We can probably combine the two to create a score of "likelihood for Mary to like this movie". This score would increase if Mission: Impossible 3 shares categories/actors/directors with other positively rated movies, and decrease otherwise.

Some of the things to look into:

1. How do we find similarity based on multiple attributes? In PageRank, there is only one kind of signal, namely the links between pages.

2. What if Mary has never watched anything similar to Mission: Impossible 3? It doesn't mean she won't like it. Does it make sense here to say she will be neutral towards MI3?

3. Defining similarity based on one of the two metrics above may be too simplistic. Is it enough for a metric to "intuitively make sense"? How do we prove that it will work?
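
To make the idea a bit more concrete, here is a rough sketch of the category metric combined with the ratings. It is only one possible reading of the thoughts above, not a worked-out method: I am assuming Jaccard overlap (shared categories divided by all categories of both movies) as the similarity, a 1 to 5 rating scale, and a neutral fallback of 3 when nothing similar has been rated. The names `jaccard` and `predict_rating` and Mary's ratings are made up for illustration; only the category lists come from the example above.

```python
# Categories are taken from the example above.
GENRES = {
    "Mission: Impossible 3": {"Action", "Adventure", "Thriller"},
    "The Bourne Ultimatum": {"Action", "Adventure", "Drama", "Mystery", "Thriller"},
    "War of the Worlds": {"Adventure", "Drama", "Sci-Fi", "Thriller"},
    "Ratatouille": {"Animation", "Comedy", "Family"},
}

# Hypothetical ratings on an assumed 1-5 scale; not from any real dataset.
mary_ratings = {
    "The Bourne Ultimatum": 5,
    "War of the Worlds": 3,
    "Ratatouille": 2,
}


def jaccard(a, b):
    """Shared categories divided by all categories across both movies."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_rating(target, ratings, genres, neutral=3.0):
    """Similarity-weighted average of the ratings Mary already gave.

    Returns `neutral` (an assumed midpoint) when the target shares no
    category with any rated movie, which is one answer to question 2.
    """
    weights = {movie: jaccard(genres[target], genres[movie]) for movie in ratings}
    total = sum(weights.values())
    if total == 0:
        return neutral
    return sum(weights[movie] * rating for movie, rating in ratings.items()) / total


if __name__ == "__main__":
    for movie in mary_ratings:
        sim = jaccard(GENRES["Mission: Impossible 3"], GENRES[movie])
        print(f"{movie}: similarity {sim:.2f}")
    # With the made-up ratings above this comes out at about 4.2.
    print(predict_rating("Mission: Impossible 3", mary_ratings, GENRES))
```

With these numbers The Bourne Ultimatum gets a similarity of 0.6, War of the Worlds 0.4, and Ratatouille 0, so the prediction is pulled mostly towards Mary's rating for Bourne. Actors and directors could be handled the same way with their own sets, but how to weight the different attributes against each other is exactly the open question 1.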