Recommendation systems

Facebook suggests people you may be (or might want to be) friends with. Netflix suggests movies you might like. Amazon suggests products to buy. How do they do that? In this assignment, you will learn one simple way to make such suggestions, called collaborative filtering. The actual algorithms used by these companies are closely-guarded trade secrets.

A computer system that makes suggestions is called a recommender system. As background, there are two general approaches: collaborative filtering and content-based filtering.

  • Collaborative filtering (what we will implement) is a way of recommending things to you by comparing your preferences to other users. If your past preferences were similar to another user’s, then your future preferences may be as well. As a concrete example, suppose that you like John, Paul, and George. Other people like John, Paul, George, and also Ringo. Based on this information, it is likely that you will like Ringo as well, even if you had never previously heard of him. The recommender system does not have to understand anything about what “John”, “Paul”, “George”, and “Ringo” are — they could even be brands of toilet paper, and the algorithm would work identically.
  • Content-based filtering considers the characteristics of the things you like, and it recommends similar sorts of things. For instance, if you like “Billie Jean”, “Crazy Train”, and “Don’t Stop the Music”, then you might like other songs in the key of F-sharp minor, such as Rachmaninoff’s “Piano Concerto No. 1”, even if no one else has ever had that particular set of favorite songs before.

In this assignment, you will implement a collaborative filtering recommendation system for suggesting friends on Facebook.

Representing a social network as a graph

A graph or network represents relationships among things. The things are represented as nodes or vertices, and the relationships are represented as edges. One common use for a graph is to represent travel, such as on a road map or airline map. The nodes of the graph are cities, and the edges show which cities are connected.

Above: A graph of Cities and Roads in the Northeastern US

Above: A graph of common airline routes to/from New York City

Another common use for a graph is to represent friendship among people in a social network. For example, here is the friendship graph for some of the characters of “Romeo and Juliet”:

Above: A social/friendship graph of characters in “Romeo and Juliet”.

An edge between person A and person B means that A considers B a friend, and also B considers A a friend.

Recommending friends

In this assignment you will implement two mechanisms for recommending a new friend in a social network. A simple way to state this question is:

“For user X, who is the best person to recommend to them as a friend?”

You will actually answer a more comprehensive question:

“For user X, list some non-friends in order, starting with the best friend recommendation and ending with the worst.”

A non-friend is a user who is not X and is not a current friend of X. Depending on the recommendation algorithm, the list may include all non-friends of X or only some of them.

For example, from the Romeo and Juliet graph above, the list produced for Mercutio might be:

  1. Capulet
  2. Montague
  3. Benvolio
  4. Friar Laurence
  5. Juliet

Info

These recommendations might not be symmetric, which means that the best friend recommendation for Montague might be Mercutio, but the best friend recommendation for Mercutio might be Capulet.

Your task will be to write code that, given a user X in the social network, produces friend recommendations for X in order from best to worst. You will do this by assigning each potential friend a number called a score, where higher scores indicate a better match. Then, you will sort your list according to those scores. Given user X, if two people Y and Z would be equally good as new friends for X (they have the same score), then they should be listed in alphabetical order (for names as in Part 1) or numerical order (for numerical user IDs as in Part 2).