For the code in Part 2, it is fine if you do not write any new functions other than the ones we provided starter code for. It is fine if you do write new functions, but you will not be graded down if you do not. Be sure any new functions you add are in good python style including a docstring comment.

Note

social_network_tests.py will not test the components of Part 2.

Problem 5: Create a Facebook Graph

In this problem, you will create a graph from provided Facebook data. Creating the facebook graph is all you have to do for this problem, as we will use this graph for Problems 6, 7, and 8.

The Facebook Data

The facebook-links.txt file in your homework5 directory is courtesy of the Max Planck Institute for Software Systems. Here is a slightly clarified version of the documentation for this file:

Description: The file facebook-links.txt contains a list of all of the user-to-user links from the Facebook New Orleans networks. These links are undirected on Facebook.

Format: Each line contains two numeric user identifiers, meaning the second user appeared in the first user’s friend list, and the first user appeared in the second user’s friend list. Finally, the third column is a UNIX timestamp with the time of link establishment (if it could be determined, otherwise it is ‘N‘).

Info

A Unix timestamp is the number of seconds since January 1, 1970. You may ignore timestamps in this assignment. (Facebook does use the recency of your activity to help it make recommendations.)

A snippet of the file appears below:

35467    17494    1197992662
35467    4190    \N
35467    18822    1209937599
37188    7741    1219156787
37188    8561    1199853037

Important

When reading in this data you should be sure to convert the numeric user identifiers into ints. Storing them as ints will make your life easier and is the output format that we require for printing (Do NOT print a user ID as the string '19611'; instead print it as the integer 19611.) This means your code should work whether nodes are named with strings (as we did with the practice graphs) or ints (as we are doing with the facebook data). In NetworkX, nodes can be strings or integers.

Create the Graph

Create a graph named facebook from the Facebook data in file facebook-links.txt. You should implement get_facebook_graph() as a helper function. As in Part 1, use the Graph class from the NetworkX module.

Note

Don’t be alarmed if reading the Facebook data takes a little while. The file is large, and it may take up to a minute for your program to read it. However, do not try to draw the Facebook graph. This may cause your computer to hang, and even if it were successful, you would not learn much from a tangled mess of 817,090 edges. (Remember from the testing lecture(s): If you’re trying to verify something is true, what’s a smaller data set that might help?)

The starter code folder also contains a smaller file, facebook-links-small.txt, that you can use to test your code. Sample output for this file is found in facebook-links-small-output.txt. You can use the Diff Checker tool to compare your output to the sample output. Feel free to make even smaller files of your own for testing purposes! Although data in facebook-links-small.txt isn’t sorted, your solution should produce sorted output regardless of how original data is organized in the file.

You do not need to read in the file name from the command line as we did in previous homeworks. Instead, you can assume that the file facebook-links.txt will be present in the same directory as social_network.py and use the filename "facebook-links.txt" in your program as needed.

Tip

You may find it useful to refer to the slides about File I/O.

Tip

We recommend that you run the program from the command line as you have before, although hitting the triangle button in VSCode should also work.

Looking at the assert statements in social_network.py, notice that the number of nodes in your facebook graph should be 63731. Are there other tests you could write to convince yourself that the graph was created correctly?

Problem 6: Recommend by number of common friends for Facebook

For every Facebook user with a user id that is a multiple of 1000, print a list containing the first 10 friend recommendations, as determined by “number of common friends” friendship score. If there are fewer than 10 recommendations, print all the recommendations.

You can review more details about the format for the printed output in the Setup section. This is a sample of part of what you would expect for your output (you should be printing more lines than this, and they should be listed in ascending order of userID as shown below (e.g. user 28000, followed by user 29000, etc.):

...
 28000 (by num_common_friends): [23, 1445, 4610, 7996, 10397, 11213, 56, 85, 471, 522]
 29000 (by num_common_friends): [28606]
 30000 (by num_common_friends): [862, 869, 919, 941, 3154, 8180, 8269, 8614, 14473, 14495]
...

Tip

List slices may be useful for Problem 6 and 7.

Important

You should be printing more lines than this, and they should be listed in ascending order of userID (e.g. user 28000, followed by user 29000, etc.)

Problem 7: Recommend by influence for Facebook

For every Facebook user with a user id that is a multiple of 1000, print a list containing the first 10 friend recommendations as determined by the “influence” friendship score. If there are fewer than 10 recommendations, print all the recommendations. Again, the format for the printed output is in the Output Format section. This is a sample of part of what you would expect for your output:

...
28000 (by influence): [7033, 17125, 15462, 33049, 51105, 16424, 23, 7996, 725, 1539]
29000 (by influence): [28606]
30000 (by influence): [862, 869, 919, 941, 3154, 8269, 14473, 14495, 17951, 19611]
...

Important

You should be printing more lines than this, and they should be listed in ascending order of userID (e.g. user 28000, followed by user 29000, etc.)

Problem 8: Does the recommendation algorithm make a difference for Facebook?

Considering only those 63 Facebook users with an id that is a multiple of 1000, compute and print the number of Facebook users who have the same first 10 friend recommendations under both recommendation systems, and the number of Facebook users who have different first 10 friend recommendations under the two recommendation systems.

For example, in the data above, user 29000 has the same first ten recommendations under both recommendation systems, while user 28000 and user 30000 both have different recommendations under the two recommendation systems. This program may take some time to compute (possibly a minute or so) depending on the efficiency of your solution.

The format of this problem’s printed output is given in the Output Format section.

Final Checks

Be sure that your code is in its final version before turning it in.

  • In Problem 5, make sure your graph is named exactly facebook.
  • Ensure the facebook graph is using the data from facebook-links-small.txt and not facebook-links.txt. Not doing so will cause our autograder to fail. We will be replacing the file in the background with contents in facebook-links-small.txt, so don’t be surprised if the autograder shows a different output than the one you see on your computer.
  • Make sure your code run completely without any failed assertions or other errors?
  • Finally, triple-check that the printed output in exactly the format specified in the Output Format section, including spaces, blank links, and punctuation.

Submit your work

You are almost done! Complete these final action items:

  1. At the bottom of your social_network.py file, in the “Collaboration” part, state which students or other people (besides the course staff) helped you with the assignment, or that no one did.

  2. Submit social_network.py via Gradescope. You do not need to submit social_network_tests.py, even if you edited it.

  3. Answer this REQUIRED Feedback Survey asking how much time you spent and other reflections on Part 2 of the assignment.

Now you are done!