For the code in Part 2, it is fine if you do not write any new functions other than the ones we provided starter code for. It is fine if you do write new functions, but you will not be graded down if you do not. Be sure any new functions you add are in good python style including a docstring comment.
Note
social_network_tests.py
will not test the components of Part 2.
Problem 5: Create a Facebook Graph¶
In this problem, you will create a graph from provided Facebook data. Creating the facebook
graph is all you have to do for this problem, as we will use this graph for Problems 6, 7, and 8.
The Facebook Data¶
The facebook-links.txt
file in your homework5
directory is courtesy of the Max Planck Institute for Software Systems. Here is a slightly clarified version of the documentation for this file:
Description: The file
facebook-links.txt
contains a list of all of the user-to-user links from the Facebook New Orleans networks. These links are undirected on Facebook.Format: Each line contains two numeric user identifiers, meaning the second user appeared in the first user’s friend list, and the first user appeared in the second user’s friend list. Finally, the third column is a UNIX timestamp with the time of link establishment (if it could be determined, otherwise it is ‘N‘).
Info
A Unix timestamp is the number of seconds since January 1, 1970. You may ignore timestamps in this assignment. (Facebook does use the recency of your activity to help it make recommendations.)
A snippet of the file appears below:
35467 17494 1197992662
35467 4190 \N
35467 18822 1209937599
37188 7741 1219156787
37188 8561 1199853037
Important
When reading in this data you should be sure to convert the numeric user identifiers into ints. Storing them as ints will make your life easier and is the output format that we require for printing (Do NOT print a user ID as the string '19611'
; instead print it as the integer 19611
.) This means your code should work whether nodes are named with strings (as we did with the practice graphs) or ints (as we are doing with the facebook data). In NetworkX, nodes can be strings or integers.
Create the Graph¶
Create a graph named facebook
from the Facebook data in file facebook-links.txt
. You should implement get_facebook_graph()
as a helper function. As in Part 1, use the Graph class from the NetworkX module.
Note
Don’t be alarmed if reading the Facebook data takes a little while. The file is large, and it may take up to a minute for your program to read it. However, do not try to draw the Facebook graph. This may cause your computer to hang, and even if it were successful, you would not learn much from a tangled mess of 817,090 edges. (Remember from the testing lecture(s): If you’re trying to verify something is true, what’s a smaller data set that might help?)
The starter code folder also contains a smaller file, facebook-links-small.txt
, that you can use to test your code. Sample output for this file is found in facebook-links-small-output.txt
. You can use the Diff Checker tool to compare your output to the sample output. Feel free to make even smaller files of your own for testing purposes! Although data in facebook-links-small.txt
isn’t sorted, your solution should produce sorted output regardless of how original data is organized in the file.
You do not need to read in the file name from the command line as we did in previous homeworks. Instead, you can assume that the file facebook-links.txt
will be present in the same directory as social_network.py
and use the filename "facebook-links.txt"
in your program as needed.
Tip
You may find it useful to refer to the slides about File I/O.
Tip
We recommend that you run the program from the command line as you have before, although hitting the triangle button in VSCode should also work.
Looking at the assert statements in social_network.py
, notice that the number of nodes in your facebook graph should be 63731. Are there other tests you could write to convince yourself that the graph was created correctly?
Problem 6: Recommend by number of common friends for Facebook¶
For every Facebook user with a user id that is a multiple of 1000, print a list containing the first 10 friend recommendations, as determined by “number of common friends” friendship score. If there are fewer than 10 recommendations, print all the recommendations.
You can review more details about the format for the printed output in the Setup section. This is a sample of part of what you would expect for your output (you should be printing more lines than this, and they should be listed in ascending order of userID as shown below (e.g. user 28000, followed by user 29000, etc.):
...
28000 (by num_common_friends): [23, 1445, 4610, 7996, 10397, 11213, 56, 85, 471, 522]
29000 (by num_common_friends): [28606]
30000 (by num_common_friends): [862, 869, 919, 941, 3154, 8180, 8269, 8614, 14473, 14495]
...
Tip
List slices may be useful for Problem 6 and 7.
Important
You should be printing more lines than this, and they should be listed in ascending order of userID (e.g. user 28000, followed by user 29000, etc.)
Problem 7: Recommend by influence for Facebook¶
For every Facebook user with a user id that is a multiple of 1000, print a list containing the first 10 friend recommendations as determined by the “influence” friendship score. If there are fewer than 10 recommendations, print all the recommendations. Again, the format for the printed output is in the Output Format section. This is a sample of part of what you would expect for your output:
...
28000 (by influence): [7033, 17125, 15462, 33049, 51105, 16424, 23, 7996, 725, 1539]
29000 (by influence): [28606]
30000 (by influence): [862, 869, 919, 941, 3154, 8269, 14473, 14495, 17951, 19611]
...
Important
You should be printing more lines than this, and they should be listed in ascending order of userID (e.g. user 28000, followed by user 29000, etc.)
Problem 8: Does the recommendation algorithm make a difference for Facebook?¶
Considering only those 63 Facebook users with an id that is a multiple of 1000, compute and print the number of Facebook users who have the same first 10 friend recommendations under both recommendation systems, and the number of Facebook users who have different first 10 friend recommendations under the two recommendation systems.
For example, in the data above, user 29000 has the same first ten recommendations under both recommendation systems, while user 28000 and user 30000 both have different recommendations under the two recommendation systems. This program may take some time to compute (possibly a minute or so) depending on the efficiency of your solution.
The format of this problem’s printed output is given in the Output Format section.
Final Checks¶
Be sure that your code is in its final version before turning it in.
- In Problem 5, make sure your graph is named exactly
facebook
. - Ensure the
facebook
graph is using the data fromfacebook-links-small.txt
and notfacebook-links.txt
. Not doing so will cause our autograder to fail. We will be replacing the file in the background with contents infacebook-links-small.txt
, so don’t be surprised if the autograder shows a different output than the one you see on your computer. - Make sure your code run completely without any failed assertions or other errors?
- Finally, triple-check that the printed output in exactly the format specified in the Output Format section, including spaces, blank links, and punctuation.
Submit your work¶
You are almost done! Complete these final action items:
- At the bottom of your
social_network.py
file, in the “Collaboration” part, state which students or other people (besides the course staff) helped you with the assignment, or that no one did. - Submit
social_network.py
via Gradescope. You do not need to submitsocial_network_tests.py
, even if you edited it. - Answer this REQUIRED Feedback Survey asking how much time you spent and other reflections on Part 2 of the assignment.
Now you are done!