Datasets

What datasets can we use for the course project?

You can use any dataset of your choice, as long as it supports an interesting and non-trivial project as described here. The publicly available datasets listed below (a list originally based on Stanford CS224W) are a good place to start.

Reddit

Sage BioNetworks

1 Million Jupyter Notebooks

Gab Social Network

Twitter

Instagram

Kiva Microlending

DonorsChoose Education Crowdfunding

Kaggle Competition Datasets

Food Webs

Wolfe Primates Interaction

Trade Networks

Stack Exchange

Microfinance

Interpersonal expertise overlap within a company

Moviegalaxies

Bitcoin

Neural Network of a Caenorhabditis elegans worm

Airports in the United States

Author Citation Networks

.uk Domain Network

Python Dependency for PyPI

Stanford Large Network Dataset Collection

Coauthorship and Citation Networks

Internet Topology

Stack Overflow

Yelp Data

YouTube dataset

Amazon product copurchasing networks and metadata

Wikipedia

Movie Ratings

Who trusts whom data at Trustlet

Mark Newman's pointers

Reality Commons data

Google Local Dataset

MOOC Forums Dataset