Why fingerprint?
The probability is with respect to our choice of a fingerprinting (fpr) scheme.
- No assumptions about the input are needed.
Keys are long, or there are no keys (need UIDs):
- In AltaVista: 100M URLs @ 90 bytes/URL = 9GB
  100M fprs @ 8 bytes/fpr = 0.8GB
- Find duplicate pages: two pages are treated as the same if they have the same fpr.
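A minimal Python sketch of the two uses listed above: the storage arithmetic and duplicate detection by fingerprint. It is an illustration under stated assumptions, not the scheme these notes go on to define: keyed BLAKE2b truncated to 8 bytes stands in for the fingerprint function, and the random key plays the role of "our choice of a fpr scheme". The names make_fingerprinter, group_duplicates and storage_gb are hypothetical.

```python
import hashlib
import os

FPR_BYTES = 8  # fingerprint size quoted in the notes


def make_fingerprinter(key=None):
    """Choose one fingerprint function at random.

    Assumption: keyed BLAKE2b is a stand-in for the fingerprinting scheme.
    The randomness lives in the key, not in the input.
    """
    if key is None:
        key = os.urandom(16)

    def fpr(data: bytes) -> int:
        digest = hashlib.blake2b(data, digest_size=FPR_BYTES, key=key).digest()
        return int.from_bytes(digest, "big")

    return fpr


def group_duplicates(pages, fpr):
    """Return lists of page names whose contents share a fingerprint."""
    groups = {}
    for name, content in pages.items():
        groups.setdefault(fpr(content), []).append(name)
    return [names for names in groups.values() if len(names) > 1]


def storage_gb(count, bytes_each):
    """Storage in GB (10^9 bytes) for `count` items of `bytes_each` bytes."""
    return count * bytes_each / 1e9
```

Usage: storage_gb(100_000_000, 90) gives the 9GB figure for raw URLs and storage_gb(100_000_000, 8) the 0.8GB figure for fingerprints. group_duplicates compares only the 8-byte values, so two different pages could in principle be reported as duplicates; how likely that is depends on the scheme chosen, not on the pages.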