key -> hashcode -> compression -> array[index] -> value
O(1) behavior if designed well

worst hashcode of a student in class: 42
bad hashcode of a student in class: Age
better hashcode of a student in class: Bday
best hashcode of a student in class: social security number

A hashcode must satisfy:
o1.equals(o2) == true  =>  o1.hashCode() == o2.hashCode()

Satisfying the converse as much as possible is not required but strongly 
recommended, as it improves the performance.

Data is often similar.  Hashcodes should destroy any 
consistencies/similarities/coincidences.  

Hashcode of x+y for a Point is bad because 
(0,7) -> 7
(1,6) -> 7
(2,5) -> 7
(3,4) -> 7
(4,3) -> 7
(5,2) -> 7
(6,1) -> 7
(7,0) -> 7

37 x + y is much better
(0,7) -> 7
(1,6) -> 43
(2,5) -> 79
(3,4) -> 115
(4,3) -> 151
(5,2) -> 187
(6,1) -> 223
(7,0) -> 259

Now compression.  Suppose our array is size 10 and we use mod 10.
(0,7) -> 7   -> 7
(1,6) -> 43  -> 3
(2,5) -> 79  -> 9
(3,4) -> 115 -> 5
(4,3) -> 151 -> 1
(5,2) -> 187 -> 7
(6,1) -> 223 -> 3
(7,0) -> 259 -> 9

Lots of collision, only the odd numbers being used.

Let's use a prime number instead: 11, with compression mod 11.
(0,7) -> 7   -> 7
(1,6) -> 43  -> 10
(2,5) -> 79  -> 2
(3,4) -> 115 -> 5
(4,3) -> 151 -> 8
(5,2) -> 187 -> 0
(6,1) -> 223 -> 3
(7,0) -> 259 -> 6

Much better.  To avoid clustering and spread things out more, we can do 
(a * i + b) mod N, instead of just i mod N.  Again, N should be prime for 
best performance.

Suppose we had 3 coords... then we could do
37^2 *x + 37 * y + z, or (x*37+y)*37+z.

We can't always eliminate collisions, so we need to have a way to resolve 
them. 

Chaining - in each bin, store a list of items sharing the same key
(worst case O(n) -- all items may be stored in that list)

Open-Addressing - if a bin is occupied, find the next unoccupied bin - may 
be as simple as linear probing or as complicated as double hashing.
(worst case O(n) -- have to go through all indices before finding the 
empty one)

Deleting is a little tricky for open-addressing -- need a placeholder 
marking the spot as once holding something so that items placed after it 
can still be found.

Load Factor -- number of elements stored / size of hashtable
chaining: 75%-100% is good.  
open-addressing: 25%-50% is good.
(if too low, wasted space; if too high, poor performance)

hash this string!
http://www.amazon.com/gp/product/0930410963/102-7078559-5346557?v=glance&n=283155&5Fencoding=UTF8&v=glance

when hash table gets full (from lots of elements or a lot of deleted 
fillers from probing), need to enlarge and rehash -- expensive, though not 
done often.