|
|
|
|
|
Hash table too full à spend a lot of time
looking in buckets |
|
Solution: rehash |
|
make hash table twice the size |
|
for each item in original hash table, hash to
location in bigger table |
|
|
|
|
|
Rehash whenever table is 50% full or more |
|
Just a sequence of inserts |
|
(can be generalized for other operations) |
|
We never get a collision |
|
(once dealing with collisions, we’re in
average-case analysis territory) |
|
Hash table starts as size 2 |
|
|
|
|
How expensive is an insert? |
|
How expensive is a rehash? |
|
When will we need to rehash? |
|
|
|
|
|
How expensive is an insert? |
|
O(1) -
assuming no collisions. Say it’s 1. |
|
How expensive is a rehash? |
|
O(N) -
where N is current size of table.
Say it’s N. |
|
When will we need to rehash? |
|
whenever size of table is power of 2
(1,2,4,8,16,…) |
|
|
|
|
|
Charge the cost of some operations to other
operations |
|
Each operation gives us some “tokens”, which we
can spend on future operations |
|
|
|
|
|
Each insert gives us 3 tokens |
|
1 token for that insert |
|
1 token for rehashing this item the first time |
|
1 token for rehashing another item that already
got rehashed once or more
(since we rehash on double the size, #
of items never hashed = # of items already hashed once or more) |
|
|
|
|
|
|
|
Potential function: more sophisticated tokens |
|
Potential function is a function |
|
When operation costs less, potential goes up |
|
When operation costs more, take from potential |
|
Always positive – or we’re taking too long |
|
|
|
|
|
potential function at operation #i
P(i)= 1 + 2F - S/2 |
|
F = # of filled (non-empty) hash table slots |
|
S = total # of hash table slots |
|
Actual cost of operation of operation #i
C(i) |
|
Amortized cost of operation #(i+1):
CA(i+1)=C(i+1)
+ ( P(i+1) – P(i) ) |
|
|
|
|
|
Beginning: |
|
F=0, S=2.
P(i)=0 |
|
For insert with no rehashing |
|
C(i+1)=1 |
|
P(i+1)-P(i)=2 (added non-empty slot) |
|
For insert with rehashing of N items |
|
C(i+1)=1+N |
|
P(i+1)-P(i)=2-N (before, F=N, S=2N. After, S=4N) |
|
|
|
|
|
P(i) is always positive. |
|
Yes. We
rehash when F ³ 1/2S, at which point, 4F=S. Etc… |
|
Amortized cost is constant |
|
CA(i+1)=C(i+1)+P(i+1)-P(i)
=
3, in both cases (previous slide) |
|