Amortized Analysis of Rehashing
Hash table too full à spend a lot of time looking in buckets | ||
Solution: rehash | ||
make hash table twice the size | ||
for each item in original hash table, hash to location in bigger table |
Assumptions for
Thursday, Feb. 10, 2000
Rehash whenever table is 50% full or more | ||
Just a sequence of inserts | ||
(can be generalized for other operations) | ||
We never get a collision | ||
(once dealing with collisions, we’re in average-case analysis territory) | ||
Hash table starts as size 2 |
How expensive is an insert? | |
How expensive is a rehash? | |
When will we need to rehash? |
How expensive is an insert? | ||
O(1) - assuming no collisions. Say it’s 1. | ||
How expensive is a rehash? | ||
O(N) - where N is current size of table. Say it’s N. | ||
When will we need to rehash? | ||
whenever size of table is power of 2 (1,2,4,8,16,…) |
Amortized
analysis
Strategy 1: add up operations
Amortized
analysis
Strategy 2: Accounting Method
Charge the cost of some operations to other operations | |
Each operation gives us some “tokens”, which we can spend on future operations |
Each insert gives us 3 tokens | ||
1 token for that insert | ||
1 token for rehashing this item the first time | ||
1 token for rehashing another item that
already got rehashed once or more (since we rehash on double the size, # of items never hashed = # of items already hashed once or more) |
Potential function: more sophisticated tokens | ||
Potential function is a function | ||
When operation costs less, potential goes up | ||
When operation costs more, take from potential | ||
Always positive – or we’re taking too long |
Tokens as a potential function
potential function at operation #i
P(i)= 1 + 2F - S/2 |
||
F = # of filled (non-empty) hash table slots | ||
S = total # of hash table slots | ||
Actual cost of operation of operation
#i C(i) |
||
Amortized cost of operation
#(i+1): CA(i+1)=C(i+1) + ( P(i+1) – P(i) ) |
Beginning: | ||
F=0, S=2. P(i)=0 | ||
For insert with no rehashing | ||
C(i+1)=1 | ||
P(i+1)-P(i)=2 (added non-empty slot) | ||
For insert with rehashing of N items | ||
C(i+1)=1+N | ||
P(i+1)-P(i)=2-N (before, F=N, S=2N. After, S=4N) |
P(i) is always positive. | ||
Yes. We rehash when F ³ 1/2S, at which point, 4F=S. Etc… | ||
Amortized cost is constant | ||
CA(i+1)=C(i+1)+P(i+1)-P(i) = 3, in both cases (previous slide) |