Amortized Analysis of Rehashing

What is rehashing
Hash table too full à spend a lot of time looking in buckets
Solution: rehash
make hash table twice the size
for each item in original hash table, hash to location in bigger table

Assumptions for
Thursday, Feb. 10, 2000
Rehash whenever table is 50% full or more
Just a sequence of inserts
(can be generalized for other operations)
We never get a collision
(once dealing with collisions, we’re in average-case analysis territory)
Hash table starts as size 2

Observations
How expensive is an insert?
How expensive is a rehash?
When will we need to rehash?

Observations
How expensive is an insert?
O(1)  - assuming no collisions.  Say it’s 1.
How expensive is a rehash?
O(N)  - where N is current size of table.  Say it’s N.
When will we need to rehash?
whenever size of table is power of 2 (1,2,4,8,16,…)

Amortized analysis
Strategy 1: add up operations

Amortized analysis
Strategy 2: Accounting Method
Charge the cost of some operations to other operations
Each operation gives us some “tokens”, which we can spend on future operations

Accounting Method analysis
Each insert gives us 3 tokens
1 token for that insert
1 token for rehashing this item the first time
1 token for rehashing another item that already got rehashed once or more
(since we rehash on double the size, # of items never hashed = # of items already hashed once or more)

Slide 9

Slide 10

Potential function
Potential function: more sophisticated tokens
Potential function is a function
When operation costs less, potential goes up
When operation costs more, take from potential
Always positive – or we’re taking too long

Tokens as a potential function
potential function at operation #i
P(i)= 1 + 2F - S/2
F = # of filled (non-empty) hash table slots
S = total # of hash table slots
Actual cost of operation of operation #i
C(i)
Amortized cost of operation #(i+1):
CA(i+1)=C(i+1) + ( P(i+1) – P(i) )

The Math
Beginning:
F=0, S=2.  P(i)=0
For insert with no rehashing
C(i+1)=1
P(i+1)-P(i)=2 (added non-empty slot)
For insert with rehashing of N items
C(i+1)=1+N
P(i+1)-P(i)=2-N (before, F=N, S=2N.  After, S=4N)

Potential method analysis
P(i) is always positive.
Yes.  We rehash when F ³ 1/2S, at which point, 4F=S.  Etc…
Amortized cost is constant
CA(i+1)=C(i+1)+P(i+1)-P(i)
= 3, in both cases (previous slide)