Projection via Hashing

SELECT DISTINCT R.sid, R.bid
FROM Reserves R

Partitioning phase: Read R using one input buffer. For each tuple, discard unwanted fields, then apply hash function h1 to the remaining fields to choose one of B-1 output buffers.
- Result is B-1 partitions (of tuples with no unwanted fields).
- Two tuples from different partitions are guaranteed to be distinct.

Duplicate elimination phase: For each partition, read it and build an in-memory hash table, using hash function h2 (different from h1) on all fields, while discarding duplicates.
- If a partition does not fit in memory, apply the hash-based projection algorithm recursively to that partition.
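
The two phases above can be sketched in Python as follows. This is only an illustrative in-memory simulation, not a disk-based implementation: the function names, the salt constant, and the sample rows are assumptions made for the example, partitions are plain lists rather than pages on disk, Python's built-in set stands in for the h2 hash table, and the recursive case for oversized partitions is not modeled.

```python
from typing import Iterable, List, Sequence, Tuple

SALT = 0x9E3779B9  # arbitrary constant so h1 differs from the set's own hashing


def h1(fields: Tuple, fanout: int) -> int:
    """Partitioning hash: maps a projected tuple to one of `fanout` partitions."""
    return hash((SALT,) + fields) % fanout


def hash_project(rows: Iterable[Sequence], keep: Sequence[int],
                 num_buffers: int) -> List[Tuple]:
    """Simulate SELECT DISTINCT over the columns in `keep` using B buffers."""
    if num_buffers < 3:
        raise ValueError("need one input buffer and at least two output buffers")
    fanout = num_buffers - 1  # B-1 output buffers, one per partition

    # Partitioning phase: drop unwanted fields, route each tuple with h1.
    partitions: List[List[Tuple]] = [[] for _ in range(fanout)]
    for row in rows:
        projected = tuple(row[i] for i in keep)
        partitions[h1(projected, fanout)].append(projected)

    # Duplicate elimination phase: one in-memory hash table per partition.
    # Equal tuples always land in the same partition, so partitions can be
    # processed independently.
    result: List[Tuple] = []
    for part in partitions:
        seen = set()  # plays the role of the h2 hash table
        for t in part:
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result


# Hypothetical Reserves rows: (sid, bid, day)
reserves = [
    (22, 101, "10/10/96"),
    (22, 101, "10/11/96"),
    (58, 103, "11/12/96"),
    (22, 102, "10/10/96"),
]
distinct_pairs = hash_project(reserves, keep=(0, 1), num_buffers=4)
```

The order of `distinct_pairs` depends on how h1 spreads tuples across partitions, so callers should not rely on it.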

Cost: For partitioning, read R and write out each tuple, but with fewer fields. These partitions are read back in the duplicate elimination phase.
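
As a rough sketch, the I/O count described above can be written out as arithmetic. The symbols are assumptions introduced here: M is the number of pages in R and T is the number of pages occupied by the projected tuples. The sketch also assumes every partition fits in memory (no recursive partitioning) and does not count writing the final result.

```python
def hash_projection_io(m_pages: int, t_pages: int) -> int:
    """Estimated page I/Os for hash-based projection under the stated assumptions."""
    partition_read = m_pages   # read R once
    partition_write = t_pages  # write projected tuples into partitions
    dedup_read = t_pages       # read the partitions back to eliminate duplicates
    return partition_read + partition_write + dedup_read


# Hypothetical sizes: R occupies 1000 pages, projected tuples occupy 250 pages.
total_io = hash_projection_io(1000, 250)  # 1000 + 250 + 250 = 1500
```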
