Cost of Hash-Join
In partitioning phase, read+write both relations: 2(M+N) I/Os. In matching phase, read both relations: M+N I/Os. Total: 3(M+N) I/Os.
In our running example, this is a total of 4500 I/Os. (45 seconds!)
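The arithmetic behind these numbers can be sketched as follows. The page counts (M = 1000, N = 500) and the 10 ms per I/O are assumptions chosen to be consistent with the 4500 I/Os and 45 seconds quoted above; they are not stated in this section, and the function name is hypothetical.

```python
def hash_join_io(m_pages: int, n_pages: int) -> int:
    """I/O cost of a two-phase hash join, ignoring the cost of writing the result."""
    partition_cost = 2 * (m_pages + n_pages)  # read and write both relations once
    matching_cost = m_pages + n_pages         # read every partition once
    return partition_cost + matching_cost


# Assumed values for the running example (not given in this section).
M_PAGES = 1000
N_PAGES = 500
MS_PER_IO = 10

total_ios = hash_join_io(M_PAGES, N_PAGES)
total_seconds = total_ios * MS_PER_IO / 1000

print(total_ios, "I/Os,", total_seconds, "seconds")
```

Under these assumptions the sketch reproduces the figures on the slide; with other page counts only the 3(M+N) formula carries over.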
Sort-Merge Join vs. Hash Join:
- Given a minimum amount of memory, both have a cost of 3(M+N) I/Os. Hash Join superior on memory needed if relation sizes differ greatly: its minimum depends on the smaller relation, Sort-Merge's on the larger. Also, Hash Join shown to be highly parallelizable.
- Sort-Merge less sensitive to data skew; result is sorted.
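To illustrate the memory point in the comparison, here is a rough sketch of the usual rule-of-thumb thresholds for achieving the 3(M+N) cost: about the square root of the smaller relation for Hash Join, and about the square root of the larger relation for Sort-Merge. These are approximations, and the function names and the `fudge` parameter (hash-table overhead) are hypothetical.

```python
import math


def min_buffers_hash_join(m_pages: int, n_pages: int, fudge: float = 1.0) -> float:
    """Approximate buffer pages needed: each partition of the smaller relation must fit in memory."""
    return math.sqrt(fudge * min(m_pages, n_pages))


def min_buffers_sort_merge(m_pages: int, n_pages: int) -> float:
    """Approximate buffer pages needed: the larger relation must be sortable in two passes."""
    return math.sqrt(max(m_pages, n_pages))


# Illustrative sizes only: one relation is 100 times larger than the other.
big, small = 10_000, 100
print("hash join :", min_buffers_hash_join(big, small))
print("sort-merge:", min_buffers_sort_merge(big, small))
```

With very different relation sizes the Hash Join threshold is much lower; with equal sizes (and `fudge` of 1) the two estimates coincide.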