Solution to Homework #4, Question #4

Cranking through the numbers shows that the Emp relation takes 100 pages, and the Dept relation takes 50 pages.

Since there is an index on did of Emp, we can consider an index nested loop join where Dept is the outer relation. The cost of reading Dept is 50 I/Os. For every tuple of Dept we need to do an index retrieval for the appropriate value of the did attribute. Typically, with a hash index this takes either 1 or 2 I/Os. Since the index is clustered, we need only one more I/O to get all the matching tuples.

Hence, the cost of the index nested loop join is:

50 + 5000 * (1.5 + 1)

Consider hash-join. Emp already has a hash index on did, so we need to hash Dept. That would cost about 2x50 (one for reading and one for writing). The join itself would then cost 50+100. If we also needed to hash Emp, (perhaps because we want to use a different hash function) we would pay 200 more.

For sort-merge, we first need to sort both relations. Even if both of them can be sorted in two passes, it would cost 2x150 (though in general, the cost of the sort is N Log (N)). The cost of the merge step is that of scanning both relations, i.e., 150 I/Os.