# CSE 326, Spring 1997: Homework 8

## Due 5/28/97

1. (20 points) The purpose of this problem is to analyze and compare B-trees and extendible hashing as two alternatives for maintaining a large database. Imagine that you have been asked to implement a new IRS database. There are 50,000,000 records, each of which takes an average of 2,000 bytes. The records are keyed by a Social Security Number, which is 4 bytes. The computer you will be using has 2,048-byte pages, and pages are addressed with 4-byte integers. Assume that the computer has 16 MB of memory usable for storing all or part of a search structure.
• For the B-tree, disk addresses are used as pointers, and a 2-byte integer keeps track of the length of the active area in each B-tree node. The leaves of the B-tree contain pointers to the actual records. We assume that the root is about half full and that the other nodes of the B-tree are about 3/4 full on average.

Calculate the maximum order of the B-tree such that each node fits into a page. Calculate the height of a B-tree of that order that can access all 50,000,000 records. Calculate how many levels of the B-tree can fit into the memory of the computer. Finally, calculate how many disk accesses are necessary to find a specific record using this search structure.

• For extendible hashing, keys are mapped to 4-byte hash values by a one-to-one mapping, so each Social Security Number has a unique hash value. Each page in extendible hashing contains a number of entries, each consisting of a complete hash value together with the disk address of an actual record. In addition, each page contains a one-byte integer indicating what portion of the complete hash value is relevant to that page. The hash table that fits in main memory has 2^k entries for some k. An index into the hash table is interpreted as a prefix of a complete hash value, and each entry in the hash table contains a disk address.

Calculate the maximum number of entries that can fit into one page in extendible hashing. Calculate the maximum-size hash table that can fit in main memory. Assuming that the pages in extendible hashing are about 3/4 full, how many pages are needed to store all the entries (not including the pages needed for the actual data)? Given the constraints of the available memory, will extendible hashing work for this database? If so, suggest a suitable hash table size. How many disk accesses per lookup are needed?

• For this application, which data structure would you recommend for accessing data in the new IRS database? Justify your answer. (There is no single best answer to this question, so you might consider listing the pros and cons of each implementation.)
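As a sanity check on your hand calculations (not the graded answers), the capacity arithmetic above can be sketched in Python. This sketch assumes exactly the layouts stated in the problem: an order-M B-tree node holds M page pointers, M-1 keys, and the 2-byte length field; leaf and hashing-page entries are (key, address) pairs; the in-memory hash table stores one 4-byte address per entry.

```python
import math

PAGE = 2048          # bytes per page
PTR = 4              # disk address (page pointer) size
KEY = 4              # Social Security Number key size
LEN_FIELD = 2        # length field in a B-tree node
RECORDS = 50_000_000

# B-tree order: 4M + 4(M-1) + 2 <= 2048  =>  8M <= 2050  =>  M = 256.
order = (PAGE - LEN_FIELD + KEY) // (PTR + KEY)
print("B-tree order:", order)                      # 256

# Leaves hold 8-byte (key, record-address) pairs plus the length field.
leaf_capacity = (PAGE - LEN_FIELD) // (KEY + PTR)
print("entries per leaf:", leaf_capacity)          # 255

# Extendible hashing: 8-byte (hash, address) entries plus a 1-byte
# field for the relevant portion of the hash value.
eh_capacity = (PAGE - 1) // (KEY + PTR)
print("entries per hashing page:", eh_capacity)    # 255

# At roughly 3/4 full, pages needed to hold one entry per record.
eh_pages = math.ceil(RECORDS / (eh_capacity * 3 // 4))
print("hashing pages needed:", eh_pages)

# Hash-table entries (4 bytes each) that fit in 16 MB of memory: 2^22.
table_entries = 16 * 2**20 // PTR
print("max hash-table entries:", table_entries)    # 4194304
```

Note that the B-tree height and level-by-level memory fit still have to be worked out by hand from the half-full root and 3/4-full node assumptions.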
2. (10 points) Consider an array with the following data in it: 14, 2, 3, 20, 15, 10, 13, 6, 9.
• Use Floyd's method to build a max-heap (maximum at the root) from this array, showing the result after each percolate-down step.
• Starting from the heap you built, carry out heapsort, showing the result after each delete_max operation.
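For checking your hand-traced steps, here is a minimal Python sketch of Floyd's bottom-up build-heap and in-place heapsort on a max-heap, using 0-based indexing. Each call to `percolate_down` in `build_heap`, and each swap in `heapsort`, corresponds to one of the intermediate states you are asked to show.

```python
def percolate_down(a, i, n):
    """Sift a[i] down until the max-heap property holds in a[0:n]."""
    while 2 * i + 1 < n:
        child = 2 * i + 1
        if child + 1 < n and a[child + 1] > a[child]:
            child += 1                      # pick the larger child
        if a[i] >= a[child]:
            break
        a[i], a[child] = a[child], a[i]
        i = child

def build_heap(a):
    """Floyd's method: percolate down each internal node, last to first."""
    for i in range(len(a) // 2 - 1, -1, -1):
        percolate_down(a, i, len(a))

def heapsort(a):
    build_heap(a)
    for n in range(len(a) - 1, 0, -1):
        a[0], a[n] = a[n], a[0]             # delete_max: move max to the end
        percolate_down(a, 0, n)             # restore the heap on a[0:n]

data = [14, 2, 3, 20, 15, 10, 13, 6, 9]
heapsort(data)
print(data)  # [2, 3, 6, 9, 10, 13, 14, 15, 20]
```

Because the deleted maximum is swapped into the vacated last slot, the array ends up sorted in ascending order in place.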