CSE 373, Autumn 2002: Homework 5

Due 11/22/2002

For all program or data structure design problems such as the two below you must provide pseudocode (see the manual) and an adequate explanation of the methods. It is often helpful to include small examples demonstrating the methods. Straight pseudocode with no additional documentation is not enough. Your pseudocode can contain English if needed. Each problem should be answered on a separate sheets of paper so they can be graded by different TAs. Your name should be at the top of every page.

(10 points) In this problem you will design algorithms to handle lazy deletion for closed hashing using linear probing. Assume we have an on going process of insertions, deletions, and finds. Design four algorithms where each algorithm is briefly described in pseudocode. Assume the keys are non-negative integers which are hashed into a table T of size HSize. An empty table entry can be indicated using -2. Use -1 in a location to indicate that the entry formerly in the location has been deleted. Your hash table algorithms do not have to resize the hash table.

Find(x,T) which returns the index of where x is stored if it is in the table T and -1 otherwise.
Insert(x,T)which inserts x into the table T with the precondition that x is not currently in the table. Naturally, a location with a -1 in it can be used to place a newly inserted item. Your algorithm should indicate when the hash table T is full and x cannot be inserted.
Delete(x,T) which removes x from the table T using lazy deletion.
Rehash(T) which returns a new table of the same size HSize which contains all the items of the old table, but has no items marked as deleted.

(10 points) Imagine that you have been asked to to implement a new IRS database. There are 100,000,000 records, each of which take an average of 2,000 bytes. The records are keyed with a Social Security Number which is 4 bytes. The computer that you will be using has 2,048 byte pages and pages are addressed with 4 byte integers. Assume that a B-tree node completely as possible fills a page. This may means that leaves will hold may hold more or less keys than internal nodes depending on what data is stored on the leaves. Assume that the computer has 16 MB of memory that is usable for storing all or part of a search structure. For the B-tree, disk addresses are used for pointers, and a 2 byte integer is used to keep track of the length of the active area in the B-tree node. The leaves of the B-tree contain pointers to the actual records. We assume that the nodes, other than the root, of the B-tree are about 3/4-th full on average.

Calculate the maximum number of children (M) that an internal node of the B-tree can have. Calculate the maximum number of entries (L) a leaf of the B-tree can have.
Calculate the height of the B-tree with that order so that all the 100,000,000 records can be accessed. About how many children does the root have.
Calculate how many levels of the B-tree can fit into the memory of the computer. Several full levels and the portion of one level will fit into memory.
Calculate how many disk accesses in the worst case are necessary to find a specific record using this search structure. How many on average?

(10 points) A d-heap is like a binary heap except that each node has d children instead of 2. For a number of reasons a 3-heap can perform better than a binary heap.

Assuming a d-heap is stored in an array indexed 1 to Max, give a formulas for computing the index of the d children of the node at index i. Give a formula for computing the parent of a node at index i.
Design PercolateDown(i : integer, x : keyvalue) for the 3-heap.
Design PercolateUp(i : integer, x: keyvalue) for the 3 heap.