Assignment 4 ("Hashing, et cetera")
Due Monday, November 10, Part I in class, and Part II at 11:45 PM via Catalyst CollectIt.
CSE 373: Data Structures and Algorithms
The University of Washington, Seattle, Autumn 2008
Part I (50 points). Turn this part in as hardcopy in class on Monday, Nov. 10.
1. Suppose a B-tree (following the definition on page 147 of Weiss) of height 4 has M=15 and L=4. What are the fewest and greatest numbers of data records permissible in this tree? Show the expressions you use to obtain these numbers. (8 points)
2. Do problem 4.46 on page 164 of Weiss. You may use either pseudocode or Java to express your answer. Comments are encouraged so that it is clear how your method is supposed to work. (8 points)
3. A DNA sequence is a string of letters from the set {A, C, G, T}. Make up a hashing function that would be a good one for DNA sequences. Describe it with a mathematical formula, if possible. However, if you cannot describe it with a mathematical formula, give pseudocode or a Java description of how to compute the function. (8 points)
4. A hashing scheme with separate chaining uses a table of 256 elements. Suppose that 100 elements have been inserted. What is the load factor? If all 100 elements are in the same bin, what can you say about the expected time to insert a new elements into the table? What about finding an element? (Describe each possible case.) (8 points)
5. Consider the hash function f(k) = k mod 10. Starting with an initially empty table, insert the following keys into the table resolving any collisions with quadratic probing. Show clearly where each key goes. 239, 799, 804, 409, 553, 673, 841. (6 points)
6. Draw a picture of a hash table having 11 locations, in which the following locations are occupied: 0, 1, 5, 8, and 10. We assume that the others are unoccupied. Now suppose we are resolving collisions using linear probing and that some element x is hashed by the hash function h and ends up (after possible conflict resolution) in location 2. What can we say about the value of h(x)? (I.e., before conflict resolution, if any). And what if we are resolving collisions using quadratic probing, instead? (6 points)
7. Starting with an empty binary heap, insert the following elements, one at a time, and show the resulting binary heap (as a tree, not an array). 11, 13, 2, 15, 7, 6, 9, 16, 4, 10, 8, 5, 12, 14, 3. (6 points)
Part II (50 points). Turn this part in electronically by Monday night, Nov. 10, at 11:45 PM.
Choose either (a) hashing with separate chaining or (b) probing hash table with linear and quadratic probing. Then implement a demonstration of your technique within the Visual Data Structure Applet framework. Name your new Java file VisibleHashingApplet.java. Note that separate chaining is simpler with regard to handling collisions but requires a more complex display method.
Features that your program should have:
  • Display of the hash table, showing all inserted items (keys and their associated data items) in their proper places.
  • Support for a "NEW" command that takes an argument n, the size of the hash table and sets up a hash table with that many locations:
    NEW 19
    
  • Handle PUT, GET, and REMOVE. (If you are doing a probing hash table, you may use lazy deletion. Lazy deletion means marking an entry as deleted without actually reclaiming the space it takes.)
  • The PUT command should take two string arguments separated by a space. The first is the key, and the second is the "data". If an item with the same key and data is already in the hash table, then the PUT command should do nothing. If an item with the same key but different data is in the hash table, then the new data should replace the old data for that item. After a PUT command is performed, the report should indicate either "succeeded" (for insertion of a new key and data), "replacement succeeded" (for new data for an old key), or "no-op" for the case where the same key and data are both there already.
  • The GET command should take one string argument representing the key. If there is a data item for that key in the hash table, the data should be returned (printed in the status line and history window). If there is no entry for the given key, then "null" should be returned (printed in the status line and history window).
  • The REMOVE command should take one string argument representing the key. If there is an entry for that key, it should be deleted. Otherwise, it should do nothing. If the removal did something, then "succeeded" should be reported. If there was no entry for the given key, then "no-op" should be reported.
  • Keep track of the load factor and the pseudo load factor (that includes lazily deleted items in the load count). In the status and history, report the performance statistics for each operation. e.g., for a probing hash table, report the number of links traversed to handle the operation. For a PUT into a probing hash table, report the number of collisions. In a GET or REMOVE, report how many locations had to be tried to find the item.
  • For a probing hash table, support the commands:
    LINEAR
    QUADRATIC
    
    You can assume these are only called when the hash table is empty. After they are called, collision handling follows the policy given.
The following sequence of commands would lead to the results similar to those shown after. Note that the actual load factor and number of collisions will depend on the hash function that you use.
NEW 10
LINEAR
PUT apple red
PUT banana yellow
PUT pepper red
PUT pepper green
REMOVE banana
REMOVE pickle
GET apple
GET banana
GET pepper

; PUT apple red succeeded (load factor is 0.1, 0 collisions)
; PUT banana yellow succeeded (load factor is 0.2, 0 collisions)
; PUT pepper red succeeded (load factor is 0.3, 1 collisions)
; PUT pepper green replacement succeeded (load factor is 0.3, 1 collisions)
; REMOVE banana succeeded (load factor is 0.3, 0 collisions)
; REMOVE pickle no-op (load factor is 0.3, 1 collisions)
; GET apple -> red (0 collisions)
; GET banana -> null (0 collisions)
; GET pepper -> green (1 collisions)
If you are implementing linear and quadratic probing, you should be able to handle the above sequence as Test Run 1 and the following sequence as test run 2. If you are implementing separate chaining, then you should also handle these sequences but your program can ignore the LINEAR and QUADRATIC commands.
NEW 13
QUADRATIC
PUT Mercury 4880 km
PUT Venus 12103 km
PUT Earth 12756 km
PUT Mars 6794 km
PUT Sun 1390000 km
PUT Moon 3476 km
PUT Jupiter 142984 km
PUT Saturn 120536 km
PUT Uranus 51118 km
PUT Neptune 49532 km
PUT Pluto 2274 km
REMOVE Sun
REMOVE Moon
GET Mercury
GET Jupiter
PUT Ceres 950 km
GET Ceres
GET Sun
GET Pluto
REMOVE Pluto
REMOVE Ceres
(Data courtesy of http://www.nineplanets.org/)
 
In addition to these two test runs, test out your program on a third example: As part of your turn-in, you'll provide a file MY-HASH-EXAMPLE.txt that illustrates your hash table with data from some other realm, such as your hobby. There should be at least 15 commands in the file. When you turn in your program via Catalyst CollectIt, there should be two files:
 
VisibleHashingApplet.java and MY-HASH-EXAMPLE.txt.
Last updated Oct. 31, 2008 at 2:00 PM.