Assignment A4: Hashing High-Volume Data -- Part II (A)

Part II Overview

Here in Part II, we both implement (A) a custom hash table and (B) apply it in an application. In particular, we start with an image processing program and add a new data-compression feature that uses the process described in Part I.

Perform the implementation described below. Turn in your Java source files and README.txt file using Catalyst CollectIt prior to the Part II deadline of Wednesday, February 10 at 11:59 PM. The standard late policy applies.

II.A. Your Custom Hash Table

Begin with the hash-table starter code. This zip file includes several files. You will be implementing a customized hash table (or multiple types of hash tables, if you do the extra credit options).

CustomHashtables.java, which is where you will add your implementation of a hash table (and a second one, if you choose to do the linear probing option). The API is described further here.
CustomHashInterface.java, which defines the Java interface that your implementation will implement. You don't need to modify this file.
SeparateChain.java, which is a specific implementation of linked lists that you should use as components of your hash table. It actually handles collision resolution for you. If you do the extra credit option on rehashing, then you might need to modify this file. However, you will probably need to study this code closely in order to implement and debug your basic custom hash table.
HashPutResponseItem.java, this is a wrapper class whose instances you should use to return the status of a PUT operation. You should not change this file.
HashGetResponseItem.java, this is a wrapper class whose instances you should use to return the result and status of a GET operation. You should not change this file.
CustomHTTest.java, this is a JUnit test script that you should use as you develop your custom hash table. It will help guide you through the debugging and adding of functionality to your custom hash table. You are welcome to change this file as you develop and debug, but you will not be turning it in.

The most basic hashtable methods are PUT and GET, where PUT takes a key and a value and inserts the value into the hash table, in association with the given key, and where GET takes a given key and returns the value associated with it. In our custom hash table, we are enhancing the functionality of these basic methods. The new methods are called putAndMore and getAndMore. They take the usual arguments (a key and a value for putAndMore; just a key for getAndMore). However, they return special response objects that include performance information along with what one would normally expect.

The Functionality of Your Own Hash Table

In this section of the assignment, you'll implement your own custom hash table class. You are given a framework that provides most of the pieces that you will need. Your hash table will handle collision resolution using separate chaining. That work is actually done for you, because keys that collide are simply inserted into the same linked list as part of the PUT process, and you are given a file SeparateChain.java that takes care of this.

You will create a class called SeparateChainingHashtable<Key, Value> which uses type parameters to achieve generality. The stub of this class can be found in the file CustomHashtables.java. Note that you will be creating an inner class of the top-level class CustomHashtables. This is a design decision that we have taken in order to keep most of the interesting code that you will be developing in one file. The file CustomHashtables.java not only contains the stubs for implementing your methods, but it contains some utility functions that will make your life easier. For example, there is a function myMod that you can use to avoid or fix a bug that is associated with the Java MOD operator ("%"). There is also a primeTest method that is useful if you do the rehashing extra credit option.

Additional details on what to implement can be found in the comments at the top of the file CustomHashtable.java.

Suggested Development Strategy

Create a project in Eclipse that contains the starter code for the custom hash table. Also, edit the project properties so that you can right-click on the test script CustomHTTest.java and run it as a JUnit test script. Even if you haven't implemented any of your methods yet in the custom hash table, the test of the primeTest function should work. The other tests, should all fail or raise errors when you are starting out.

The heart of your custom hash table should be an ArrayList object. The elements of this ArrayList should be instances of SeparateChain<Key,Value>. In your constructor, you should use a loop to populate the ArrayList with these instances. These instances of SeparateChain will be initially empty linked lists. The size of your ArrayList comes from a parameter in the constructor, and you should assign that number of a variable called tableSize of type int. You should also have a method getTableSize() that returns the value of tableSize.

Implement the method putAndMore, so that you can start inserting some data in your hash table. This method should compute the hash value of the key by calling the key's hashCode method. You should then "myMod" this by the tableSize to get an array index. Then access the linked list for that index. Call the method of the linked list to do the rest of the work of inserting (or updating) the item. See the starter code for more details. The result returned must be an object of class HashPutResponseItem.

Implement the method getAndMore. It will be similar in many ways to putAndMore, even though its main purpose is to retrieve the value (if any) associated with a given key in the table.

Work to get the unit tests to turn green, by fully implementing the needed functionality and gradually eliminating bugs.

When all the tests pass, you are ready to move on to section (B), in which you will integrate your custom hash table with an image-processing application program.

II.B. Using Your Hash Table in an Application

(specifications for this portion of the assignment are forthcoming).

Extra credit options

(Option A4E1) (5 points). Implement rehashing in your hashtable. When, in the process of inserting a new key (and its value) into the table, the load factor becomes larger than the rehashingThreshold (which is set by the constructor), the rehashing should happen automatically. All the key value pairs should be transfered into a new version (but the same instance) of the hash table. The new table's size should be the smallest prime number that is at least double the old size. The HashPutResponseItem that is returned should include in its rd field a description of the rehashing: "Rehashed to table size 61".

(Option A4E2) (5 points). Implement a second inner class within CustomHashtables.java with the name LinearProbingHashTable. This kind of hash table should use "open addressing" rather than separate chaining, and it should resolve collisions using linear probing. If you don't implement rehashing for this hash table, then choose an initial size that is large enough to accommodate the data of the following test.

There is one other 5-points extra-credit option, which is in Part II (B).

Turn-in

Turn in your source files through our Catalyst CollectIt dropbox. Part II of this assignment is due Wednesday, Feburary 10 at 11:59 PM. Late penalties apply (see the policies page for details).

Grading Rubric

Part II is worth 70 points. Also, there are three extra credit options in Part II (two are in Part II A, and one is in Part II B).

Details TBA

Version 0.92. Last major update Thursday, February 4 at 1:30 AM. Link to customAPI added Feb. 5 at 4:15 PM. A small but important correction was made to file SeparateChain.java in the starter code It was posted at 11:20 AM, Feb. 4. So be sure you have at least Version 1.1 of the hash-table starter code. Version 1.2.1 fixes bugs in the CustomHTTest.java file and now tests your size() method (Feb. 4 at 11:52 PM).