In this homework, you will implement a hash dictionary (also known as a hash map) and a hash set. We will be using these two data structures extensively in the next project.
This entire project is due on Wednesday, July 31 at 11:59pm.
You will use these files from your prior assignments
src/main/java/datastructures/dictionaries/ArrayDictionary.java
src/main/java/datastructures/lists/DoubleLinkedList.java
If you have chosen a new partner for this assignment, choose either of your submissions from HW2 and verify that these are functioning properly.
You will be modifying the following files:
src/main/java/datastructures/dictionaries/ArrayDictionary.java
src/main/java/datastructures/dictionaries/ChainedHashDictionary.java
src/main/java/datastructures/sets/ChainedHashSet.java
Additionally, here are a few more files that you might want to review while completing the assignment (note that this is just a starting point, not necessarily an exhaustive list):
src/test/java/datastructures/dictionaries/BaseTestDictionary.java
src/test/java/datastructures/dictionaries/TestChainedHashDictionary
src/test/java/datastructures/sets/TestChainedHashSet.java
src/main/java/datastructures/dictionaries/IDictionary.java
src/main/java/datastructures/sets/ISet.java
src/main/java/analysis/experiments/*
Here's another video overview. Note: this video is from 19wi, so some info in this video may be a little outdated.
Here are some baseline expectations we expect you to meet in all projects:
Follow the course collaboration policies
DO NOT use any classes from
java.util.*
. There are only two exceptions to this rule:
You may import and use the following classes:
java.util.Iterator
java.util.NoSuchElementException
java.util.Objects
java.util.Arrays
You may import and use anything from java.util.*
within your testing code.
DO NOT make modifications to instructor-provided code (unless told otherwise). If you need to temporarily change our code for debugging, make sure to change it back afterwards.
Clone the starter code from GitLab and open the project in your IDE. See the instructions from Project 0 if you need a reminder on how to do this.
Copy your DoubleLinkedList.java
and ArrayDictionary.java
files from Project 1 to this new one.
Copy your DoubleLinkedList
delete
tests from Project 1 and paste
them directly into TestDoubleLinkedList.java
.
Next make sure everything works.
Try running SanityCheck.java
, and try running Checkstyle.
Checkstyle should still report the same 5 errors with SanityCheck.java
as it did with Project 0.
Try running TestDoubleLinkedList
and TestArrayDictionary
, and make sure the
tests still pass.
ArrayDictionary
constructor
In order to run one of the upcoming experiments, you will add an extra constructor
to the existing ArrayDictionary
class. This constructor will be used when you
implement ChainedHashDictionary
in the next part. The constructor should take in an integer
representing the initial capacity of the pairs
array.
Below is the constructor stub you should implement:
public ArrayDictionary(int initialCapacity) {
pairs = makeArrayOfPairs(initialCapacity);
// ... initialize any extra fields you made as necessary
}
Tip: to make sure this constructor is working, we can refactor our code to always use this new constructor. Try replacing your existing 0-argument constructor with the following code that will call your new constructor:
private static final int DEFAULT_INITIAL_CAPACITY = /* ... some value */;
// feel free to reuse what value you were using originally here
public ArrayDictionary() {
this(DEFAULT_INITIAL_CAPACITY);
}
Afterwards, make sure the tests in TestArrayDictionary
still pass.
ChainedHashDictionary
Task: Complete the ChainedHashDictionary
class.
In this task, you will implement a hash table that uses separate chaining as its collision resolution strategy.
Correctly implementing your iterator will be tricky—don't leave it to the last minute!
Try to finish the other methods in ChainedHashDictionary
as soon as possible so
you can move on to implementing iterator()
.
In the class when we covered separate chaining hash tables we used LinkedList
as
the chaining data structure. In this task, instead of LinkedList
, you will use your
ArrayDictionary
(from Project 1) as the chaining data structure.
When you first create your chains
array, it will contain null pointers. As
key-value pairs are inserted in the table, you need to create the chains
(ArrayDictionary
s) as required. Let's say you created an array of size 5 (you can
create array of any size), and you inserted the key-value pair ("a", 11).
Dictionary<String, Integer> map = new ChainedHashDictionary<>();
map.put("a", 11);
Your hash table should something like the following figure. In this example, the key "a"
lands in index 2, but if might be in a different index depending on your table size. Also, in
this example, ArrayDictionary
(chain) is of size 3, but you can choose a different
size for your ArrayDictionary
.
Now, suppose you inserted a few more keys:
map.put("f", 13);
map.put("c", 12);
Your internal hash table should now look like the figure below. In this example, keys "a" and "f" both hash to the same index (2).
Notes:
resizingLoadFactorThreshold
: if the ratio of items to buckets exceeds
this, you should resize
initialChainCount
: how many chains/buckets there are initiallychainInitialCapacity
: the initial capacity of each
ArrayDictionary
inner chain
ArrayDictionary
for your internal chains/buckets.
ArrayDictionary
, be sure to use your new
ArrayDictionary
constructor to correct set its initial capacity.
ChainedHashDictionary
receives a null key, use a hashcode of 0 for
that key.
ChainedHashDictionary
.
ChainedHashDictionary
as soon as possible so you can move on to implementing iterator()
.
hashCode()
that Java provides for all classes: so to get the hash of
a key, use keyHash = key.hashCode()
. This method returns an integer, which
can be negative or greater than chains.length
. How would you handle this?
ArrayDictionary
, you just copied over item from the old array to the new
one. Here, how would you move items from one hash table to another?
ChainedHashDictionaryIterator
Restrictions, assumptions, etc.:
.iterator()
method on each IDictionary
inside your chains
array, however, as instantiating an iterator from an existing data structure is both
low cost in space and time.
ChainedHashDictionary
) while you iterate over it. For example,
the following will never happen:
Iterator<KVPair<String,Integer>> itr = dictionary.iterator();
itr.next();
dictionary.put("hi", "373"); // this line will never happen and you can ignore this case
itr.next();
Note that there are some tests that do something that looks similar but is different:
they modify the dictionary in between creating new iterator objects, which
is allowed behavior—it's okay to modify your data structure and
then loop over it again, as long as you do not modify it while looping over it.
Tips for planning your implementation:
Before you write any code, try designing an algorithm using pencil and paper and run
through a few examples by hand. This means you should draw the chains
array
that has some varying number of ArrayDictionary
objects scattered
throughout, and you should try simulate what your algorithm does.
Try to come up with some invariants for your code. These invariants must always be true after the constructor finishes, and must always be true both before and after you call any method in your class.
Having good invariants will greatly simplify the code you need to write, since they reduce the number of cases you need to consider while writing code. For example, if you decide that some field should never be null and write your code to ensure that it always gets updated to be non-null before the method terminates, you'll never need to do null checks for that field at the start of your methods.
As another example, it's possible to pose the DoubleLinkedList
iterator's implementation in terms of invariants:
next
field is always
non-null and contains the next node to output.next
field is null.Additional notes:
ChainedHashDictionary
code
and add additional invariants there to reduce the number of cases for the
chains
array.ChainedHashSet
Task: Complete the ChainedHashSet
class.
In section c, you implemented the dictionary ADT with a hash table. You can also implement
the set ADT using hash tables. Recall that sets store only a key, not a key-value pair. In
sets, the primary operation of interest is contains(key)
, which returns
whether a key is part of the set. Hash tables provide an efficient implementation for this
operation.
Notes:
ChainedHashDictionary<KeyType, Boolean>
to store items (keys) in your
ChainedHashSet
. Since there are no key-values pairs (only keys) in sets,
we will completely ignore the values in your dictionary: use a placeholder boolean
whenever necessary.
In this homework assignment, we won't be grading the tests that you write. But, to be thorough and to foster good habits, we strongly encourage you to write additional tests for your code, since they may help you spot different bugs or make you more confident in the correctness of your implementation. (Also remember that we have "secret" tests that you will be graded on in addition to the provided tests, so it's in your best interest to test more cases.)
For this assignment, you should focus in particular on edge cases. Whenever you see conditional logic in your code (if statements, loop conditions, etc.) you should consider writing a test to check its edge cases. To make sure your tests are truly comprehensive, you should make sure that your test cases end up running every possible conditional branch in your code.
Although you can see the inner workings of your own code, it may sometimes be difficult to write code to check the actual state of your data structures by calling their public methods. Instead, you can test the state of your data structures by accessing your private fields directly; see the blue box below for more details.
Task: Complete a write-up containing answers to the following questions.
You and your partner will work together on this write-up: your completed write-up MUST be in PDF format. You will submit it to Gradescope. Log into Gradescope with your @uw.edu email address. When you submit, mark which pages correspond to the questions we ask, and afterwards, the partner who uploaded the PDF must add the other partner as a group member on Gradescope. Do not have each member submit individually. A video showing how to do this can be found here. We may deduct points if you do not correctly submit as a group or do not properly assign your pages/problems.
Before we get to the experiments, here are some questions about design decisions you made while doing the programming portion of the assignment.
ChainedHashDictionary
For this first prompt, reflect on a design decision you deliberately made while implementing
your ChainedHashDictionary
. The specifications for this assignment are
deliberately loose, so you should have needed to make some decisions on your own. Consider 1
such decision you had to make, and answer the following questions:
Your responses will be graded partially based on effort, and partially based on their clarity and how well they described your design decision. If you choose a design decision that had an obviously-best solution, or a problem in which you arbitrarily decided on a final solution, you may lose points.
ChainedHashDictionaryIterator
For this prompt, briefly describe the invariant(s) you chose for your
ChainedHashDictionary
iterator. Did you find them useful while implementing
the iterator? Also note down any invariants you discarded for any reason (e.g., they
were too inefficient/impossible to enforce, or they simply weren't useful).
This section will be graded based on completion.
For each of the experiments, answer the bolded questions (and questions in the orange boxes) in your write-up. Just like before, a plot will automatically be generated to display the results of the experiments; include PNGs of the plots inside your write-up PDF.
The hypothesis/predict-based-on-the-code portions of your write-up will be graded based on completion, so just write down your honest thoughts before running the experiment. The post-experiment analysis portions will be graded based on the clarity of your explanations and whether they match up with your plot.
hashCode
s vs. AVL trees
This experiment explores how different hash code functions affect the runtime of a
ChainedHashDictionary
, and compare that to the runtime of an
AVLTreeDictionary
.
First, we’ll look at the tests involving the ChainedHashDictionary
:
test1
, test2
, and test3
. Each uses a different class
(FakeString1
, FakeString2
, and FakeString3
respectively) as keys for a ChainedHashDictionary
. Each of the different fake
string objects represent a string (by storing an array of chars) but each class has
different implementations of the hashCode
method. Read over these
implementations of hashCode
and take a look at the corresponding histograms
(each plot shows the distributions of outputs of the hashCode
methods
across 80,000 randomly-generated fake strings).
hashCode
histogramsNow, predict which test method will have the fastest and slowest asymptotic runtime growths.
This experiment tests the runtime for ChainedHashDictionary
's put
method across different values of the load factor threshold for resizing.
First, answer the following prompts:
test1
and
test2
.This experiment tests the runtime for inserting elements into
ChainedHashDictionary
with different initial capacities for the internal
ArrayDictionary
chains.
Briefly describe the differences between the three tests.
This last experiment will estimate the amount of memory used by
DoubleLinkedList
, ArrayDictionary
, and
AVLTreeDictionary
as they grow in size. Predict the complexity class
(constant, logarithmic, linear, \(n\log(n)\), quadratic, exponential)
of memory usage for each of the 3 data structures the as its size increases.
Note: You may get the following warnings when the experiment; just ignore them:
WARNING: Unable to get Instrumentation. Dynamic Attach failed. You may add this JAR as -javaagent manually, or supply -Djdk.attach.allowAttachSelf
WARNING: Unable to attach Serviceability Agent. You can try again with escalated privileges. Two options: a) use -Djol.tryWithSudo=true to try with sudo; b) echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
Task: Submit a response to the feedback survey.
After finishing the project, take a couple minutes to complete this individual feedback survey on Canvas. (Each partner needs to submit their own individual response.)
The following deliverables are due on Wednesday, July 31 at 11:59pm.
Before submitting, be sure to double-check that:
Submit by pushing your code to GitLab and submitting your writeup to Gradescope. If you intend to submit late, fill out this late submission form when you submit.