CSE 373, Spring 2019: HW4 - Hashing with Chaining

Overview

In this homework, you will implement a hash dictionary (also known as a hash map) and a hash set. We will be using these two data structures extensively in the next project.

This entire project is due on Wednesday, May 8 at 11:59pm.

You will use these files from your prior assignments

  • src/main/java/datastructures/concrete/dictionaries/ArrayDictionary.java
  • src/main/java/datastructures/concrete/DoubleLinkedList.java

If you have chosen a new partner for this assignment, choose either of your submissions from HW2 and verify that these are functioning properly.

You will be modifying the following files:

  • src/main/java/datastructures/concrete/dictionaries/ArrayDictionary.java
  • src/main/java/datastructures/concrete/dictionaries/ChainedHashDictionary.java
  • src/main/java/datastructures/concrete/ChainedHashSet.java

Additionally, here are a few more files that you might want to review while completing the assignment (note that this is just a starting point, not necessarily an exhaustive list):

  • src/test/java/datastructures/dictionaries/BaseTestDictionary.java
  • src/test/java/datastructures/dictionaries/TestArrayDictionary.java
  • src/test/java/datastructures/dictionaries/TestChainedHashDictionary
  • src/test/java/datastructures/TestChainedHashSet.java
  • src/main/java/datastructures/interfaces/IDictionary.java
  • src/main/java/datastructures/interfaces/ISet.java
  • src/main/java/analysis/experiments/*

Here's another video overview.

Expectations

Here are some baseline expectations we expect you to meet in all projects:

  • Follow the course collaboration policies

  • DO NOT use any classes from java.util.*. There are only two exceptions to this rule:

    1. You may import and use the following classes:

      • java.util.Iterator
      • java.util.NoSuchElementException
      • java.util.Objects
      • java.util.Arrays
    2. You may import and use anything from java.util.* within your testing code.

  • DO NOT make modifications to instructor-provided code (unless told otherwise). If you need to temporarily change our code for debugging, make sure to change it back afterwards.

Section a: Initial setup

  1. Clone the starter code from GitLab and open the project in your IDE. See the instructions from HW1 if you need a reminder on how to do this.

  2. Copy your DoubleLinkedList.java and ArrayDictionary.java files from HW2 to this new one.

    The copied code will not immediately compile. This is ok: we modified IDictionary so it requires you to implement one additional method.

  3. Copy your DoubleLinkedList delete tests from HW2 and paste them directly into TestDoubleLinkedList.java.

  4. Finally, make sure everything works.

    Try running TestDoubleLinkedList and make sure the tests run.

    Try running SanityCheck.java, and try running Checkstyle. Checkstyle should still report the same 5 errors with SanityCheck.java as it did with HW2.

Section b: Repair ArrayDictionary

Task: Implement ArrayDictionary.iterator().

Take a look at IDictionary. You may notice that the interface now requires all subclasses to implement one additional method: iterator().

By implementing this method, we can now (efficiently) iterate over all key-value pairs in our dictionaries! For example, we can now do this:

IDictionary<String, Integer> foo = makeTheDictionary();
for (KVPair<String, Integer> pair : foo) {
    String key = pair.getKey();
    Integer value = pair.getValue();

    // Do something with key and value
    System.out.println(key + " : " + value);
}

The above program should print out every key-value pair contained within foo.

Notes:

  1. Your iterator() method may return the key value pairs in any order you wish. The easiest approach would likely be to return them in the order your pairs are listed in your internal array.

  2. You may assume the user will not modify the dictionary while you're iterating over it.

  3. You may NOT create any temporary data structures such as a temp IList when implementing your iterator. We want our iterators to be efficient, and having to copy the contents of our dictionary to some other data structure at any point is suboptimal.

    You may, however, create new KVPair objects. (They do not count as "temporary" data structures).

  4. We have provided you with unit tests for your iterators. You can add more tests if you want.

  5. If you need examples on how to implement an iterator, see DoubleLinkedList and the stubs in ChainedHashDictionary. Your iterator will look something like this:

    private static class ArrayDictionaryIterator<K, V> implements Iterator<KVPair<K, V>> {
        // Add any fields you need to store state information
    
        public ArrayDictionaryIterator(/* possible constructor arguments */) {
            // Initialize the iterator
        }
    
        public boolean hasNext() {
            // Implement hasNext
            throw new NotYetImplementedException();
        }
    
        public KVPair<K, V> next() {
            // Return the next KVPair in the dictionary
            throw new NotYetImplementedException();
        }
    }

    Make sure to make the class private and static, and make sure you copy how the generics are being defined correctly.

  6. The IDictionary interface requires that dictionaries return KVPair objects. However, your internal dictionary uses private Pair objects. You will need to convert between the two within your iterator—see the method header comments in KVPair to understand how to instantiate new KVPair objects.

Section c: Implement ChainedHashDictionary

Task: Complete the ChainedHashDictionary class.

In this task, you will implement a hash table that uses separate chaining as its collision resolution strategy.

Notes:

  • One of the constructors for ChainedHashDictionary takes as input a load factor lambda at which to resize; for the other constructor, it's up to you to pick a reasonable default value for lambda.
  • You may implement any resizing strategy covered in lecture.
  • Use ArrayDictionary for your internal chains/buckets.
  • If your ChainedHashDictionary receives a null key, use a hashcode of 0 for that key.
  • Your iterator() method doesn't need to yield the pairs in any particular order.

Correctly implementing your iterator will be tricky—don't leave it to the last minute! Try to finish the other methods in ChainedHashDictionary as soon as possible so you can move on to implementing iterator().

The comments in ChainedHashDictionary will contain a few hints on how to implement this method.

Additional hints

In the class when we covered separate chaining hash tables we used LinkedList as the chaining data structure. In this task, instead of LinkedList, you will use your ArrayDictionary (from HW2) as the chaining data structure.

When you first create your chains array, it will contain null pointers. As key-value pairs are inserted in the table, you need to create the chains (ArrayDictionarys) as required. Let's say you created an array of size 5 (you can create array of any size), and you inserted the key-value pair ("a", 11).

Dictionary<String, Integer> map = new ChainedHashDictionary<>();
map.put("a", 11);

Your hash table should something like the following figure. In this example, the key "a" lands in index 2, but if might be in a different index depending on your table size. Also, in this example, ArrayDictionary (chain) is of size 3, but you can choose a different size for your ArrayDictionary.

ChainedHashDictionary internal state 1

Now, suppose you inserted a few more keys:

map.put("f", 13);
map.put("c", 12);

Your internal hash table should now look like the figure below. In this example, keys "a" and "f" both hash to the same index (2).

ChainedHashDictionary internal state 2

Hash function: Do not try to implement your own hash function. Use the hash method hashCode() that Java provides for all classes: so to get the hash of a key, use keyHash = key.hashCode(). This method returns an integer, which can be negative or greater than chains.length. How would you handle this?

Resizing strategy: Recall that operations on a hash table slow down as the load factor increases. So you need to decide when to resize (increase) your internal array, by how much, and actually do the resizing. When resizing your ArrayDictionary, you just copied over item from the old array to the new one. Here, how would you move items from one hash table to another?

Section d (highly recommended): Implement getOrDefault on your IDictionarys

The IDictionary interface includes a convenient getOrDefault method which will be useful in future assignments. This method either gets the value of a key in the IDictionary or returns the given default value if the key is not inside. IDictionary provides a default implementation of this method that simply calls containsKey; however, this method can be implemented more efficiently inside ArrayDictionary and ChainedHashDictionary, where you have access to the data structures' internals. (You can also see AVLDictionary for an example implementation.) While implementing this method on your data structures is not strictly required, we recommend doing so, since it will make it easier for your to pass any efficiency tests later.

Section e: Implement ChainedHashSet

Task: Complete the ChainedHashSet class.

In section c, you implemented the dictionary ADT with a hash table. You can also implement the set ADT using hash tables. Recall that in sets, the data item to store is just a key, not a key-value pair. In sets, the operation of interest is contains(key), which returns whether a key is part of the set. Hash tables provide an efficient implementation for this operation.

Notes:

  • To avoid code duplication, we will use an internal dictionary of type IDictionary<KeyType, Boolean> to store items (keys) in your HashSet. Since there are no key-values pairs (only keys) in sets, we will completely ignore the values in your dictionary: use a placeholder boolean whenever necessary.
  • Your code for this class should be very simple: your inner dictionary should be doing most of the work.

Section f (highly recommended): Consider more test cases

In this homework assignment, we won't be grading the tests that you write. But, to be thorough and to foster good habits, we strongly encourage you to write additional tests for your code, since they may help you spot different bugs or make you more confident in the correctness of your implementation. (Also remember that we have 'secret' tests that you will be graded on in addition to the provided tests, so it's in your best interest to test more cases.)

For this assignment, you should focus in particular on edge cases. Whenever you see conditional logic in your code (if statements, loop conditions, etc.) you should consider writing a test to check its edge cases. To make sure your tests are truly comprehensive, you should make sure that your test cases end up running every possible conditional branch in your code.

Although you can see the inner workings of your own code, it may sometimes be difficult to write code to check the actual state of your data structures by calling their public methods. Instead, you can test the state of your data structures by accessing your private fields directly; see the blue box below for more details.

Testing private fields

In the previous project, you may have noticed that it can be difficult to verify the correctness of internal fields in your data structures. We want this internal state to be private to maintain the abstraction in our ADTs (and in other classes as well), and in theory, the state of these private fields shouldn't matter as long as the public interface works properly. However, in practice, we often know that a bad internal state will lead to errors later—for example, a bad linked list pointer always has the potential to cause issues—but it may take a lengthy or specific sequence of operations to actually cause that error in a test.

In these cases, we would often rather find some way to directly inspect this private state. There are a few methods of testing internal state, each with pros and cons:

  • Just test internal state indirectly. This is what we did in the previous assignment; in many cases, it's the most natural method, since the public interface is all that matters in the end, but it can be difficult if the private state is complex, as with many data structures.
  • Move your tests so that they can access the internal state, for example, by moving the tests into the class being tested. This is a fairly simple solution, but increases clutter in the code. Additionally, in this class in particular, we need to keep your main code and tests separate for grading purposes.
  • Increase visibility of your internal state, essentially making your state not internal. Again, simple, but it removes abstraction and defeats the purpose of having different access levels in the first place. This also will not work with our test-grading methods.
  • Use reflection. Reflection refers to the act of a program inspecting or changing its own code at runtime. Java's reflection API allows us to access private fields or methods of an object by name, allowing us to bypass their private nature. However, using reflection can make our tests harder to write and read, and doesn't work well with IDEs and compiler type checks (since it only works at runtime).

We recommend using this last method: reflection. However, you need to be careful about where you use it.

  • DO NOT use reflection in tests that will be graded (in future assignments you will have to be careful. For this homework (homework 4), your tests aren't graded so feel free to use reflection in additional tests you write). Reflection (and really all the methods mentioned other than the first) is tied heavily to your own implementation, meaning that it is unlikely to work with other implementations of the class, including as the ones we use during grading.
  • DO NOT use reflection outside of tests. None of the assignments in this class require usage of reflection.

We've included a helper method in BaseTest.java that can return the contents of a private field of an object given the field's name and type. We've also included an example usage of the method in TestChainedHashDictionary.java, since it's not obvious how to pass it a type. You may find it useful if for example you wish to add a test to check that your chains array is resizing properly.

Section g: Complete group writeup

Task: Complete a writeup containing answers to the following questions.

Your writeup will be graded partially on whether or not you produced a reasonable answer, and partially based on the clarity of your explanations and analysis. You and your partner will work together on this writeup.

Note: your completed writeup MUST be in PDF format. You will not submit this writeup via Canvas—instead, submit it to Gradescope. Log into Gradescope with your @uw.edu email address. When you submit, make sure to mark which pages correspond to the questions we ask, and after submission the partner who submitted the assignment must add the other partner. That is, please only turn in one submission per group and make sure both group members are assigned to that submission. A video showing how to do this can be found here.

  1. You will run a variety of experiments and report back on the resulting behavior. Be sure to take some time to read the provided code and understand what's going on.

    Notes:

    1. Some experiments may take up to ~20 min to run, please don't wait until the last minute to run these!
    2. You may get the following warning when running Experiment 2. If you do, you don't need to worry about it. WARNING: Unable to attach Serviceability Agent. You can try again with escalated privileges. Two options: a) use -Djol.tryWithSudo=true to try with sudo; b) echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope

    You can find the experiments in the analysis.experiments package.

    Run the experiment code. Each Experiment*.java file should generate a CSV file in the experimentdata folder. We recommend using plot.ly to generate a plot from your CSV, but if you prefer you can also use Microsoft Excel, Google Sheets, or any other analysis tool of your choice. Generate a plot of the data, and include this plot as a image inside your writeup. If the CSV file contains multiple result columns, plot them all in the same chart.

    If you're using plot.ly, here are some basic instructions:

    1. Click the 'Import' button and load your csv for the experiment data you want to graph.
    2. Click on 'Trace' for each time you want to add a new Line to your graph.
    3. Select whatever type of graph you want (line, probably?), and also select what data should be the X and Y axes.
    4. Repeat adding 'Trace's for each test result.
    5. Screenshot the graph when you're done.

    You will be graded based on whether you produce a reasonable plot. In particular, be sure to give your plot a title, label your axes (with units), include a legend if appropriate, and relabel the columns to something more descriptive than "TestResult".

  2. You should have seen some interesting patterns in the plots you made! In this part of the writeup, we want to hear about why you think these things happened in the lens of what we're doing in lecture.

    The answers to these questions don't need to be very long; a couple sentences for each is enough.

    • Experiment 1:
      1. Describe the differences in the general growth of runtime of Test 1, 2, and 3. What in the code is causing these differences in runtime? If you were to pick a hash function for this scenario, would you pick the one in Test 1, 2, or 3?
      2. The data points for each test are the average of 5 runs. Given this information, what might the “smoothness” of the lines represent? Explain why any of the graphs is dramatically different from others.
    • Experiment 2:
      1. Describe the overall shapes of the graphs. Explain why two of them are similar but one is different.
      2. What are some possible reasons that make the graph for Test 1 different from the graph for Test 3?

Section h: Complete individual writeup

Task: Submit answers to the following questions.

Each group member should answer and submit these questions independently. These are graded credit/no credit, but we still expect some level of detail regarding your experience. Submit your individual writeups by filling out the form on Canvas.

Deliverables

The following deliverables are due on Wednesday, May 8 at 11:59pm.

Before submitting, be sure to double-check and make sure that...

  • Your ArrayDictionary class is fully repaired and passes all tests.
  • Your ChainedHashDictionary class is fully implemented and passes all tests.
  • Your ChainedHashSet class is fully implemented and passes all tests.
  • You ran the Checkstyle plugin within your IDE and fixed all style errors.
  • You and your partner have submitted your group writeup on Gradescope, and are both assigned to the same submission.
  • You have submitted your own individual writeup on Canvas.

Submit by pushing your code to GitLab. If you intend to submit late, fill out this late submission form when you submit.

Both partners should turn in their individual writeup by the deadline via Canvas on this page.