CSE 373, Winter 2019: Homework 5: Part 1

Overview

Due Friday February 22 at 11:59pm.

Submit by pushing your code to GitLab. If you intend to submit late, fill out this late submission form when you submit.

In this part of the assignment, will practice writing more JUnit tests, and you will implement data structures and sorting algorithms that you will be using in the second part.

You will use these files from your prior assignments:

  • src/main/java/datastructures/concrete/dictionaries/ArrayDictionary.java
  • src/main/java/datastructures/concrete/dictionaries/ChainedHashDictionary.java
  • src/main/java/datastructures/concrete/ChainedHashSet.java
  • src/main/java/datastructures/concrete/DoubleLinkedList.java

If you have chosen a new partner for this assignment, choose either of your submissions from prior assignments and verify that these are functioning properly.

You will be modifying the following files:

  • src/main/java/datastructures/concrete/ArrayHeap.java
  • src/main/java/misc/Sorter.java
  • src/test/java/datastructures/sorting/TestArrayHeapFunctionality.java
  • src/test/java/datastructures/sorting/TestArrayHeapAndSorterStress.java
  • src/test/java/datastructures/sorting/TestSorterFunctionality.java

Part 1a: Set up your project and fix your data structures

Task: Set up HW5 in your IDE and make sure all the tests for double linked list, hash dictionary, and hash set pass.

  1. Import your homework repo from GitLab. See the instructions from HW0 if you need a reminder on how to do this.

  2. Copy your DoubleLinkedList.java, ArrayDictionary.java, ChainedHashDictionary.java, and ChainedHashSet.java files from HW4 to this new one.

  3. Copy any extra tests you wrote to TestDoubleLinkedList.java or TestChainedHashDictionary.java

  4. Finally, make sure everything works.

    Try running the following tests and make sure all the tests pass.

    1. TestDoubleLinkedList.java
    2. TestArrayDictionary.java
    3. TestChainedHashDictionary.java
    4. TestChainedHashSet.java

    Try running SanityCheck.java, and try running checkstyle. Checkstyle should still report the same 5 errors with SanityCheck.java as it did with HW2.

Part 1b: Implement tests for ArrayHeap and Sorter.topKSort(...)

Task: add tests for ArrayHeap and Sorter.topKSort(...) to TestArrayHeapFunctionality, TestSorterFunctionality, and TestArrayHeapAndSorterStress.

In parts 1c and 1d, you will be asked to implement an ArrayHeap and the top-\(k\) sorting algorithm.

After you finish reading the specifications for these classes below, start by adding tests for these three classes. If you look through the source code of these three test classes, you'll notice that they're almost completely empty! It's now your responsibility to write comprehensive tests for these two classes.

We will grade the quality of your tests by running them against many different buggy versions of ArrayHeap and Sorter.topKSort(...). In order to help us do so efficiently, we ask that you...

  1. Keep all tests in TestArrayHeapFunctionality and TestSorterFunctionality short. Every test you add to these two files should have a timeout of a second or less. (You can add as many tests as you want to these two files, however).

    As a general rule of thumb, any tests in these two files should be manipulating heaps or lists that at most contain several hundred items.

    Basically, we plan on running the tests you add in TestArrayHeapFunctionality multiple times, and don't want them to take forever to run.

  2. Add any expensive stress tests (tests that manipulate a large amount of data or take a long time to complete) to the TestArrayHeapAndSorterStress file. This file can contain stress tests for both ArrayHeap and Sorter.topKSort(...).

    As a general rule of thumb, any tests in this file should be manipulating heaps or lists containing roughly several hundred thousand items.

    As a note, we plan on running the tests you add to this file on functionally correct but grossly inefficient versions of ArrayHeap and Sorter.topKSort.

Notes:

  1. As a reminder, you may use classes from java.util.* when writing unit tests. You may find Collections.sort(...) to be particularly helpful: you can use it to help you automatically determine if your output is in the correct order. You may need to write some extra logic to convert between an IList and java.util.List, however.

    As always, you may not use anything from java.util.* for anything outside of your tests, unless explicitly told otherwise.

  2. Be sure to test your classes both on very large amounts of data and very small amounts of data.

  3. Be sure to test to make sure you throw the correct exceptions if the user uses your classes incorrectly!

Part 1c: Implement ArrayHeap

Task: complete the ArrayHeap class.

Notes:

  1. In lecture, we covered how to implement and work with 2-heaps: a heap where each node has up to 2 children.

    For this assignment, you must implement a 4-heap: a heap where each node has up to four children. You will need to modify the equations you were given in lecture in order to successfully do so.

    Additionaly, you must implement this as being indexed from 0, rather than some other arbitrary index.

  2. Make sure you're implementing a min-heap, not a max-heap.

  3. Unlike all the other data structures we've implemented so far, your heap will not attempt to handle null entries. (There's no sensible way to compare a null against a non-null element).

    Your heap should throw an exception when given a null element – see the IPriorityQueue interface for more details.

  4. In lecture, we assumed that every element has some numerical priority, and focused on comparing each element by that priority.

    In practice, the exact "priority" an element has is an implementation detail of the items the client gives us, and isn't something we can see directly.

    What you should do instead is to compare each item using their .compareTo(...) method. The ArrayHeap is specifically designed so that all items MUST implement the Comparable interface – if they don't, the code won't compile.

Review: working with objects that implement Comparable

In Java, the way that we indicate an object is "comparable" in some way is to (a) have that object implement the Comparable interface and (b) have that object implement the compareTo(...) method, which returns an int.

That int indicates how the two elements are "ordered" in comparison to each other. For example, suppose we do a.compareTo(b).

  • If the returned int is negative, that means a < b.
  • If the returned int is equal to zero, that means a == b.
  • If the returned int is positive, that means a > b.

This syntax is unfortunately clunky and a little difficult to remember. Here are two strategies that may help.

The first strategy is to remember that if you ever want to do something like "a OP b" (where "OP" is some comparison operator), you can always automatically translate it to a.compareTo(b) OP 0. For example, if I wanted to check if a <= b, you would write a.compareTo(b) <= 0.

The second strategy is to use helper methods so you can minimize the amount of ugly code you need to write. For example, instead of writing a.compareTo(b) <= 0 everywhere, try writing and using a helper method like:

private boolean leq(T a, T b) {
    return a.compareTo(b) ≤= 0;
}

This will help make your code a little more readable and a little easier to debug. (Instead of having to worry about whether you used .compareTo(...) correctly in multiple places, you can focus on getting it right just once.)

Part 1d: Implement Sorter.topKSort(...)

Task: complete the Sorter.topKSort(...) method.

Your Sorter.topKSort(k, list) method is responsible for returning the top \(k\) elements from a list containing \(n\) comparable elements. You can find the Sorter class inside the misc package.

One naive way of implementing this algorithm would be to implement some sorting algorithm such as quicksort or mergesort, sort the list then return the last \(k\) elements.

However, this would take \(\mathcal{O}(n\log(n))\) time to run! We can do better: your implementation must run in \(\mathcal{O}(n\log(k))\) time.

Notes:

  1. Make sure your topKSort method works with both small and large input lists!

  2. Hint 1: your method should construct exactly two data structures: a single ArrayHeap, and a single DoubleLinkedList to store the output to return.

  3. Hint 2: you should need to loop over your input list exactly once.

Part 1e (highly recommended): Implement getOrDefault on your IDictionarys

Task: complete the IDictionary.getOrDefault(...) methods.

The IDictionary interface includes a convenient getOrDefault method, which will be useful in optimizing part 2 of this homework. This method either gets the value of a key in the IDictionary, or returns the given default value if the key is not inside. IDictionary provides a default implementation of this method that simply uses containsKey, however this method can be implemented more efficiently when you have access to the data structures' internals. While implementing this method on your data structures is not technically required, we recommend doing so, since it will make it easier for your to pass any efficiency tests in part 2.

Part 1f: Complete individual writeup

Task: complete an individual writeup containing answers to the following questions.

Note: your completed writeup MUST be in PDF format. You will submit this to canvas with the turn in link available under deliverables.

Your writeup will be graded primarily on producing a thoughtful response to each experiment. Beyond running the experiments (which may take some time to fully complete), the intention is for you to demonstrate a basic understanding of the provided code.

In this question, you will run a variety of experiments and report back on the resulting behavior. You can find the experiments in the analysis.experiments package. For each of the three experiments, you should:

  1. In one or two sentences, summarize what this experiment is testing. (As before, treat this as an exercise in reverse-engineering unknown code).

  2. Run the experiment code. Each Experiment*.java file should generate a CSV file in the experimentdata folder. You should import the CSV file into Microsoft Excel, Google Sheets, or any other analysis of your choice and generate a plot of the data, and include the plots in your write up.

  3. How do your results compare with your original prediction? Why do you think the results are what they were?

Deliverables

The following deliverables are due on Friday February 22 at 11:59pm.

Submit by pushing your code to GitLab. If you intend to submit late, fill out this late submission form when you submit.

Both partners should complete and submit part 1e (the individual writeup) on their own on canvas.

Before the deadline, be sure to double-check and make sure that...

  • Your TestArrayHeapFunctionality, TestSorterFunctionality, and TestArrayHeapAndSorterStress classes are completed.
  • Your ArrayHeap class and your Sorter.topKSort(...) method are both fully implement and passes all of your tests.
  • You ran the checkstyle plugin within Eclipse and fixed all style errors.
  • You did not permanently modify any provided files that you were not supposed to, and you did not change the public interface of any classes.
  • You and your partner have both completed and submitted part 1e (the individual writeup).