HW5 Part 1 - CSE 373, Spring 2019

Overview

Due Thursday, May 16 at 11:59pm.

Submit by pushing your code to GitLab. If you intend to submit late, fill out this late submission form when you submit.

In this part of the assignment, will practice writing more JUnit tests, and you will implement data structures and sorting algorithms that you will be using in the second part.

You will use these files from your prior assignments:

src/main/java/datastructures/concrete/dictionaries/ArrayDictionary.java
src/main/java/datastructures/concrete/dictionaries/ChainedHashDictionary.java
src/main/java/datastructures/concrete/ChainedHashSet.java
src/main/java/datastructures/concrete/DoubleLinkedList.java

If you have chosen a new partner for this assignment, choose either of your submissions from prior assignments and verify that these are functioning properly.

You will be modifying the following files:

src/main/java/datastructures/concrete/ArrayHeap.java
src/main/java/misc/Sorter.java
src/test/java/datastructures/TestArrayHeap.java
src/test/java/misc/TestSorter.java

Part 1a: Set up your project and fix your data structures

Task: Set up HW5 in your IDE and make sure all the tests for double linked list, hash dictionary, and hash set pass.

Import your homework repo from GitLab. See the instructions from HW0 if you need a reminder on how to do this.
Copy your DoubleLinkedList.java, ArrayDictionary.java, ChainedHashDictionary.java, and ChainedHashSet.java files from HW4 to this new one.
Copy any extra tests you wrote to TestDoubleLinkedList.java or TestChainedHashDictionary.java
Finally, make sure everything works.

Try running the following tests and make sure all the tests pass.
1. TestDoubleLinkedList.java
2. TestArrayDictionary.java
3. TestChainedHashDictionary.java
4. TestChainedHashSet.java
Try running SanityCheck.java, and try running checkstyle. Checkstyle should still report the same 5 errors with SanityCheck.java as it did in the previous projects.

Part 1b: Implement tests for `ArrayHeap` and `Sorter.topKSort(...)`

Task: add tests for ArrayHeap and Sorter.topKSort(...) to TestArrayHeap, and TestSorter.

In parts 1c and 1d, you will be asked to implement an ArrayHeap and the top-\(k\) sorting algorithm.

After you finish reading the specifications for these classes below, start by adding tests for these two classes. If you look through the source code of these two test classes, you'll notice that they're almost completely empty! It's now your responsibility to write comprehensive tests for these two classes. This is a step up from previous assignments, but after working with JUnit tests for each assignment so far, writing these tests shouldn't be anything you can't overcome (with your partner and other course resources to help).

We recommend that you write at least some tests before you implement ArrayHeap and Sorter.topKSort. Doing so will help you better understand what your methods need to do (before you jump into coding), as well as provide a reference to check how well your code is working when you run the tests as you write the code. And of course, there really aren't any downsides to writing more tests, so feel free to add more tests after implementing the code. It's likely that you'll have more ideas of what to test after or as you implement the code, anyways.

Just like we did in Homework 2 part 1's TestDoubleLinkedListDelete.java, we will grade your tests by running them against many different buggy versions of ArrayHeap and Sorter.topKSort(...). In order to help us do so efficiently, we ask that you keep all tests in TestArrayHeap and TestSorter short. Every test you add to these two files should have a timeout of a second or less. (You can add as many tests as you want to these two files, however).

As a general rule of thumb, any tests in these two files should be manipulating heaps or lists that at most contain several hundred items. Basically, we plan on running the tests you add in TestArrayHeap multiple times, and don't want them to take forever to run.

Test-writing advice

Start small: you don't need to come up with every edge case immediately—it will probably be easier to write the general case tests first. For example, make sure that removeMin decreases the size, or that peekMin returns the correct value after inserting a few times.
Cover your bases—make sure every method has at least one test.
Revisit the testing categories from homework 2 part 1b / lecture slides, and try to make tests for any categories that seem applicable to the code in this assignment.
Test that the correct exceptions are thrown in cases defined by the method documentation.
Be sure to test your classes both on large amounts of data and small amounts of data in order to test many possible scenarios and combinations of data structure states and parameters.
When testing your ArrayHeap, you may notice that there's no easy way to iterate through all its data. The only way to verify that your heap works (does the correct percolations to maintain valid heap properties) is by calling removeMin repeatedly and checking that it returns the correct values. Whether your input sequence is random, sorted, reverse-sorted, or anything else, after removeMining, the data should all come out in sorted order. So, one way to test the heap is to make sure each value from removeMin is greater than or equal to the return value of the previous removeMin call (according to compareTo).
You may find that using only removeMin is somewhat restrictive. As an alternative, you can use reflection (see HW4 explanation on testing private fields) to access the internal array inside your ArrayHeap class. The provided TestArrayHeap class has a helper method that will return the internal array given an ArrayHeap. After calling it, you can check whether the heap is valid according to the min-heap properties (each node is 'less than equal to' all of its children, and 'greater than equal to' its parent).

If you want to be more thorough, you can make a helper method to check the heap property at every single index in the heap. This helper method will likely be similar to TestDoubleLinkedList.assertListValidAndMatches from homework 2.

(added 5/12) As a reminder: DO NOT include reflection in tests that will be graded (so be sure to comment them out, place them in separate file, or remove them before your final submissions. Reflection (and really all the methods mentioned other than the first) is tied heavily to your own implementation, meaning that it is unlikely to work with other implementations of the class, including as the ones we use during grading.
To test your Sorter.topKSort, consider the fact that there are two real parameters: K and the input list. To be thorough, consider all the edge cases for K and the input list, and test combinations of both.
As a reminder, you may use classes from java.util.* when writing unit tests. You may find Collections.sort(...) to be particularly helpful for generating expected output. You may need to write some extra logic to convert between an IList and java.util.List, however.

As always, you may not use anything from java.util.* for anything outside of your tests, unless explicitly told otherwise.

Part 1c: Implement `ArrayHeap`

Task: complete the ArrayHeap class.

Notes:

In lecture, we covered how to implement and work with 2-heaps: a heap where each node has up to 2 children.

For this assignment, you must implement a 4-heap: a heap where each node has up to four children. You will need to modify the equations you were given in lecture in order to successfully do so.

Additionally, you must implement this as being indexed from 0, rather than some other arbitrary index.
Make sure you're implementing a min-heap, not a max-heap.
Unlike all the other data structures we've implemented so far, your heap will not attempt to handle null entries. (There's no sensible way to compare a null against a non-null element).

Your heap should throw an exception when given a null element—see the IPriorityQueue interface for more details.
The data structure you'll be implementing has some additional functionality: we'll be able to remove and replace elements in the heap. In exchange, the heap will not support duplicate elements.

We recommend leaving the implementation of these remove and replace methods until after you've implemented add and removeMin. The code for remove and replace will require some changes to add and removeMin, but those changes will be difficult to make before you've implemented the basic code for add and removeMin.

In particular, you'll need to track the indices of each element in the heap by using a dictionary to map from element to index in the heap. (Take a moment to consider why we want to do this, and what the alternative is.) You probably want to declare this dictionary as a field of type IDictionary<T, Integer>. Tracking the indices means that we'll need to update our dictionary of indices every time the heap changes from adding, removing, or percolating; fortunately, adding this logic is relatively painless if you implement add and removeMin using our provided method stubs for the helper methods percolateUp, percolateDown, and swap.

Here are some more tips for each method:
- contains: Since we're tracking the indices of each element in the heap, implementing this is as simple as calling containsKey. Make sure that contains runs in approximately constant time.
- replace: This method replaces an element in the heap with a new element. When this method gets called, you'll need to figure out whether the new item needs to percolate upwards or downwards in the heap, and update the heap accordingly. replace should run in approximately log(n) time.
```
void replace(oldItem, newItem):
    throw exception if necessary
    index = dictionary.remove(oldItem)
    heap[index] = newItem
    dictionary.put(newItem, index)

    percolate(index)

void percolate(index):
    if index is first in heap:
        percolateDown(index)
    else:
        parent = get parent index
        if heap[index] < heap[parent]:
            percolateUp(index)
        else if heap[index] > heap[parent]:
            percolateDown(index)
```
- remove: The code for this method should actually be fairly simple after you've implemented replace, since you can simply reuse its functionality. This method will end up looking somewhat similar to removeMin: you just need to replace the input item with the last item in the heap, and then use the percolation logic from replace to determine how to fix the heap invariants. remove should run approximately log(n) time.
```
void remove(item):
    throw exception if necessary
    index = dictionary.remove(item)

    if index is last in heap:
        heap[index] = null
    else:
        last = get last item in heap
        place last at index
        dictionary.put(last, index)
        heap[last index] = null
        percolate(index)
```
In lecture, we assumed that every element has some numerical priority, and focused on comparing each element by that priority.

In practice, the exact "priority" an element has is an implementation detail of the items the client gives us, and isn't a value we can see directly.

What you should do instead is to compare items using their .compareTo(...) method. The ArrayHeap is specifically designed so that all items MUST implement the Comparable interface—if they don't, the code won't compile.

Review: working with objects that implement Comparable

In Java, the way that we indicate an object is "comparable" in some way is to (a) have that object implement the Comparable interface and (b) have that object implement the compareTo(...) method, which returns an int.

That int indicates how the two elements are "ordered" in comparison to each other. For example, suppose we do a.compareTo(b).

If the returned int is negative, that means a < b.
If the returned int is equal to zero, that means a == b.
If the returned int is positive, that means a > b.

This syntax is unfortunately clunky and a little difficult to remember. Here are two strategies that may help.

The first strategy is to remember that if you ever want to do something like "a OP b" (where "OP" is some comparison operator), you can always automatically translate it to a.compareTo(b) OP 0. For example, if I wanted to check if a <= b, you would write a.compareTo(b) <= 0.

The second strategy is to use helper methods so you can minimize the amount of ugly code you need to write. For example, instead of writing a.compareTo(b) <= 0 everywhere, try writing and using a helper method like:

private boolean leq(T a, T b) {
    return a.compareTo(b) ≤= 0;
}

This will help make your code a little more readable and easier to debug. (Instead of having to worry about whether you used .compareTo(...) correctly in multiple places, you can focus on getting it right just once.)

Part 1d: Implement `Sorter.topKSort(...)`

Task: complete the Sorter.topKSort(...) method.

Your Sorter.topKSort(k, list) method is responsible for returning the top \(k\) elements from a list containing \(n\) comparable elements. You can find the Sorter class inside the misc package.

One naive way of implementing this algorithm would be to implement some sorting algorithm such as quicksort or mergesort, sort the list then return the last \(k\) elements.

However, this would take \(\mathcal{O}(n\log(n))\) time to run! We can do better: your implementation must run in \(\mathcal{O}(n\log(k))\) time.

Notes:

Make sure your topKSort method works with both small and large input lists!
Because you will be putting values from the IList parameter into your heap, your topKSort will not support sorting lists with any duplicate values.
Hint 1: your method should construct exactly two data structures: a single ArrayHeap, and a single DoubleLinkedList to store the output to return.
Hint 2: you should need to loop over your input list exactly once.

Reminder (5/14): there is a secret test that runs when you push, called testIsNLogK. This test counts operations in your code to make sure the runtime is properly n * log(k). Recently, we also added a timeout of 1 second so this test now more accurately mirrors grading (and tests efficiency by counting and timing your code).

Part 1e: Complete group writeup

Task: Complete a writeup containing answers to the following questions.

Your writeup will be graded partially on whether or not you produced a reasonable answer, and partially based on the clarity of your explanations and analysis. You and your partner will work together on this writeup.

Note: your completed writeup MUST be in PDF format. You will not submit this writeup via Canvas—instead, submit it to Gradescope. Log into Gradescope with your @uw.edu email address. When you submit, make sure to mark which pages correspond to the questions we ask, and after submission the partner who submitted the assignment must add the other partner. That is, please only turn in one submission per group and make sure both group members are assigned to that submission. A video showing how to do this can be found here.

You will run a variety of experiments and report back on the resulting behavior. Be sure to take some time to read the provided code and understand what's going on.

Note: Some experiments may take up to ~30 min to run.

You can find the experiments in the analysis.experiments package.

Run the experiment code. Each Experiment*.java file should generate a CSV file in the experimentdata folder. We recommend using plot.ly to generate a plot from your CSV, but if you prefer you can also use Microsoft Excel, Google Sheets, or any other analysis tool of your choice. Generate a plot of the data, and include this plot as a image inside your writeup. If the CSV file contains multiple result columns, plot them all in the same chart.

If you're using plot.ly, here are some basic instructions:
1. Click the 'Import' button and load your csv for the experiment data you want to graph.
2. Click on 'Trace' for each time you want to add a new Line to your graph.
3. Select whatever type of graph you want (line, probably?), and also select what data should be the X and Y axes.
4. Repeat adding 'Trace's for each test result.
5. Screenshot the graph when you're done.
You will be graded based on whether you produce a reasonable plot. In particular, be sure to give your plot a title, label your axes (with units), include a legend if appropriate, and relabel the columns to something more descriptive than "TestResult".
You should also notice that in your plots, experiment 1 results in a ~~flat~~ linear shape, but experiment 2 results in a curve. Why are they different and where do those shapes come from? You might find it helpful to reference general knowledge of complexity classes in your response and the axes of the two graphs.

The answer to this question doesn't need to be very long; a couple sentences is enough.

Deliverables

The following deliverables are due on Thursday, May 16 at 11:59pm.

Submit by pushing your code to GitLab. If you intend to submit late, fill out this late submission form when you submit.

Before the deadline, be sure to double-check that...

Your TestArrayHeap and TestSorter classes are completed.
Your ArrayHeap class and your Sorter.topKSort(...) method are both fully implement and passes all of your tests.
You ran the checkstyle plugin within IntelliJ and fixed all style errors.
You did not permanently modify any provided files that you were not supposed to, and you did not change the public interface of any classes.
You and have completed and submitted the group writeup.

CSE 373, Spring 2019: HW5 Part 1