Due Fri, Feb 16 at 11:30pm
In this part of the assignment, you will implement data structures and sorting algorithms you will be using on the second and third parts, and will practice writing more jUnit tests.
Download the project from this link and open it in your IDE. See the instructions from project 1 if you need a reminder on how to do this.
Copy your DoubleLinkedList.java
,
ArrayDictionary.java
, ChainedHashDictionary.java
, and
ChainedHashSet.java
files from project 2 to this
new one.
Copy any extra tests you wrote. Be sure not to accidentally override the existing ones.
Finally, make sure everything works.
Try running TestDoubleLinkedList
and make sure the tests run.
Try running SanityCheck.java
, and try running checkstyle.
Checkstyle should still report the same 5 errors with SanityCheck.java
as it did with project 1.
Task: add tests for ArrayHeap
and
Searcher.topKSort(...)
to TestArrayHeapFunctionality
,
TestTopKSortFunctionality
, and TestSortingStress
.
In parts 1c and 1d, you will be asked to implement an ArrayHeap and the top-\(k\) sorting algorithm.
After you finish reading the specifications for these classes below, start by adding tests for these three classes. If you look through the source code of these three test classes, you'll notice that they're almost completely empty! It's now your responsibility to write comprehensive tests for these two classes.
We will grade the quality of your tests by running them against many
different buggy versions of ArrayHeap
and Searcher.topKSort(...)
.
In order to help us do so efficiently, we ask that you...
Keep all tests in TestArrayHeapFunctionality
and
TestTopKSortFunctionality
short. Every test you add
to these two files should have a timeout of a second or less.
(You can add as many tests as you want to these two files, however).
As a general rule of thumb, any tests in these two files should be manipulating heaps or lists that at most contain several hundred items.
Basically, we plan on running the tests you add in
TestArrayHeapFunctionality
multiple times, and don't want them
to take forever to run.
Add any expensive stress tests (tests that manipulate a large amount of
data or take a long time to complete) to the TestSortingStress
file. This file can contain stress tests for both ArrayHeap
and Searcher.topKSort(...)
.
As a general rule of thumb, any tests in this file should be manipulating heaps or lists containing roughly several hundred thousand items.
As a note, we plan on running the tests you add to this file on functionally correct but grossly inefficient versions of ArrayHeap and Searcher.topKSort.
Notes:
As a reminder, you may use classes from
java.util.*
when writing unit tests. You may find
Collections.sort(...) to be particularly
helpful: you can use it to help you automatically determine if
your output is in the correct order. You may need to write some
extra logic to convert between an IList
and
java.util.List
, however.
As always, you may not use anything from
java.util.*
for anything outside of your tests,
unless explicitly told otherwise.
Be sure to test your classes both on very large amounts of data and very small amounts of data.
Be sure to test to make sure you throw the correct exceptions if the user uses your classes incorrectly!
Task: complete the ArrayHeap
class.
Notes:
In lecture, we covered how to implement and work with 2-heaps: a heap where each node has up to 2 children.
For this assignment, you must implement a 4-heap: a heap where each node has up to four children. You will need to modify the equations you were given in lecture in order to successfully do so.
Make sure you're implementing a min-heap, not a max-heap.
Unlike all the other data structures we've implemented so far, your heap will not attempt to handle null entries. (There's no sensible way to compare a null against a non-null element).
Your heap should throw an exception when given a null element –
see the IPriorityQueue
interface for more details.
In lecture, we assumed that every element has some numerical priority, and focused on comparing each element by that priority.
In practice, the exact "priority" an element has is an implementation detail of the items the client gives us, and isn't something we can see directly.
What you should do instead is to compare each item using their
.compareTo(...)
method. The ArrayHeap
is
specifically designed so that all items MUST implement the
Comparable
interface – if they don't, the
code won't compile.
Task: complete the Searcher.topKSort(...)
method.
Your Searcher.topKSort(k, list)
method is responsible for
returning the top \(k\) elements from a list containing \(n\) comparable
elements. You can find the Searcher
class inside the
misc
package.
One naive way of implementing this algorithm would be to implement some sorting algorithm such as quicksort or mergesort, sort the list then return the last \(k\) elements.
However, this would take \(\mathcal{O}(n\log(n))\) time to run! We can do better: your implementation must run in \(\mathcal{O}(n\log(k))\) time.
Notes:
Make sure your topKSort
method works with both
small and large input lists!
Hint 1: your method should construct exactly two data structures:
a single ArrayHeap
, and a single DoubleLinkedList
to store the output to return.
Hint 2: you should need to loop over your input list exactly once.
Task: complete a writeup containing answers to the following questions.
IMPORTANT NOTE: Your writeup is NOT due with the rest of part 1. You may submit along with your code for part 2 and 3 of this assignment.
However, we strongly recommend you at least start on your group writeup before submitting part 1. The questions on this writeup ask you to consider certain aspects of your heap implementation (as well as your ChainedHashDictionary from project 2). This means that starting on your writeup now is a good way of helping you reason through and validate certain design decisions we may check after the final submission.
Note: your completed writeup MUST be in PDF format. You will not submit this writeup
via Canvas – instead, add it to your src/writeup
folder. (That way,
when you zip and submit the src
folder at the end of the project, we'll be
able to look at your writeup along with your source code).
Your writeup will be graded partially on whether or not you produced a reasonable answer, and partially based on the clarity of your explanations and analysis.
In lecture, we gave you formulas for finding the parent, left child, and right child within the array when implementing a binary heap.
For this assignment, you needed to implement a 4-heap – a heap where each child had four equations. This meant you needed to create two equations: a formula \(\text{parent}(i)\) to find the parent of some node \(i\), and a formula \(\text{child}(i, j)\) to find the \(j\)-th child of some node \(i\).
What were those formula?
(To get full credit on this question, you just need to correctly define \(\text{parent}(i)\) and \(\text{child}(i, j)\). You do not need to justify your definitions.)
When implementing the percolateDown
algorithm in
ArrayHeap
, you probably noticed that manually checking each of
the four children was tedious and redundant.
How did you refactor this redundancy? Were there any challenges you ran into along the way? If so, how did you handle those challenges?
Justify the design decisions you made. (Your answer should be at most 1 to 2 paragraphs long).
In this question, you will run a variety of experiments and report back on
the resulting behavior. You can find the experiments in the
analysis.experiments
package. For each of the three experiments,
answer the following:
Briefly, in one or two sentences, summarize what this experiment is testing. (As before, treat this as an exercise in reverse-engineering unknown code).
Predict what you think the outcome of running the experiment will be.
Your answer should be at most one or two paragraphs. There is no right or wrong answer here, as long as you thoughtfully explain the reasoning behind your hypothesis.
Run the experiment code. Each Experiment*.java
file should
generate a CSV file in the experimentdata
folder. Import the
CSV file into Microsoft Excel, Google Sheets, or any other analysis of your
choice and generate a plot of the data. Include this plot as a image
inside your writeup.
If the CSV file contains multiple result columns, plot them all in the same chart.
You will be graded based on whether you produce a reasonable plot. In particular, be sure to give your plot a title, label your axes (with units), include a legend if appropriate, and relabel the columns to something more descriptive then "TestResult".
How do your results compare with your hypothesis? Why do you think the results are what they were?
The following deliverables are due on Fri, Feb 16 at 11:30pm.
A single person from your partnership should
submit a zip file conaining the entire contents of your src
folder
on Canvas.
Before submitting, be sure to double-check and make sure that...
TestArrayHeapFunctionality
, TestTopKSortFunctionality
,
and TestSortingStress
classes are completed.ArrayHeap
class and your Searcher.topKSort(...)
method are both fully implement and passes all of your tests.NOTE: Part 1e (the group writeup) is not due with the rest of part 1. You may submit part 1e next week along with parts 2 and 3.