Heap
Composing and testing a complex data structure from scratch, again.1
Learning Goals
Integrate multiple ADTs and data structures to solve a problem with complex requirements.
Due to the specific runtime requirements of this heap implementation, our solution will need to creatively combine different ADTs and data structures to design a better solution. Source code for heaps is widely available on the internet (which you’re welcome to reference so long as they’re cited in comments), but what’s tricky is learning how to identify, select, and integrate ideas from different sources.
Plan an implementation and its invariants at multiple levels of abstraction.
When designing and developing novel data structures, it’s important to learn how to plan appropriately.
At a high level, it’s necessary to determine the invariants that a data structure should maintain before and after every method call, and at a low level, it’s useful to determine how these invariants affect the shared functionality that can be extracted into helper methods. In the process, it may also be necessary to consider which of the invariants these helper methods themselves require in order to function correctly.
Table of contents
Getting the Assignment
- Task
- Pull the skeleton repository in IntelliJ to get the
heap
assignment.
Like previous projects, if IntelliJ doesn’t react to the new folder for whatever reason, refresh Gradle manually through the Gradle tool window (on the right by default):
ExtrinsicMinPQ
API
- Background
- Read through the description of the interface you’ll be implementing.
The main task for this assignment will be to develop an array-based implementation of the ExtrinsicMinPQ
interface. The priority for each item will be extrinsic to the object—that is, rather than relying on some sort of comparison function to decide which item has less priority than another, we manually assign priorities by passing in numbers along with the items.
Signature | Description |
---|---|
void add(T item, double priority) | Adds an item with the given priority value. |
boolean contains(T item) | Returns true if the PQ contains the given item; false otherwise. |
T peekMin() | Returns the item with least-valued priority. |
T removeMin() | Removes and returns the item with least-valued priority. |
void changePriority(T item, double priority) | Changes the priority of the given item. |
int size() | Returns the number of items in the PQ. |
boolean isEmpty() | Returns true if the PQ is empty, false otherwise. |
One last note: this priority queue can only contain unique items. There cannot be two copies of the same item, though it’s possible for two different items to be assigned the same priority value. In this case, you may break ties arbitrarily.
Naive Implementation
- Reference
- For reference, we have provided
NaiveMinPQ
, a simple and slow implementation of theExtrinsicMinPQ
interface.
peekMin
, removeMin
, contains
, and changePriority
use Java’s Stream API to do brute force searches over the entire list of items. This takes time proportional to the length of the list, or .
- Note
- This implementation does not throw the correct exception for
add
when adding duplicate items, since that would make the class too slow to be usable for comparing to ourArrayHeapMinPQ
.
Heap Implementation
- Task
- Complete
ArrayHeapMinPQ
, implementingExtrinsicMinPQ
subject to the following constraints:
- Required fields:
- You must store your min-heap in the field named
items
. It should be anArrayList
ofPriorityNode
s, the class used to store items along with their priorities. - Your heap
ArrayList
must start at the index indicated bySTART_INDEX
. (You may change the value of this constant to match whatever you implement.) - (It’s okay to have additional fields.)
- You must store your min-heap in the field named
- Required runtimes:
peekMin
,contains
,size
andchangePriority
must run in time.add
andremoveMin
must run in time except for the rare resize operation.
- You may not import other priority queue implementations.
Tips
- Iterating over your entire heap takes linear time, so make sure not to do this in any method!
- Be careful when using
ArrayList
methods! Double-check the runtime of any methods you use.
- Be careful when using
- Sometimes, if some computation is slow and expensive, it’s useful to store some extra information to help speed up that computation at the cost of some extra memory usage. As a reminder, your class may use other existing ADTs and data structures, so consider what ADTs or data structures might be useful in achieving the required asymptotic runtimes.
- You might find the
MinPQ
class from the algs4 library to be a helpful reference. We don’t recommend copying and pasting code directly into your implementation, though: while it might seem like this will save you time and effort, it will end up being more work than just building everything yourself from scratch. - We’ve provided some tests in
ArrayHeapMinPQTests
, but there are essentially no tests forchangePriority
, nor are there any tests checking that your heap works at larger sizes.- It may be helpful to write some tests for
changePriority
before you start implementing it, since considering the possible test cases can help you plan out your approach.
- It may be helpful to write some tests for
- Note
- You may not share tests with any other students—all code in your possession, including tests, should be your own work. You may discuss different edge cases and share ideas about test cases, but test code falls under the same academic honesty policy as any other project code.
Tips for writing tests:
- We have also provided a
PrintHeapDemo
class that can print out an array in heap format. You’re welcome to adapt this code into your own. It might be useful for debugging. - Outside of the provided invariant checks, we don’t recommend running assertions directly on any of your data structure’s fields.
- You may be tempted to directly check that e.g.
changePriority
correctly set a particular node’s priority, but these checks tend to be either too brittle (How are we finding the node in the heap? Will we need to completely rewrite the test if we change some minor implementation detail, like how we’re storing our data?) or too complex (Will we have to write a bunch of code just to figure out where the node is? This could introduce bugs in our tests.). - Instead, focus on writing tests that evaluate correctness based on the outputs of methods that provide output (e.g.
peekMin
andremoveMin
).
- You may be tempted to directly check that e.g.
- Bonus: After you have confidence that your implementation passes basic tests, one possibility for comprehensive testing is to perform randomized testing by comparing the outputs of your implementation against the provided
NaiveMinPQ
. By randomly choosing to add, remove, and get items, we can simulate real-world workloads and check to see that our complex-but-fastArrayHeapMinPQ
matches the results of a simple-but-slowNaiveMinPQ
.- This kind of testing makes it easy to check whether your code is buggy, but very difficult to actually find the bug. If your randomized tests determine that your code is incorrect, try to first convert it into a small and simple test case that you can then more-easily debug.
- Make sure that your priority values are unique: duplicate priority values can be returned in any arbitrary order, and the order returned by the
NaiveMinPQ
might not match the order returned by yourArrayHeapMinPQ
.
Tips for checking runtimes:
- It may be useful to check your implementation’s runtime at some moderately large sizes. For reference, when removing all items from a priority queue of size 10,000, our solution is about 15–20 times faster than the
NaiveMinPQ
. TheTimingDemo
class shows how you can use theSystem.currentTimeMillis
method or theedu.princeton.cs.algs4.Stopwatch
class to time your code. - For a slightly-more-thorough estimate of your implementation’s asymptotic runtime, try writing your own experiments by copying and modifying one of the experiment files from previous projects. You can try timing your heap against the naive priority queue, or simply examine the shape of plot to sanity check that your methods have the correct asymptotic runtimes.
Grading Details
- The provided tests check the heap invariant in the same way as the grader-only tests:
- Every node must have a priority less than or equal to that of its children.
- The
items
field contains your heap, which starts at the index indicated by yourSTART_INDEX
field. You are not required to useSTART_INDEX
in your code; this field exists exclusively for the invariant checks, so just make sure it matches your code.
- Most of the points for this assignment will be allocated towards evaluating the correctness of
ArrayHeapMinPQ
, but a few points will be allocated towards efficiency. The stress tests on the grader have 3 sequential sections to testadd
,changePriority
, andremove
, respectively (all using the same heap instance). Because the sections run sequentially, there are a few cases that could occur during grading:- Your code throws an exception. In this case, all subsequent stress tests are skipped, and you get no points from any stress tests.
- Your code takes an excessively long time to run. In this case, the stress tests will throw an exception, and you get the same outcome as above.
- Your code runs, but the returns incorrect values in the
remove
section. In this case, no points will be awarded from the stress tests. - Your code runs and returns the correct values in the
remove
section. In this case, the runtimes for each section will be scored independently.
Submission
- Task
- Commit and push your changes to GitLab before submitting to Gradescope.
Feedback Survey
We’d appreciate if you take a couple minutes to complete the individual feedback survey on Canvas for extra credit. (Each partner needs to submit their own individual response.)
Josh Hug. 2019. Project 2AB: Extrinsic PQ and KDTree. In CS 61B: Data Structures, Spring 2019. https://sp19.datastructur.es/materials/proj/proj2ab/proj2ab ↩