Link

Heap

Composing and testing a complex data structure from scratch, again.1

Learning Goals

  • Integrate multiple ADTs and data structures to solve a problem with complex requirements.

    Due to the specific runtime requirements of this heap implementation, our solution will need to creatively combine different ADTs and data structures to design a better solution. Source code for heaps is widely available on the internet (which you’re welcome to reference so long as they’re cited in comments), but what’s tricky is learning how to identify, select, and integrate ideas from different sources.

  • Plan an implementation and its invariants at multiple levels of abstraction.

    When designing and developing novel data structures, it’s important to learn how to plan appropriately.

    At a high level, it’s necessary to determine the invariants that a data structure should maintain before and after every method call, and at a low level, it’s useful to determine how these invariants affect the shared functionality that can be extracted into helper methods. In the process, it may also be necessary to consider which of the invariants these helper methods themselves require in order to function correctly.

Table of contents

  1. Getting the Assignment
  2. ExtrinsicMinPQ API
    1. Naive Implementation
    2. Heap Implementation
  3. Submission
  4. Feedback Survey

Getting the Assignment

Task
Pull the skeleton repository in IntelliJ to get the heap assignment.

Like previous projects, if IntelliJ doesn’t react to the new folder for whatever reason, refresh Gradle manually through the Gradle tool window (on the right by default):

Gradle Refresh

ExtrinsicMinPQ API

Background
Read through the description of the interface you’ll be implementing.

The main task for this assignment will be to develop an array-based implementation of the ExtrinsicMinPQ interface. The priority for each item will be extrinsic to the object—that is, rather than relying on some sort of comparison function to decide which item has less priority than another, we manually assign priorities by passing in numbers along with the items.

SignatureDescription
void add(T item, double priority)Adds an item with the given priority value.
boolean contains(T item)Returns true if the PQ contains the given item; false otherwise.
T peekMin()Returns the item with least-valued priority.
T removeMin()Removes and returns the item with least-valued priority.
void changePriority(T item, double priority)Changes the priority of the given item.
int size()Returns the number of items in the PQ.
boolean isEmpty()Returns true if the PQ is empty, false otherwise.

One last note: this priority queue can only contain unique items. There cannot be two copies of the same item, though it’s possible for two different items to be assigned the same priority value. In this case, you may break ties arbitrarily.

Naive Implementation

Reference
For reference, we have provided NaiveMinPQ, a simple and slow implementation of the ExtrinsicMinPQ interface.

peekMin, removeMin, contains, and changePriority use Java’s Stream API to do brute force searches over the entire list of items. This takes time proportional to the length of the list, or Θ(N)\Theta(N).

Note
This implementation does not throw the correct exception for add when adding duplicate items, since that would make the class too slow to be usable for comparing to our ArrayHeapMinPQ.

Heap Implementation

Task
Complete ArrayHeapMinPQ, implementing ExtrinsicMinPQ subject to the following constraints:
  • Required fields:
    • You must store your min-heap in the field named items. It should be an ArrayList of PriorityNodes, the class used to store items along with their priorities.
    • Your heap ArrayList must start at the index indicated by START_INDEX. (You may change the value of this constant to match whatever you implement.)
    • (It’s okay to have additional fields.)
  • Required runtimes:
    • peekMin, contains, size and changePriority must run in O(logN)\mathcal{O}(\log N) time.
    • add and removeMin must run in O(logN)\mathcal{O}(\log N) time except for the rare resize operation.
  • You may not import other priority queue implementations.

Tips

  • Iterating over your entire heap takes linear time, so make sure not to do this in any method!
    • Be careful when using ArrayList methods! Double-check the runtime of any methods you use.
  • Sometimes, if some computation is slow and expensive, it’s useful to store some extra information to help speed up that computation at the cost of some extra memory usage. As a reminder, your class may use other existing ADTs and data structures, so consider what ADTs or data structures might be useful in achieving the required asymptotic runtimes.
  • You might find the MinPQ class from the algs4 library to be a helpful reference. We don’t recommend copying and pasting code directly into your implementation, though: while it might seem like this will save you time and effort, it will end up being more work than just building everything yourself from scratch.
  • We’ve provided some tests in ArrayHeapMinPQTests, but there are essentially no tests for changePriority, nor are there any tests checking that your heap works at larger sizes.
    • It may be helpful to write some tests for changePriority before you start implementing it, since considering the possible test cases can help you plan out your approach.
Note
You may not share tests with any other students—all code in your possession, including tests, should be your own work. You may discuss different edge cases and share ideas about test cases, but test code falls under the same academic honesty policy as any other project code.
Tips for writing tests:
  • We have also provided a PrintHeapDemo class that can print out an array in heap format. You’re welcome to adapt this code into your own. It might be useful for debugging.
  • Outside of the provided invariant checks, we don’t recommend running assertions directly on any of your data structure’s fields.
    • You may be tempted to directly check that e.g. changePriority correctly set a particular node’s priority, but these checks tend to be either too brittle (How are we finding the node in the heap? Will we need to completely rewrite the test if we change some minor implementation detail, like how we’re storing our data?) or too complex (Will we have to write a bunch of code just to figure out where the node is? This could introduce bugs in our tests.).
    • Instead, focus on writing tests that evaluate correctness based on the outputs of methods that provide output (e.g. peekMin and removeMin).
  • Bonus: After you have confidence that your implementation passes basic tests, one possibility for comprehensive testing is to perform randomized testing by comparing the outputs of your implementation against the provided NaiveMinPQ. By randomly choosing to add, remove, and get items, we can simulate real-world workloads and check to see that our complex-but-fast ArrayHeapMinPQ matches the results of a simple-but-slow NaiveMinPQ.
    • This kind of testing makes it easy to check whether your code is buggy, but very difficult to actually find the bug. If your randomized tests determine that your code is incorrect, try to first convert it into a small and simple test case that you can then more-easily debug.
    • Make sure that your priority values are unique: duplicate priority values can be returned in any arbitrary order, and the order returned by the NaiveMinPQ might not match the order returned by your ArrayHeapMinPQ.
Tips for checking runtimes:
  • It may be useful to check your implementation’s runtime at some moderately large sizes. For reference, when removing all items from a priority queue of size 10,000, our solution is about 15–20 times faster than the NaiveMinPQ. The TimingDemo class shows how you can use the System.currentTimeMillis method or the edu.princeton.cs.algs4.Stopwatch class to time your code.
  • For a slightly-more-thorough estimate of your implementation’s asymptotic runtime, try writing your own experiments by copying and modifying one of the experiment files from previous projects. You can try timing your heap against the naive priority queue, or simply examine the shape of plot to sanity check that your methods have the correct asymptotic runtimes.

Grading Details

  • The provided tests check the heap invariant in the same way as the grader-only tests:
    • Every node must have a priority less than or equal to that of its children.
    • The items field contains your heap, which starts at the index indicated by your START_INDEX field. You are not required to use START_INDEX in your code; this field exists exclusively for the invariant checks, so just make sure it matches your code.
  • Most of the points for this assignment will be allocated towards evaluating the correctness of ArrayHeapMinPQ, but a few points will be allocated towards efficiency. The stress tests on the grader have 3 sequential sections to test add, changePriority, and remove, respectively (all using the same heap instance). Because the sections run sequentially, there are a few cases that could occur during grading:
    • Your code throws an exception. In this case, all subsequent stress tests are skipped, and you get no points from any stress tests.
    • Your code takes an excessively long time to run. In this case, the stress tests will throw an exception, and you get the same outcome as above.
    • Your code runs, but the returns incorrect values in the remove section. In this case, no points will be awarded from the stress tests.
    • Your code runs and returns the correct values in the remove section. In this case, the runtimes for each section will be scored independently.

Submission

Task
Commit and push your changes to GitLab before submitting to Gradescope.

Feedback Survey

We’d appreciate if you take a couple minutes to complete the individual feedback survey on Canvas for extra credit. (Each partner needs to submit their own individual response.)


  1. Josh Hug. 2019. Project 2AB: Extrinsic PQ and KDTree. In CS 61B: Data Structures, Spring 2019. https://sp19.datastructur.es/materials/proj/proj2ab/proj2ab