Link

Heap

Composing and testing a complex data structure from scratch.1

Learning Goals

  • Integrate multiple data structures and algorithms to solve a problem with complex requirements.

    Due to the specific runtime requirements of this heap implementation, our solution will need to creatively combine different data structures and algorithms to design a better solution. Source code for heaps is widely available on the internet (which you’re welcome to reference so long as they’re cited in comments), but what’s tricky is learning how to identify, select, and integrate ideas from different sources.

  • Planning invariants at multiple levels of abstraction.

    When designing and developing novel data structures, it’s important to learn how to plan appropriately. At a high level, it’s necessary to determine the invariants that a data structure should maintain, but it’s also necessary to plan at a lower level how and when to uphold these invariants. Some invariants may only be relevant between operations and can be broken during operations, whereas some may be internally important even during execution.

Table of contents

  1. Getting Started
  2. ExtrinsicMinPQ API
    1. NaiveMinPQ
    2. ArrayHeapMinPQ
    3. Testing
  3. Submission

Getting Started

Task
Pull the skeleton repository in IntelliJ to get the heap assignment.

Like previous homeworks, if IntelliJ doesn’t react to the new folder for whatever reason, refresh Gradle manually through the Gradle tool window (on the right by default):

Gradle Refresh

ExtrinsicMinPQ API

Background
Read through the description of the interface you’ll be implementing.

The main task for this homework assignment is be to develop an array-based implementation of the ExtrinsicMinPQ interface. The ExtrinsicMinPQ interface similar to the MaxPQ interface that we discussed in lecture, except we optimize for access to the minimum priority item instead of the maximum.

Furthermore, the priority is extrinsic to the object. That is, rather than relying on some sort of comparison function to decide which item is less than another, we manually assign a priorities by passing in numbers along with the items.

SignatureDescription
void add(T item, double priority)Adds an item with the given priority value.
boolean contains(T item)Returns true if the PQ contains the given item; false otherwise.
T peekMin()Returns the item with least-valued priority.
T removeMin()Removes and returns the item with least-valued priority.
void changePriority(T item, double priority)Changes the priority of the given item.
int size()Returns the number of items in the PQ.
boolean isEmpty()Returns true if the PQ is empty, false otherwise.

One last note: this priority queue can only contain unique items. There cannot be two copies of the same item, though it’s possible for two different items to be assigned the same priority value. In this case, you may break ties arbitrarily.

NaiveMinPQ

Reference
For reference, we have provided NaiveMinPQ, a simple and slow implementation of the ExtrinsicMinPQ interface.

peekMin, removeMin, contains, and changePriority use Java’s Stream API to do brute force searches over the entire list of items. This takes time proportional to the length of the list, or Θ(N)\Theta(N).

Note
This implementation does not throw the correct exception for add when adding duplicate items, since that would make the class too slow to be usable for comparing to our ArrayHeapMinPQ.

ArrayHeapMinPQ

Task
Implement ExtrinsicMinPQ in ArrayHeapMinPQ, subject to the following rules:
  • Required fields:
    • You must store your min-heap in the field named items. It should be an ArrayList of PriorityNodes, the class used to store items along with their priorities.
    • Your heap ArrayList must start at the index indicated by START_INDEX. (You may change the value of this constant to match whatever you implement.)
    • (It’s okay to have additional fields.)
  • Required runtimes:
    • peekMin, contains, size and changePriority must run in O(logN)\mathcal{O}(\log N) (or better) time.
    • add and removeMin must run in O(logN)\mathcal{O}(\log N) (or better) time except for the rare resize operation.
  • You may not import other priority queue implementations.

Most of the points for this homework will be allocated towards evaluating the correctness of ArrayHeapMinPQ. A few points will be allocated towards evaluating the efficiency of its efficiency, but these tests will only provide credit if the implementation is fully correct.

Tips

  • Iterating over your entire heap takes linear time, so make sure not to do this in any method!
  • Sometimes, if some computation is slow and expensive, it’s useful to store some extra information to help speed up that computation at the cost of some extra memory usage. As a reminder, your class may use other existing ADTs and data structures, so consider what ADTs or data structures might be useful in achieving the required asymptotic runtimes.
  • You might find the MinPQ class from the algs4 library to be a helpful reference. We don’t recommend copying and pasting code directly into your implementation, though: while it might seem like this will save you time and effort, it will end up being more work than just building everything yourself from scratch.

Testing

Recommended
Write more tests, and check the runtime of your implementation.

We’ve provided some tests in ArrayHeapMinPQTests, but you’ll likely want to write additional tests to ensure your implementation works even at larger sizes.

It may also be beneficial check your implementation’s runtime at some moderately large sizes. For reference, when removing all items from a priority queue of size 10,000, our solution is about 15–20 times faster than the NaiveMinPQ. The TimingDemo class shows how you can use the System.currentTimeMillis method or the edu.princeton.cs.algs4.Stopwatch class to time your code.

For a slightly-more-thorough estimate of your implementation’s asymptotic runtime, try comparing your runtimes at different sizes. When removing the same number of items from a heap with twice as many items, how much slower is your implementation? If your runtime is in O(logN)\mathcal{O}(\log N), doubling your size should not double your runtime—doubled runtime would be indicative of Ω(N)\Omega(N) growth. In fact, as size increases to infinity, you should expect to see runtime double only when the size gets squared. By checking your runtimes while repeatedly doubling your heap size, you should quickly be able to distinguish whether your runtime looks more like O(logN)\mathcal{O}(\log N) or Ω(N)\Omega(N).

You may not share tests with any other students—all code in your possession, including tests, should be your own work. You can discuss different edge cases and share ideas about test cases, but test code falls under the same academic honesty policy as any other homework code.

Tips

  • We have also provided a PrintHeapDemo class that can print out an array in heap format. You’re welcome to adapt this code into your own. It might be useful for debugging.
  • It is often helpful to write the tests for some code before implementing it, because writing a test helps with planning for the task you’re trying to solve. This helps ensure that you are building a solid foundation since each method may call previously-implemented methods.
    • For this assignment in particular, you can employ this tactic for changePriority, which is not tested by any of the provided tests.
  • Don’t forget edge cases! Consider how the heap could look before inserting or removing items, and ensure that your implementation handles all possible cases.
    • For example, consider what happens when sinking a node when its priority is greater than both of its children.
  • Our grader will be running invariant checks on your code, so we’ve again provided some skeleton code for implementing your own invariant checks in AbstractHeapMinPQAssert.
  • Beyond any invariant checking, we wouldn’t recommend testing any of your data structure’s fields directly.
    • You may be tempted to directly check that e.g. changePriority correctly set a particular node’s priority, but these checks tend to be either too brittle (How are we finding the node in the heap? Will we need to completely rewrite the test if we change some minor implementation detail, like how we’re storing our data?) or too complex (Will we have to write a bunch of code just to figure out where the node is? This could introduce bugs in our tests.).
    • Instead, focus on writing tests that evaluate correctness based on the outputs of methods that provide output (e.g. peekMin and removeMin).
  • Bonus: After you have confidence that your implementation passes basic tests, one possibility for comprehensive testing is to perform randomized testing by comparing the outputs of your implementation against the provided NaiveMinPQ. By randomly choosing to add, remove, and get items, we can simulate real-world workloads and check to see that our complex-but-fast ArrayHeapMinPQ matches the results of a simple-but-slow NaiveMinPQ.
    • This kind of testing makes it easy to check whether your code is buggy, but very difficult to actually find the bug. If your randomized tests determine that your code is incorrect, try to first convert it into a small and simple test case that you can then more-easily debug.
    • Make sure that your priority values are unique: duplicate priority values can be returned in any arbitrary order, and the order returned by the NaiveMinPQ might not match the order returned by your ArrayHeapMinPQ.

Submission

Task
Commit and push your changes to GitLab before submitting your homework to Gradescope.
  1. Josh Hug. 2019. Project 2AB: Extrinsic PQ and KDTree. In CS 61B: Data Structures, Spring 2019. https://sp19.datastructur.es/materials/proj/proj2ab/proj2ab