Link

Priority Queues

Composing and testing a complex data structure from scratch.1

Learning Goals

  • Composing programs at multiple levels of abstraction.

    An important programming skill when designing and developing novel data structures is to learn how to plan appropriately. One important subtlety about planning is that it occurs at multiple levels of abstraction. Planning happens at a conceptual level when we design a data structure. However, planning also occurs when we implement and debug each method.

  • Integrate multiple data structures and algorithms to solve a problem with complex requirements.

    Due to the specific runtime requirements of this heap implementation, our solution will need to creatively combine different data structures and algorithms to design a better solution. Source code for heaps is widely available on the internet (which you’re welcome to reference so long as they’re cited as comments), but what’s tricky is learning how to identify, select, and integrate ideas from different sources.

Table of contents

  1. Getting Started
  2. ExtrinsicMinPQ API
  3. NaiveMinPQ
  4. ArrayHeapMinPQ
  5. Testing
  6. Optional Extension: Trending Topics
  7. Submission

Getting Started

Pull the skeleton repository to get the pq assignment.

ExtrinsicMinPQ API

Your task for this homework assignment will be to develop an array-based implementation of the ExtrinsicMinPQ interface. The ExtrinsicMinPQ interface similar to the MaxPQ interface that we discussed in lecture, except instead of optimizing for access to the maximum priority item, we optimize for access to the minimum priority item.

  • public void add(T item, double priority): Adds an item of type T with the given priority.
  • public T getSmallest(): Returns the item with smallest priority.
  • public T removeSmallest(): Removes and returns the item with smallest priority.
  • public int size(): Returns the number of items.
  • public boolean contains(T item): Returns true if the PQ contains the given item.
  • public void changePriority(T item, double priority): Sets the priority of the given item to the given value.

Furthermore, the priority is extrinsic to the object. That is, rather than relying on some sort of comparison function to decide which item is less than another, we assign a priority value using the add or changePriority functions.

In addition, this priority queue can only contain unique items. While there cannot be two copies of the same item, it’s possible for two different items to be assigned the same priority value. In this case, you can break ties arbitrarily.

NaiveMinPQ

For reference, we have provided NaiveMinPQ, a simple and slow implementation of the ExtrinsicMinPQ interface.

contains, getSmallest, and removeSmallest use built-in methods from ArrayList and Collections to do brute force searches over the entire list of items. This takes time proportional to the length of the list, or O(N).

Note
This implementation does not throw the correct exception for add when adding duplicate items, since that would make the class too slow to be usable for comparing to our ArrayHeapMinPQ.

ArrayHeapMinPQ

Your task is to create a class, ArrayHeapMinPQ, that implements the ExtrinsicMinPQ interface subject to the following rules.

  • One of your instance variables must be a min-heap represented as either an array or ArrayList. It is OK to have additional private variables.
  • You may only import from the java.util, org.junit libraries. You can also use the Stopwatch and StdRandom classes from the Princeton libraries.
  • You may not add additional public methods beyond what is expected by the ExtrinsicMinPQ interface. Private and package-private helper methods are allowed.

In addition, there are several asymptotic runtime requirements.

  • getSmallest, contains, size and changePriority must run in O(log N) time.
  • add and removeSmallest must run in O(log N) time except for the rare resize operation.

For reference, when making 1000 queries on a heap of size 1,000,000, our solution is about 300 times faster than the NaiveMinPQ. Iterating over your entire min-heap for any of these methods will take linear time!

Tips

If you intend to use an array representation for your min-heap, make sure to increase the array size by a multiplicative factor when resizing. Otherwise, add will be linear time on average, not log time. To prevent the autograder from running out of memory, the array should never be more than approximately three-quarters empty.

We have also provided a PrintHeapDemo class that can print out an array in heap format. You’re welcome to adapt this code into your own. It might be useful for debugging. The TimingTestDemo class shows how you can use the System.currentTimeMillis method or the edu.princeton.cs.algs4.Stopwatch class to time your code.

You might find the MinPQ.java class from the textbook to be a helpful reference. We don’t recommend copying and pasting code directly into your implementation. While it might seem like this will save you time and effort, it will end up being more work than just building everything yourself from scratch.

As a reminder, your class may use other data structures.

Testing

Add your tests to a new class, ArrayHeapMinPQTest.

We will not provide any skeleton tests nor autograder messages for this homework. You will be responsible for writing your own tests and ensuring the correctness of your code. While you need to submit your tests to the autograder, they won’t be graded.

Tips

Write tests as you implement each ExtrinsicMinPQ method. It is often helpful to write the tests before writing the implementation because writing a test helps with planning for the task you’re trying to solve. This helps ensure that you are building a solid foundation since each method may call previously-implemented methods.

We encourage you to primarily write tests that evaluate correctness based on the outputs of methods that provide output (e.g. getSmallest and removeSmallest). This is in contrast to tests that try to directly test the states of instance variables. For example, to test changePriority, you might use a sequence of add operations, a changePriority call, and then finally check the output of a removeSmallest call. This is usually a better idea than iterating over your private variables to see if changePriority is setting some specific instance variable.

If you really want to write tests that require inspecting the internal behavior of your program, then you’ll need to add them to the ArrayHeapMinPQ class itself. For example, suppose you want to verify that your array is [1, 2, 4, 5, 3] after adding the items 5, 4, 3, 2, 1. In this case, a hypothetical testAdd54321 method would need to be part of the ArrayHeapMinPQ class since the ArrayHeapMinPQTest can’t access private variables in the ArrayHeapMinPQ class. We don’t recommend doing this.

Don’t forget edge cases! Consider how the heap could look before inserting or removing items, and ensure that your implementation handles all possible cases. For example, consider what happens when sinking a node when its priority is greater than both of its children.

After you have confidence that your implementation passes basic tests, one possibility for comprehensive testing is to perform randomized testing and comparing the outputs of your implementation against the provided NaiveMinPQ. By randomly choosing to add, remove, and get items, we can simulate real-world workloads and check to see that our complex-but-fast ArrayHeapMinPQ matches the results of a simple-but-slow NaiveMinPQ. Make sure that your priority values are unique: duplicate priority values can be returned in any arbitrary order, and the order returned by the NaiveMinPQ might not match the order returned by your ArrayHeapMinPQ.

Use your ArrayHeapMinPQ to implement a stream processing algorithm for identifying trending Twitter topics from a 1% random sampling of the Twitter Stream on a given day.

A hashtag is considered trending if its frequency is much higher in the present than it was in the recent past. For example, if we set our time interval to 3 hours, the ratio between the frequency in the present interval compared to the previous interval determines the trend factor. The hashtags with the highest trend factors are the trending topics.

Implement an efficient algorithm for finding all of the trending topics in a given day. We will provide no guidance for this part.

Submission

Commit and push your changes to GitLab before submitting your homework to Gradescope.

  1. Josh Hug. 2019. Project 2AB: Extrinsic PQ and KDTree. In CS 61B: Data Structures, Spring 2019. https://sp19.datastructur.es/materials/proj/proj2ab/proj2ab