Composing and testing a complex data structure from scratch.1
Composing programs at multiple levels of abstraction.
An important programming skill when designing and developing novel data structures is to learn how to plan appropriately. One important subtlety about planning is that it occurs at multiple levels of abstraction. Planning happens at a conceptual level when we design a data structure. However, planning also occurs when we implement and debug each method.
Integrate multiple data structures and algorithms to solve a problem with complex requirements.
Due to the specific runtime requirements of this heap implementation, our solution will need to creatively combine different data structures and algorithms to design a better solution. Source code for heaps is widely available on the internet (which you’re welcome to reference so long as they’re cited as comments), but what’s tricky is learning how to identify, select, and integrate ideas from different sources.
- Getting Started
- ExtrinsicMinPQ API
- Optional Extension: Trending Topics
Pull the skeleton repository to get the
Your task for this homework assignment will be to develop an array-based implementation of the
ExtrinsicMinPQ interface. The
ExtrinsicMinPQ interface similar to the
MaxPQ interface that we discussed in lecture, except instead of optimizing for access to the maximum priority item, we optimize for access to the minimum priority item.
public void add(T item, double priority): Adds an item of type
Twith the given priority.
public T getSmallest(): Returns the item with smallest priority.
public T removeSmallest(): Removes and returns the item with smallest priority.
public int size(): Returns the number of items.
public boolean contains(T item): Returns true if the PQ contains the given item.
public void changePriority(T item, double priority): Sets the priority of the given item to the given value.
Furthermore, the priority is extrinsic to the object. That is, rather than relying on some sort of comparison function to decide which item is less than another, we assign a priority value using the
In addition, this priority queue can only contain unique items. While there cannot be two copies of the same item, it’s possible for two different items to be assigned the same priority value. In this case, you can break ties arbitrarily.
For reference, we have provided
NaiveMinPQ, a simple and slow implementation of the
removeSmallest use built-in methods from
Collections to do brute force searches over the entire list of items. This takes time proportional to the length of the list, or O(N).
- This implementation does not throw the correct exception for
addwhen adding duplicate items, since that would make the class too slow to be usable for comparing to our
Your task is to create a class,
ArrayHeapMinPQ, that implements the
ExtrinsicMinPQ interface subject to the following rules.
- One of your instance variables must be a min-heap represented as either an array or
ArrayList. It is OK to have additional private variables.
- You may only import from the
org.junitlibraries. You can also use the
StdRandomclasses from the Princeton libraries.
- You may not add additional public methods beyond what is expected by the
ExtrinsicMinPQinterface. Private and package-private helper methods are allowed.
In addition, there are several asymptotic runtime requirements.
changePrioritymust run in O(log N) time.
removeSmallestmust run in O(log N) time except for the rare resize operation.
For reference, when making 1000 queries on a heap of size 1,000,000, our solution is about 300 times faster than the
NaiveMinPQ. Iterating over your entire min-heap for any of these methods will take linear time!
If you intend to use an array representation for your min-heap, make sure to increase the array size by a multiplicative factor when resizing. Otherwise,
add will be linear time on average, not log time. To prevent the autograder from running out of memory, the array should never be more than approximately three-quarters empty.
We have also provided a
PrintHeapDemo class that can print out an array in heap format. You’re welcome to adapt this code into your own. It might be useful for debugging. The
TimingTestDemo class shows how you can use the
System.currentTimeMillis method or the
edu.princeton.cs.algs4.Stopwatch class to time your code.
You might find the
MinPQ.java class from the textbook to be a helpful reference. We don’t recommend copying and pasting code directly into your implementation. While it might seem like this will save you time and effort, it will end up being more work than just building everything yourself from scratch.
As a reminder, your class may use other data structures.
Add your tests to a new class,
We will not provide any skeleton tests nor autograder messages for this homework. You will be responsible for writing your own tests and ensuring the correctness of your code. While you need to submit your tests to the autograder, they won’t be graded.
Write tests as you implement each
ExtrinsicMinPQ method. It is often helpful to write the tests before writing the implementation because writing a test helps with planning for the task you’re trying to solve. This helps ensure that you are building a solid foundation since each method may call previously-implemented methods.
We encourage you to primarily write tests that evaluate correctness based on the outputs of methods that provide output (e.g.
removeSmallest). This is in contrast to tests that try to directly test the states of instance variables. For example, to test
changePriority, you might use a sequence of
add operations, a
changePriority call, and then finally check the output of a
removeSmallest call. This is usually a better idea than iterating over your private variables to see if
changePriority is setting some specific instance variable.
If you really want to write tests that require inspecting the internal behavior of your program, then you’ll need to add them to the
ArrayHeapMinPQ class itself. For example, suppose you want to verify that your array is [1, 2, 4, 5, 3] after adding the items 5, 4, 3, 2, 1. In this case, a hypothetical
testAdd54321 method would need to be part of the
ArrayHeapMinPQ class since the
ArrayHeapMinPQTest can’t access private variables in the
ArrayHeapMinPQ class. We don’t recommend doing this.
Don’t forget edge cases! Consider how the heap could look before inserting or removing items, and ensure that your implementation handles all possible cases. For example, consider what happens when sinking a node when its priority is greater than both of its children.
After you have confidence that your implementation passes basic tests, one possibility for comprehensive testing is to perform randomized testing and comparing the outputs of your implementation against the provided
NaiveMinPQ. By randomly choosing to add, remove, and get items, we can simulate real-world workloads and check to see that our complex-but-fast
ArrayHeapMinPQ matches the results of a simple-but-slow
NaiveMinPQ. Make sure that your priority values are unique: duplicate priority values can be returned in any arbitrary order, and the order returned by the
NaiveMinPQ might not match the order returned by your
ArrayHeapMinPQ to implement a stream processing algorithm for identifying trending Twitter topics from a 1% random sampling of the Twitter Stream on a given day.
A hashtag is considered trending if its frequency is much higher in the present than it was in the recent past. For example, if we set our time interval to 3 hours, the ratio between the frequency in the present interval compared to the previous interval determines the trend factor. The hashtags with the highest trend factors are the trending topics.
Implement an efficient algorithm for finding all of the trending topics in a given day. We will provide no guidance for this part.
Commit and push your changes to GitLab before submitting your homework to Gradescope.
Josh Hug. 2019. Project 2AB: Extrinsic PQ and KDTree. In CS 61B: Data Structures, Spring 2019. https://sp19.datastructur.es/materials/proj/proj2ab/proj2ab ↩