Priority Queues and Heaps
Complete binary trees
In our initial attempt to design a balanced search tree structure, we introduced the following invariant: “each node must have either 0 or 2 children.” (A binary tree with this property is known as a full binary tree.) We were able to come up with an example binary search tree structure that violated this invariant, which inspired a different hypothesis that “unbalanced growth leads to worst-case height trees.” This hypothesis formed the basis of B-trees.
But what if we didn’t give up here? Consider the binary trees depicted below. Let’s develop an invariant that includes the balanced trees and excludes unbalanced trees.
One such invariant is the definition of complete binary trees.
- Complete
- Every level is full except possibly the bottom level, and the nodes in the bottom level are as far left as possible.
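As a rough illustration of the definition, the sketch below checks completeness with a level-order traversal: once a missing node has been seen, no further node may appear. The `Node` class and the `is_complete` function are hypothetical names introduced only for this example.

```python
from collections import deque


class Node:
    """Hypothetical binary tree node used only for this illustration."""

    def __init__(self, item, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right


def is_complete(root):
    """Return True if the tree rooted at root is a complete binary tree."""
    queue = deque([root])
    seen_gap = False  # becomes True once a missing node appears in level order
    while queue:
        node = queue.popleft()
        if node is None:
            seen_gap = True
        else:
            if seen_gap:
                return False  # a node appears after a gap, so not complete
            queue.append(node.left)
            queue.append(node.right)
    return True
```

A tree whose only gap is at the right end of the bottom level passes the check, while a tree with a right child but no left child does not.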
Unfortunately, we don’t know how to efficiently maintain completeness while also maintaining the binary search tree invariant. Consider how we might maintain the completeness invariant while adding the item “A” to the following BST.
It’s hard to efficiently implement the set or map abstract data types without the binary search tree invariant because we no longer know where items will exist in the tree. Instead, we’ll investigate how we can use complete binary trees to implement the priority queue abstract data type.
Priority queues
The priority queue is an abstract data type that optimizes data for access according to their associated priority values. Each item inserted into a priority queue has some sense of priority relative to other items.
The max-oriented priority queue is characterized by three operations: accessing the maximum priority item, removing the maximum priority item, and adding new items. The min-oriented priority queue is similar but optimizes for the minimum priority item rather than the maximum priority item. Duplicate items and duplicate priorities are allowed. Ties can be broken arbitrarily in case two items share the same priority.
Consider the following interface for a max-oriented priority queue, MaxPQ. For simplicity, we will assume that the priority queue stores numbers with the priority given by the value of the number.
add(int item)
- Adds the given item to the priority queue.
max()
- Returns the item with the highest priority.
removeMax()
- Removes and returns the item with the highest priority.
Describe how to implement MaxPQ with an unordered linked list.
In an unordered linked list, we can put our items anywhere in the list. This makes insertion fast because we can add the item to the front of the list in constant time, but it makes finding the maximum slow because we need to scan the entire linked list.
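A minimal sketch of this idea follows. It assumes Python's `collections.deque` as a stand-in for a linked list, and the class name `UnorderedListMaxPQ` and method name `remove_max` (mirroring removeMax) are chosen only for illustration. The sketch also assumes the queue is non-empty when max or remove_max is called.

```python
from collections import deque


class UnorderedListMaxPQ:
    """Illustrative max-oriented priority queue backed by an unordered list."""

    def __init__(self):
        self.items = deque()  # stand-in for a linked list

    def add(self, item):
        # Constant time: put the new item at the front, ignoring order.
        self.items.appendleft(item)

    def max(self):
        # Linear time: scan every item to find the largest.
        return max(self.items)

    def remove_max(self):
        # Linear time: scan for the largest, then remove one occurrence of it.
        largest = max(self.items)
        self.items.remove(largest)
        return largest
```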
Describe how to implement MaxPQ with an ordered linked list.
In an ordered linked list, the opposite is true. We can find the maximum quickly since we can maintain a reference to it (either the front or back of the list, our choice), allowing us to remove and return the maximum priority item in constant time. However, this slows down the insertion process since we need to find the exact, ordered position for the item in the list.
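The ordered variant can be sketched the same way. This version assumes a singly linked list kept in descending order, so the maximum is always at the front. The names `ListNode`, `OrderedListMaxPQ`, and `remove_max` are hypothetical, and the sketch assumes the queue is non-empty when max or remove_max is called.

```python
class ListNode:
    """Hypothetical singly linked list node."""

    def __init__(self, item, next=None):
        self.item = item
        self.next = next


class OrderedListMaxPQ:
    """Illustrative max-oriented priority queue backed by a sorted linked list."""

    def __init__(self):
        self.front = None  # the largest item is kept at the front

    def add(self, item):
        # Linear time in the worst case: walk to the ordered position.
        if self.front is None or item >= self.front.item:
            self.front = ListNode(item, self.front)
            return
        current = self.front
        while current.next is not None and current.next.item > item:
            current = current.next
        current.next = ListNode(item, current.next)

    def max(self):
        # Constant time: the front node holds the maximum.
        return self.front.item

    def remove_max(self):
        # Constant time: unlink the front node.
        largest = self.front.item
        self.front = self.front.next
        return largest
```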
Describe how to implement MaxPQ with a binary search tree.
A binary search tree maintains the order of items in the structure of the tree. The smallest key is stored in the leftmost node in the tree while the largest key is stored in the rightmost node in the tree. Therefore, we can keep a reference to the rightmost node in the tree and maintain it as we add items and remove the maximum priority item from the priority queue.
Compare the worst-case runtimes of each implementation in the table below, where N is the size of the priority queue.
Operation | Unordered Linked List | Ordered Linked List | BST | 2-3 Tree |
---|---|---|---|---|
add | O(1) | O(N) | O(N) | O(log N) |
max | O(N) | O(1) | O(1) | O(1) |
removeMax | O(N) | O(1) | O(N) | O(log N) |
While the 2-3 tree implementation is appealing, handling items with duplicate priority values requires additional complexity. Furthermore, it turns out there is another data structure with the same asymptotic performance but even better constant factors.
Binary heap data structure
A binary max-heap is a complete binary tree that maintains the max-heap invariant.
- Max-Heap Invariant
- The priority of each item in the heap is greater than or equal to the priorities of both of its children.
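To make the invariant concrete, the following sketch checks it recursively on a node-based tree. The `Node` class and the function name are hypothetical, and the check covers only the ordering invariant, not completeness.

```python
class Node:
    """Hypothetical binary tree node used only for this illustration."""

    def __init__(self, item, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right


def satisfies_max_heap_invariant(node):
    """Return True if every item is >= the items of both of its children."""
    if node is None:
        return True
    for child in (node.left, node.right):
        if child is not None and child.item > node.item:
            return False  # a child outranks its parent
    return (satisfies_max_heap_invariant(node.left)
            and satisfies_max_heap_invariant(node.right))
```

Note that the invariant says nothing about the relative order of siblings, which is why a heap is less strictly ordered than a binary search tree.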
By construction, the largest value in a max-heap is always the root. To implement the max method, just return the root item.
Implementing removeMax, however, is a bit trickier.
Removing the max item requires three steps.
- Swap the root with the last leaf.
- Remove the last leaf.
- Sink the new root to its proper place, promoting the larger child.
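The three steps can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the heap is assumed to be stored in a Python list in level order with the root at index 0, so the children of index i sit at 2i + 1 and 2i + 2, and the list is assumed to be non-empty. The function names `sink` and `remove_max` are chosen for this example.

```python
def sink(heap, i):
    """Swap the item at index i down until neither child is larger."""
    n = len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return  # neither child is larger, so the invariant holds here
        # Promote the larger child and continue from its old position.
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest


def remove_max(heap):
    """Remove and return the largest item from a non-empty max-heap."""
    heap[0], heap[-1] = heap[-1], heap[0]  # step 1: swap root with last leaf
    largest = heap.pop()                   # step 2: remove the last leaf
    sink(heap, 0)                          # step 3: sink the new root
    return largest
```

Because each swap moves the item down one level and a complete tree with N items has about log N levels, the sink loop runs O(log N) times.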
removeMax relies on the sink operation to maintain heap invariants.
- Sink
- Swap a node down the tree until it is at least as large as both of its children. Promote the larger child. Break ties arbitrarily.
- Swim
- Swap a node up the tree until its parent is at least as large as itself.
We can add an item to the heap using the corresponding swim operation.
- Add the item as the new last leaf: the leftmost open position on the bottom level, so that the tree stays complete.
- Swim to restore heap invariants.
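A sketch of add with swim, under the same kind of assumption as any array-backed heap: the items are stored in a Python list in level order with the root at index 0, so the parent of index i is at (i - 1) // 2. Appending to the list places the new item in the next open leaf position. The function names are chosen for this example.

```python
def swim(heap, i):
    """Swap the item at index i up until its parent is at least as large."""
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] >= heap[i]:
            return  # the parent is large enough, so the invariant holds
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent


def add(heap, item):
    """Add an item to a max-heap stored in a list."""
    heap.append(item)          # new last leaf keeps the tree complete
    swim(heap, len(heap) - 1)  # restore the max-heap invariant
```

As with sink, the item moves at most one level per swap, so add takes O(log N) time in the worst case.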
Tree representations
One of the main advantages of using heaps to implement priority queues is in the tree representation. We can represent trees in different ways.
- Node representation
- Store items together with structure. Map parent to child relationships.
- Array representation
- Store items separate from structure. Map child to parent relationships.
Since heaps are complete binary trees, we typically represent them with arrays.
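One common way to lay out a complete binary tree in an array is level order, where parent and child positions are computed with index arithmetic instead of being stored as references. The sketch below assumes the root is at index 0 (some presentations start at index 1 instead, which changes the formulas), and the helper names are hypothetical.

```python
def parent(i):
    """Index of the parent of the node at index i (for i > 0)."""
    return (i - 1) // 2


def left(i):
    """Index of the left child of the node at index i."""
    return 2 * i + 1


def right(i):
    """Index of the right child of the node at index i."""
    return 2 * i + 2


# Level-order layout of a small max-heap:
#         9
#       /   \
#      7     8
#     / \   /
#    3   5 6
heap = [9, 7, 8, 3, 5, 6]

# The children of the root (index 0) are at indices 1 and 2.
root_children = (heap[left(0)], heap[right(0)])
```

Completeness is what makes this work: because there are no gaps before the last item, every index from 0 to N - 1 holds a node and no space is wasted.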