
Priority Queues and Heaps

Table of contents

  1. Complete binary trees
  2. Priority queues
  3. Binary heap data structure
  4. Tree representations

Complete binary trees

In our initial attempt to design a balanced search tree structure, we introduced the following invariant: “each node must have either 0 or 2 children.” (A binary tree with this property is known as a full binary tree.) We were able to come up with an example binary search tree structure that violated this invariant, which inspired a different hypothesis that “unbalanced growth leads to worst-case height trees.” This hypothesis formed the basis of B-trees.

[Figure: Worst-Case Full Tree]

But what if we didn’t give up here? Consider the binary trees depicted below. Let’s develop an invariant that includes the balanced trees and excludes unbalanced trees.

[Figure: Identifying Complete Trees]

One such invariant is the definition of complete binary trees.

Complete
Every level is completely filled except possibly the bottom level, and the nodes in the bottom level are as far left as possible.
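To make the definition concrete, here is a minimal sketch of a completeness check. The TreeNode and CompleteCheck classes are hypothetical names introduced only for this illustration. The idea is to visit nodes in level order: once we have seen a missing child, any later child means there is a gap to the left of some node, so the tree cannot be complete.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical node type for illustration; not part of the original notes.
class TreeNode {
    int item;
    TreeNode left, right;

    TreeNode(int item, TreeNode left, TreeNode right) {
        this.item = item;
        this.left = left;
        this.right = right;
    }
}

class CompleteCheck {
    // Returns true if the tree rooted at root is a complete binary tree.
    static boolean isComplete(TreeNode root) {
        if (root == null) {
            return true; // assumption: an empty tree counts as complete
        }
        Queue<TreeNode> queue = new ArrayDeque<>();
        queue.add(root);
        boolean sawGap = false; // true once a missing child has been seen in level order
        while (!queue.isEmpty()) {
            TreeNode node = queue.remove();
            for (TreeNode child : new TreeNode[] {node.left, node.right}) {
                if (child == null) {
                    sawGap = true;
                } else {
                    if (sawGap) {
                        return false; // a node appears after a gap in level order
                    }
                    queue.add(child);
                }
            }
        }
        return true;
    }
}
```

The check visits each node once, so it takes time linear in the number of nodes.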

Unfortunately, we don’t know how to efficiently maintain completeness while also maintaining the binary search tree invariant. Consider how we might maintain the completeness invariant while adding the item “A” to the following BST.

[Figure: Impossible Complete BST Insertion]

It’s hard to efficiently implement the set or map abstract data types without the binary search tree invariant because we no longer know where items will exist in the tree. Instead, we’ll investigate how we can use complete binary trees to implement the priority queue abstract data type.

Priority queues

The priority queue is an abstract data type that organizes items so that they can be accessed according to their associated priority values. Each item inserted into a priority queue has some sense of priority relative to other items.

The max-oriented priority queue is characterized by three operations: accessing the maximum priority item, removing the maximum priority item, and adding new items. The min-oriented priority queue is similar but optimizes for the minimum priority item rather than the maximum priority item. Duplicate items and duplicate priorities are allowed. Ties can be broken arbitrarily in case two items share the same priority.

Consider the following interface for a max-oriented priority queue, MaxPQ. For simplicity, we will assume that the priority queue stores numbers with the priority given by the value of the number.

add(int item)
Adds the given item to the priority queue.
max()
Returns the item with the highest priority.
removeMax()
Removes and returns the item with the highest priority.
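Written as Java, the interface above might look like the following sketch. The notes do not say what max and removeMax should do on an empty priority queue, so that behavior is left to each implementation here.

```java
// A sketch of the MaxPQ interface described above.
// Items are ints, and each item's priority is its own value.
interface MaxPQ {
    // Adds the given item to the priority queue.
    void add(int item);

    // Returns the item with the highest priority.
    int max();

    // Removes and returns the item with the highest priority.
    int removeMax();
}
```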
Describe how to implement MaxPQ with an unordered linked list.

In an unordered linked list, we can put our items anywhere in the list. This makes insertion fast because we can add the item to the front of the list in constant time, but finding the maximum is slow since we need to scan across the entire linked list.
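A minimal sketch of this approach, using java.util.LinkedList as the underlying list. The class name UnorderedListMaxPQ and the choice to throw NoSuchElementException on an empty queue are assumptions made for this example.

```java
import java.util.LinkedList;
import java.util.NoSuchElementException;

class UnorderedListMaxPQ {
    private final LinkedList<Integer> items = new LinkedList<>();

    // Constant time: the item goes to the front, wherever it ranks.
    public void add(int item) {
        items.addFirst(item);
    }

    // Linear time: scan the whole list for the largest item.
    public int max() {
        if (items.isEmpty()) {
            throw new NoSuchElementException("priority queue is empty");
        }
        int best = items.getFirst();
        for (int item : items) {
            if (item > best) {
                best = item;
            }
        }
        return best;
    }

    // Linear time: find the largest item, then unlink one copy of it.
    public int removeMax() {
        int best = max();
        // Box explicitly so that we remove by value rather than by index.
        items.removeFirstOccurrence(Integer.valueOf(best));
        return best;
    }
}
```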

Describe how to implement MaxPQ with an ordered linked list.

In an ordered linked list, the opposite is true. We can find the maximum quickly since we can maintain a reference to it (either the front or back of the list, our choice), allowing us to remove and return the maximum priority item in constant time. However, this slows down the insertion process since we need to find the exact, ordered position for the item in the list.
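A minimal sketch of the ordered approach, again with java.util.LinkedList. Here we assume the list is kept in descending order so that the maximum is always at the front. The class name OrderedListMaxPQ is hypothetical.

```java
import java.util.LinkedList;
import java.util.ListIterator;

class OrderedListMaxPQ {
    // Invariant: items are kept in descending order, so the maximum is first.
    private final LinkedList<Integer> items = new LinkedList<>();

    // Linear time: walk the list to find the first item smaller than the new one.
    public void add(int item) {
        ListIterator<Integer> it = items.listIterator();
        while (it.hasNext()) {
            if (it.next() < item) {
                it.previous(); // step back so the new item lands before the smaller one
                break;
            }
        }
        it.add(item); // inserts at the iterator's current position
    }

    // Constant time. Throws NoSuchElementException if the queue is empty.
    public int max() {
        return items.getFirst();
    }

    // Constant time. Throws NoSuchElementException if the queue is empty.
    public int removeMax() {
        return items.removeFirst();
    }
}
```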

Describe how to implement MaxPQ with a binary search tree.

A binary search tree maintains the order of items in the structure of the tree. The smallest key is stored in the leftmost node in the tree while the largest key is stored in the rightmost node in the tree. Therefore, we can keep a reference to the rightmost node in the tree and maintain it as we add items and remove the maximum priority item from the priority queue.

Compare the runtimes of each implementation in the table below, where N is the size of the priority queue.

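| Operation | Unordered Linked List | Ordered Linked List | BST | 2-3 Tree |
|-----------|-----------------------|---------------------|------|----------|
| add       | O(1)                  | O(N)                | O(N) | O(log N) |
| max       | O(N)                  | O(1)                | O(1) | O(1)     |
| removeMax | O(N)                  | O(1)                | O(N) | O(log N) |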

While the 2-3 tree implementation is appealing, handling items with duplicate priority values requires additional complexity. Furthermore, it turns out there is another data structure with the same asymptotic performance but even better constant factors.
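To illustrate the extra bookkeeping that duplicates require, here is a sketch built on java.util.TreeMap. Note that TreeMap is a red-black tree rather than a 2-3 tree, so it is only a stand-in for a balanced search tree. Because a search tree stores each key once, this sketch keeps a count of copies per priority. The class name TreeMapMaxPQ is hypothetical.

```java
import java.util.NoSuchElementException;
import java.util.TreeMap;

class TreeMapMaxPQ {
    // Maps each priority to the number of copies currently stored.
    private final TreeMap<Integer, Integer> counts = new TreeMap<>();

    // Logarithmic time: insert the key, or bump its count if it already exists.
    public void add(int item) {
        counts.merge(item, 1, Integer::sum);
    }

    // The largest key is the maximum priority item.
    public int max() {
        if (counts.isEmpty()) {
            throw new NoSuchElementException("priority queue is empty");
        }
        return counts.lastKey();
    }

    // Logarithmic time: decrement the count and drop the key when it reaches zero.
    public int removeMax() {
        int best = max();
        // Returning null from the remapping function removes the key from the map.
        counts.computeIfPresent(best, (key, count) -> count == 1 ? null : count - 1);
        return best;
    }
}
```

One difference from the table: TreeMap finds its last key by walking down the tree, so max here takes logarithmic time unless we also keep a separate reference to the largest key.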

Binary heap data structure

A binary max-heap is a complete binary tree that maintains the max-heap invariant.

Max-Heap Invariant
The priority of each item in the heap is greater than or equal to the priorities of its children (if it has any).

By construction, the largest value in a max-heap is always at the root. To implement the max method, just return the root item.

Implementing removeMax, however, is a bit trickier.

Removing the max item requires three steps.

  1. Swap the root with the last leaf.
  2. Remove the last leaf.
  3. Sink the new root to its proper place, promoting the larger child.

removeMax relies on the sink operation to maintain heap invariants.

Sink
Swap a node down the tree until it is greater than or equal to both of its children or it becomes a leaf. At each step, swap with (promote) the larger child. Break ties arbitrarily.
Swim
Swap a node up the tree until its parent is greater than or equal to it or it reaches the root.

We can add an item to the heap using the corresponding swim operation.

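  1. Add the item as the next leaf: the leftmost open position in the bottom level, or the start of a new level if the bottom level is full.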
  2. Swim to restore heap invariants.
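The removeMax and add steps above can be sketched as follows. This is one possible implementation, not necessarily the one used in the course. It assumes the heap is stored in level order in an ArrayList with the root at index 0, so the children of index i sit at 2i + 1 and 2i + 2 and the last leaf is the last list element. The class name ArrayMaxHeap is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;

class ArrayMaxHeap {
    // Level-order layout (assumed): root at index 0, children of i at 2i + 1 and 2i + 2.
    private final List<Integer> items = new ArrayList<>();

    public void add(int item) {
        items.add(item);        // 1. Add the item as the next leaf.
        swim(items.size() - 1); // 2. Swim to restore heap invariants.
    }

    public int max() {
        if (items.isEmpty()) {
            throw new NoSuchElementException("heap is empty");
        }
        return items.get(0);
    }

    public int removeMax() {
        int best = max();
        int last = items.size() - 1;
        Collections.swap(items, 0, last); // 1. Swap the root with the last leaf.
        items.remove(last);               // 2. Remove the last leaf (remove by index).
        sink(0);                          // 3. Sink the new root to its proper place.
        return best;
    }

    // Swap the node at index i up until its parent is at least as large.
    private void swim(int i) {
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (items.get(parent) >= items.get(i)) {
                break;
            }
            Collections.swap(items, parent, i);
            i = parent;
        }
    }

    // Swap the node at index i down, promoting the larger child each time.
    private void sink(int i) {
        int n = items.size();
        while (2 * i + 1 < n) {
            int child = 2 * i + 1;
            if (child + 1 < n && items.get(child + 1) > items.get(child)) {
                child++; // the right child is larger
            }
            if (items.get(i) >= items.get(child)) {
                break;
            }
            Collections.swap(items, i, child);
            i = child;
        }
    }
}
```

Both swim and sink follow a single path between the root and a leaf, so add and removeMax do work proportional to the height of the tree, which is logarithmic in N for a complete tree.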

Tree representations

One of the main advantages of using heaps to implement priority queues is in the tree representation. We can represent trees in different ways.

Node representation
Store items together with structure. Map parent to child relationships.
Array representation
Store items separate from structure. Map child to parent relationships.

Since heaps are complete binary trees, listing their items in level order leaves no gaps, so we typically represent them with arrays.
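In the array representation, parent and child relationships come from index arithmetic rather than from stored references. The sketch below assumes the root is at index 0 (some presentations leave index 0 unused and start at 1, which changes the formulas). The class and method names are hypothetical.

```java
class HeapIndexing {
    // Index of the parent of the node at index i (for i > 0).
    static int parent(int i) {
        return (i - 1) / 2;
    }

    // Index of the left child of the node at index i.
    static int left(int i) {
        return 2 * i + 1;
    }

    // Index of the right child of the node at index i.
    static int right(int i) {
        return 2 * i + 2;
    }

    // Checks the max-heap invariant by comparing every non-root item to its parent.
    static boolean isMaxHeap(int[] items) {
        for (int i = 1; i < items.length; i++) {
            if (items[parent(i)] < items[i]) {
                return false;
            }
        }
        return true;
    }
}
```

For example, the array {9, 7, 8, 3, 5} describes a tree with root 9, children 7 and 8, and 3 and 5 as the children of 7. Every item is no larger than its parent, so isMaxHeap returns true for it.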