Priority Queues and Heaps
Efficiently implementing the Priority Queue ADT by turning back time and rewriting binary search tree invariants.
Kevin Lin, with thanks to many others.

Feedback from the Reading Quiz

Rewriting Invariants
Hypothesis. Worst-case height trees are spindly trees.
Identify.
Spindly tree: all nodes have either 0 children (leaf) or 1 child.
Bushy tree: all nodes have either 0 children (leaf) or 2 children.
Plan. Say we have a BST in which every node has either 0 or 2 children.
Analyze.
What is the worst case search time in this case?
What do worst case trees look like?
[Figure: example trees containing the keys 1 through 7.]

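Answer. H(N) ∈ Θ(N): even when every node has either 0 or 2 children, the tree can still be spindly, so the height grows linearly with N.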
Examples are key to helping us learn and iterate from our initial attempts. Unfortunately, this new invariant doesn’t capture the complexity of the problem.

?: What is the tilde notation (like Big-Theta but keeping multiplicative constants) for the height of this tree?

Balanced Binary Search Trees
Full. Every node has either 0 or 2 children.
Describe an invariant that includes the balanced trees below and excludes unbalanced trees.
Let’s go back in time. What if we don’t accept this resolution? Sure, a full binary search tree does not guarantee balance. However, we can come up with another invariant that does guarantee balance!

Complete. Nodes are missing only from the bottom level (if at all), and all nodes are as far left as possible.

Pivoting to Priority Queues
Unfortunately, we don’t know how to efficiently maintain BST completeness.

Hypothesis. Too slow to maintain both the Binary Search Tree Invariant and the Completeness Invariant. Drop the BST Invariant and choose a faster invariant.

Let’s implement a priority queue instead.
[Figure: example trees, including a rotateRight(Z) operation.]
?: For the bottom example, give an asymptotic lower bound (i.e. Big-Omega) for the runtime to fix the tree where N is the number of items.

Binary Max-Heap
Plan. Optimize for MaxPQ: put the max-priority item at the root of the tree.
A Binary Max-Heap has two invariants.
Max-Heap Invariant. Every node is greater than or equal to both of its children.
Completeness Invariant. Nodes are missing only from the bottom level (if at all), and all nodes are as far left as possible.
[Figure: two example trees containing the items 1, 4, 5, 6, 7, 8.]
Note that, in this visualization, the priority is the value shown in the node. Hereafter, we’ll refer to the max-priority item as just “max item” for brevity.

Which of these are valid max-heaps?
[Figure: several candidate trees.]

The second tree is not complete.

The third tree does not satisfy the max-heap invariant.

Returning the Max
By construction, the largest value in a max-heap is always the root of the tree.

Max-Heap Invariant is recursive. Subtree rooted at node 6 is itself a max-heap!
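Because the invariant is recursive, it can be checked recursively. The following is a minimal sketch, assuming a hypothetical linked node type (`HeapNode`) with integer priorities; neither name comes from the lecture. It checks only the max-heap invariant, not completeness.

```java
// Hypothetical node type for illustration only; not the lecture's code.
class HeapNode {
    int priority;
    HeapNode left, right;

    HeapNode(int priority, HeapNode left, HeapNode right) {
        this.priority = priority;
        this.left = left;
        this.right = right;
    }
}

class HeapOrderCheck {
    // Returns true if every node in the subtree is >= both of its children.
    // Completeness is NOT checked here.
    static boolean isHeapOrdered(HeapNode node) {
        if (node == null) {
            return true; // an empty subtree trivially satisfies the invariant
        }
        if (node.left != null && node.left.priority > node.priority) {
            return false;
        }
        if (node.right != null && node.right.priority > node.priority) {
            return false;
        }
        // The invariant must also hold in each subtree.
        return isHeapOrdered(node.left) && isHeapOrdered(node.right);
    }
}
```

Each call compares a node only with its own children; the guarantee about grandchildren and deeper descendants follows from the recursive calls.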
[Figure: max-heap with root 8 containing the items 8, 7, 6, 5, 4, 1.]
?: What does the fact that the invariant is recursive guarantee about the relationship between the root 8 and its grandchildren, 4, 5, and 1? What about potential great-grandchildren?

Removing: First Algorithm
Goal. Remove and return the max item.


Remove the root.
Promote the larger child recursively.
[Figure: the example heap before and after removing the max item 8.]
?: Give an example where this algorithm goes wrong.




?: What’s the running time of this algorithm with respect to H, the height of the heap?

This algorithm is broken. Fill in the blanks with valid heap values such that the heap is no longer valid after removing the max.
[Figure: a heap with root 8 and blank nodes to fill in.]
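Answer. The heap is not complete after the removal! Promoting the larger child moves the hole down to a leaf position, which need not be the rightmost one on the bottom level.
There are infinitely many possible answers.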

Removing: Safe Removal
Problem. Removing the root node leaves a hole in the heap that isn’t easily fixed.

Are there any nodes in the heap that are safe to remove, i.e. removed without affecting any other nodes in the heap?
[Figure: max-heap containing the items 8, 5, 4, 3, 2, 1.]
?: What invariants do we need to keep in mind when implementing remove?






Answer. The rightmost leaf on the bottom level can be removed without violating either max-heap invariant.
?: What about the two other leaf nodes on the bottom level? Why can’t they be safely removed?

Removing the Max
Problem. Removing the root node leaves a hole in the heap that isn’t easily fixed.

Swap root with rightmost leaf.
Remove rightmost leaf.
Sink new root to its proper place, promoting the larger child.
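The three steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the lecture's implementation: the heap is stored in level order in a 0-indexed `List<Integer>`, so the children of index `i` sit at `2i + 1` and `2i + 2`, and the names `RemoveMaxSketch`, `removeMax`, and `sink` are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;

class RemoveMaxSketch {
    // Assumes `heap` holds a valid max-heap in level order (0-indexed).
    static int removeMax(List<Integer> heap) {
        if (heap.isEmpty()) {
            throw new NoSuchElementException("heap is empty");
        }
        int last = heap.size() - 1;
        int max = heap.get(0);
        heap.set(0, heap.get(last)); // move the rightmost leaf into the root
        heap.remove(last);           // remove the rightmost leaf (index overload)
        sink(heap, 0);               // restore the max-heap invariant
        return max;
    }

    // Swap the item at index i down until it is >= both of its children.
    static void sink(List<Integer> heap, int i) {
        int n = heap.size();
        while (2 * i + 1 < n) {
            int child = 2 * i + 1; // left child
            if (child + 1 < n && heap.get(child + 1) > heap.get(child)) {
                child++;           // right child is larger, so promote it
            }
            if (heap.get(i) >= heap.get(child)) {
                break;             // invariant holds at this node
            }
            Collections.swap(heap, i, child);
            i = child;
        }
    }
}
```

Because only the rightmost leaf is ever physically removed, completeness is preserved; the sink step then repairs the max-heap invariant along a single root-to-leaf path.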

Maintaining Heap Invariants
Sink.
Swap a node down the tree until it is at least as large as both of its children (or becomes a leaf). At each step, promote the larger child; ties can be broken arbitrarily.
Swim.
Swap a node up the tree until its parent is at least as large as it is (or it becomes the root).
[Figure: example heaps illustrating sink and swim.]
?: How can we use these operations to insert an item?

Inserting an Item
Give an algorithm for inserting an item.
For example, add the item 8 to this heap.
[Figure: heap containing the items 1, 2, 2, 3, 5, 6.]

Add the item as a new rightmost leaf node.
Swim to restore heap invariants.
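The two steps above can be sketched as follows, again assuming a level-order, 0-indexed `List<Integer>` (so the parent of index `i` is `(i - 1) / 2`). The class and method names are hypothetical, and the starting layout `[6, 3, 5, 1, 2, 2]` used in the usage note is one assumed arrangement of the example's items, not necessarily the one drawn on the slide.

```java
import java.util.Collections;
import java.util.List;

class InsertSketch {
    // Assumes `heap` holds a valid max-heap in level order (0-indexed).
    static void add(List<Integer> heap, int item) {
        heap.add(item);              // new rightmost leaf keeps the tree complete
        swim(heap, heap.size() - 1); // restore the max-heap invariant
    }

    // Swap the item at index i up until its parent is >= it or it is the root.
    static void swim(List<Integer> heap, int i) {
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (heap.get(parent) >= heap.get(i)) {
                break;
            }
            Collections.swap(heap, i, parent);
            i = parent;
        }
    }
}
```

Adding 8 to the assumed layout `[6, 3, 5, 1, 2, 2]` appends it at index 6, then swaps it past 5 and 6, giving `[8, 3, 6, 1, 2, 2, 5]`.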

Tree Representations

Node Representation
public class TreeNode<Item> {
    Item item;
    TreeNode<Item> left;
    TreeNode<Item> right;
}
Store items together with structure.
Map parent to child relationships.

Array Representation
public class ArrayTree<Item> {
    Item[] items;
    int[] parents;
    ...
}
Store items separate from structure.
Map child to parent relationships.
No explicit links needed!

items and parents
index:    0  1  2  3  4  5  6  7  8  9  10 11 12 13
items:    k  e  v  b  g  p  y  a  d  f  j  m  r  x
parents:  0  0  0  1  1  2  2  3  3  4  4  5  5  6
Note that the value of each letter (k, e, v, …) doesn't mean anything.

items without parents
index:    0  1  2  3  4  5  6  7  8  9  10 11 12 13
items:    k  e  v  b  g  p  y  a  d  f  j  m  r  x
private int parent(int i) { return /* ????? */ ; }
Assumption: complete tree
Q1: Complete the return statement in the parent method.

private int parent(int i) { return (i - 1) / 2; }
Off-by-one arithmetic is somewhat annoying from the implementer’s perspective, though it doesn’t affect the ADT client.
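As a quick sanity check, the formula can be compared against the explicit parents array from the 14-item example (0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6). The sketch below is illustrative only: the class name and the `leftChild`/`rightChild` helpers are assumptions added here as the inverse of `parent` for a 0-indexed complete tree.

```java
class ParentIndexCheck {
    // Parent formula for a complete tree stored in level order, 0-indexed.
    static int parent(int i) {
        return (i - 1) / 2;
    }

    // Assumed inverse helpers (not shown on the slide).
    static int leftChild(int i) {
        return 2 * i + 1;
    }

    static int rightChild(int i) {
        return 2 * i + 2;
    }

    // The explicit parents array from the 14-item example tree.
    static final int[] PARENTS = {0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6};

    // True if the formula reproduces every entry of the stored array.
    static boolean matchesStoredParents() {
        for (int i = 0; i < PARENTS.length; i++) {
            if (parent(i) != PARENTS[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Note that `parent(0)` evaluates to 0 because Java integer division truncates toward zero, which happens to agree with the root's entry in the parents array.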

Simplification: Empty Spot
Simplify arithmetic by leaving one empty spot at the front of the items array.

leftChild(k) = k * 2
rightChild(k) = k * 2 + 1
parent(k) = k / 2
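To see the simplified arithmetic in action, here is a small sketch that stores the example items starting at index 1 and walks from a node up to the root. The class name and the `pathToRoot` helper are hypothetical and exist only for illustration.

```java
class OneIndexedTree {
    // Index 0 is intentionally left empty; the root lives at index 1.
    static final char[] ITEMS =
        {'-', 'k', 'e', 'v', 'b', 'g', 'p', 'y', 'a', 'd', 'f', 'j', 'm', 'r', 'x'};

    static int leftChild(int k) {
        return k * 2;
    }

    static int rightChild(int k) {
        return k * 2 + 1;
    }

    static int parent(int k) {
        return k / 2;
    }

    // Collects the items on the path from index k up to the root.
    static String pathToRoot(int k) {
        StringBuilder path = new StringBuilder();
        while (k >= 1) {
            path.append(ITEMS[k]);
            k = parent(k); // parent(1) is 0, which ends the loop
        }
        return path.toString();
    }
}
```

Starting from `x` at index 14, the walk visits indices 14, 7, 3, and 1, so `pathToRoot(14)` returns `"xyvk"`. The empty spot also gives a convenient stopping condition, since the root's parent index is 0.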
index:    0  1  2  3  4  5  6  7  8  9  10 11 12 13 14
items:    -  k  e  v  b  g  p  y  a  d  f  j  m  r  x

Heap Implementation of the Priority Queue
             Unordered Linked List   Ordered Linked List   Balanced BST   Binary Heap
add          O(1)                    O(N)                  O(log N)       O(log N)
max          O(N)                    O(1)                  O(1)*          O(1)
removeMax    O(N)                    O(1)                  O(log N)       O(log N)
*Requires maintaining a pointer to the rightmost (max) key.
Heaps are asymptotically comparable to balanced search trees.
Heaps are faster by a significant constant factor. More on this later.
Heaps handle duplicate priorities much more naturally than BSTs.
There are still a few other questions of practical interest. We’ll get experience answering these questions in the upcoming homework.
How does a PQ know how to determine which item in a PQ is larger?
What could we change so that there is a default comparison?
What constructors are needed to allow for different orderings?
How can we optimize for the scenario where an item needs to change its priority after it’s been added to the PQ?
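For a point of comparison on the ordering questions, Java's standard library `java.util.PriorityQueue` uses the items' natural ordering by default and accepts a `Comparator` in its constructor to change the ordering; it removes the least item first. The sketch below only illustrates that library's behavior; it is not the homework's design, and the `ComparatorDemo` class and `pollAll` helper are hypothetical.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class ComparatorDemo {
    // Adds all items to a PriorityQueue ordered by `order`, then removes them
    // one at a time and returns them in removal order, separated by spaces.
    static String pollAll(Comparator<String> order, String... items) {
        PriorityQueue<String> pq = new PriorityQueue<>(order);
        for (String item : items) {
            pq.add(item);
        }
        StringBuilder out = new StringBuilder();
        while (!pq.isEmpty()) {
            out.append(pq.poll()).append(' '); // poll() removes the least item
        }
        return out.toString().trim();
    }
}
```

With the items "pear", "fig", and "banana", `Comparator.naturalOrder()` yields `banana fig pear`, `Comparator.reverseOrder()` yields `pear fig banana`, and `Comparator.comparing(String::length)` yields `fig pear banana`.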

ADT
Implementer: Writing code that runs efficiently
Client: Writing code efficiently
Seems like a small improvement to learn about heaps. Why bother?

For the last few lectures, we’ve been exploring three levels of analysis. Let’s review it from the widest lens zooming in.

Implementer’s Design Decision Hierarchy
Abstract Data Type. Which ADT is the best fit? Example: Priority Queue.

Data Structure. Which data structure offers the best performance for our input/workload? Examples: Binary Heap, BST, Linked Nodes.

Implementation Details. How do we maintain invariants? Examples: Heap-Order Invariant, Completeness Invariant.
As the ADT implementer, we always had to keep in mind our invariants when thinking through the problem.

Algorithm Design Process
Hypothesize. How does an invariant affect the behavior for each operation?
Identify. What strategies have we used before? What examples can we apply?
Plan. Propose a new way from findings.
Analyze. Does the plan do the job? What are potential problems with the plan?
Create. Implement the plan.
Evaluate. Check implemented plan.
Binary Search Tree Invariant.
For every node X in the tree:
All keys in the left subtree ≺ X’s key.
All keys in the right subtree ≻ X’s key.


B-Tree Invariants.
All leaves must be the same depth from the root. A non-leaf node with k items must have exactly k + 1 non-null children.
Programming, Problem Solving, and Self-Awareness: Effects of Explicit Guidance (Loksa et al./CHI ‘16)
Iterative Refinement
In order to determine how to build the data structure and its implementation details, we employed the algorithm design process.

The Role of Information
What does debugging a program look like? (Julia Evans); The Debugging Mindset (Devon H. O’Dell/ACM Queue)
How are bugs fixed? Here’s one proposal.
Productive changes fix bugs.
Information gathered about the system informs productive changes.
A hypothesis guides information gathering and testing.
Things we know about the problem inform how we choose hypotheses.
ArrayQueue maintains certain invariants.
Unexpected result after add and remove.
The remove method decrements the size variable even when the queue is empty.
Modify the remove method to handle the special case of removing if empty.
The point here is that information is the most important thing and you need to do whatever’s necessary to get information.
Testing and debugging provide a means of evaluating program correctness.

Runtime Analysis Process
Comprehending. Understanding the implementation details of a program.
Modeling. Counting the number of steps in terms of N, the size of the input.
Case Analysis. How certain conditions affect the program execution.
Asymptotic Analysis. Describing what happens for very large N, as N→∞.
Formalizing. Summarizing the final result in precise English or math notation.

Example: boolean dup1(int[] A) considers every pair of items.
Best case (array contains a duplicate at the front): constant time, Θ(1).
Worst case (array contains no duplicate items): quadratic time, Θ(N²).
Overall: Ω(1) and O(N²).
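For reference, a pair-checking duplicate finder of the kind analyzed here might look like the sketch below. The slide gives only the signature `boolean dup1(int[] A)` and the phrase "consider every pair", so the loop structure and the enclosing class name are assumptions.

```java
class Dup1 {
    // Returns true if any two positions in A hold the same value.
    static boolean dup1(int[] A) {
        for (int i = 0; i < A.length; i++) {
            for (int j = i + 1; j < A.length; j++) {
                if (A[i] == A[j]) {
                    return true; // best case: A[0] == A[1], found on the first comparison
                }
            }
        }
        return false; // worst case: no duplicates, so every pair was compared
    }
}
```

With no duplicates, the nested loops compare all N(N - 1)/2 pairs, which is where the quadratic worst case comes from.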
Whereas testing and debugging evaluate correctness, the runtime analysis process provides a means to compare the running times of programs without writing code.