Red-Black Trees
The problem with 2-3 Trees, and how the Algorithm Design Process can inform a solution.
Kevin Lin, with thanks to many others.
Ask questions anonymously on Piazza. Look for the pinned Lecture Questions thread.

Feedback from the Reading Quiz

Algorithm Design Process
Hypothesize. How does an invariant affect the behavior for each operation?
Identify. What strategies have we used before? What examples can we apply?
Plan. Propose a new way from findings.
Analyze. Does the plan do the job? What are potential problems with the plan?
Create. Implement the plan.
Evaluate. Check implemented plan.
Binary Search Tree Invariant.
For every node X in the tree:
All keys in the left subtree ≺ X’s key.
All keys in the right subtree ≻ X’s key.


B-Tree Invariants.
All leaves must be the same depth from the root.
A non-leaf node with k items must have exactly k + 1 non-null children.
Programming, Problem Solving, and Self-Awareness: Effects of Explicit Guidance (Loksa et al./CHI ‘16)
Iterative Refinement
In the previous lesson…

Improving Search Trees
Binary Search Trees (BST). We can balance a BST with rotations, but we have no fast algorithm for doing so: rebalancing the entire tree is slow.

2-3 Trees. Balanced by construction: no rotations required. Tree splits nodes as needed, but the algorithm is complicated.

Hypothesis. Get the best of both worlds: a BST with the functionality of a 2-3 tree.
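[Figure: a BST on the keys b, e, g, m, n, o, p, and a 2-3 tree on the keys b, d, e, f, g, m, n, o, p that contains the 3-node (d f).]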

Converting 2-3 Tree to BST
2-3 trees with only 2-nodes (2 children) are already regular binary search trees.
How can we represent 3-nodes in a BST?
[Figure: the 2-3 tree containing the 3-node (d f), next to a BST with a "?" in place of that 3-node.]

Apply the Rotation Insight
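[Figure: the 3-node (d f), with subtrees b (< d), e (> d and < f), and g (> f), drawn as two BSTs. Left-leaning: f is the parent and d is its left child. Right-leaning: d is the parent and f is its right child.]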
We learned in the reading that rotations, in effect, combine two nodes together and then split them up again into one of these two configurations. However, writing code to handle both cases is unnecessary since we only need one representation. Let’s (arbitrarily) choose to use only the left-leaning representation.
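The two configurations differ by a single rotation. As a minimal sketch (the `Node` class and the method names here are illustrative assumptions, not code from the lecture), rotating the right-leaning form to the left yields the left-leaning form, and the sorted order of the keys is unchanged:

```java
class RotationSketch {
    // Hypothetical minimal BST node; no link colors yet.
    static class Node {
        char key;
        Node left, right;
        Node(char key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Rotate left: h's right child becomes the new root of this subtree.
    static Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        return x;
    }

    // Rotate right: h's left child becomes the new root of this subtree.
    static Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        return x;
    }

    // In-order traversal, used to confirm that rotations keep the keys sorted.
    static String inOrder(Node n) {
        if (n == null) { return ""; }
        return inOrder(n.left) + n.key + inOrder(n.right);
    }

    // Right-leaning form of the 3-node (d f): d on top, f as its right child.
    static Node rightLeaning() {
        Node f = new Node('f', new Node('e', null, null), new Node('g', null, null));
        return new Node('d', new Node('b', null, null), f);
    }

    public static void main(String[] args) {
        Node leftLeaning = rotateLeft(rightLeaning());
        System.out.println(leftLeaning.key + " over " + leftLeaning.left.key); // f over d
        System.out.println(inOrder(leftLeaning));                              // bdefg
    }
}
```

Calling `rotateRight` on the result restores the right-leaning form, which is why supporting both representations would only duplicate cases.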

?: Why does handling both representations take more work?

Convert the 2-3 Tree to a Left-Leaning BST
[Figure: a 2-3 tree with root (u w) and children (a s), v, and (x y).]

Convert the 2-3 Tree to a Left-Leaning BST
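[Figure: the 2-3 tree with root (u w) and children (a s), v, and (x y), next to its left-leaning BST: w at the root with left child u and right child y; u has left child s and right child v; s has left child a; y has left child x.]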

Convert the Left-Leaning BST to a 2-3 Tree
[Figure: a left-leaning BST with root m. m has left child f and right child o; f has left child d and right child g; d has children b and e; o has children n and p.]




?: How did you determine which nodes were 2-nodes? 3-nodes?

Convert the Left-Leaning BST to a 2-3 Tree
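[Figure: the corresponding 2-3 tree, next to the left-leaning BST: root m, whose left child is the 3-node (d f) with children b, e, and g, and whose right child o has children n and p.]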
Going from a 2-3 tree to an LLBST is easier than going from an LLBST back to a 2-3 tree. It’s harder to see the latent structure in the unlabeled binary search tree: some amount of guess-and-check or logical deduction is needed to figure out which nodes are 2-nodes and which are 3-nodes.

For the same reason, it’s tricky to write this as an algorithm.

Left-Leaning Red-Black Tree
Left-Leaning Red-Black (LLRB) Tree. Take a left-leaning BST and color the link connecting two items in a 3-node red.

There is a 1-1 correspondence (bijection) between 2-3 trees and LLRB trees.
2-nodes are the same in both trees.
3-nodes are connected by a red link.
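A minimal sketch of how the red links make 3-nodes easy to recover. It assumes each node stores the color of the link from its parent; the class, field, and method names are hypothetical choices for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class ThreeNodeSketch {
    static final boolean RED = true, BLACK = false;

    // Hypothetical node: color is the color of the link from the parent.
    static class Node {
        String key;
        boolean color;
        Node left, right;
        Node(String key, boolean color, Node left, Node right) {
            this.key = key;
            this.color = color;
            this.left = left;
            this.right = right;
        }
    }

    // Null links count as black.
    static boolean isRed(Node n) { return n != null && n.color == RED; }

    // Collect every 3-node: a node joined to its left child by a red link.
    static void threeNodes(Node n, List<String> out) {
        if (n == null) { return; }
        threeNodes(n.left, out);
        if (isRed(n.left)) { out.add(n.left.key + " " + n.key); }
        threeNodes(n.right, out);
    }

    static Node leaf(String key) { return new Node(key, BLACK, null, null); }

    // The example tree: m at the root, with d attached to f by a red link.
    static Node example() {
        Node d = new Node("d", RED, leaf("b"), leaf("e"));
        Node f = new Node("f", BLACK, d, leaf("g"));
        Node o = new Node("o", BLACK, leaf("n"), leaf("p"));
        return new Node("m", BLACK, f, o);
    }

    public static void main(String[] args) {
        List<String> found = new ArrayList<>();
        threeNodes(example(), found);
        System.out.println(found); // [d f]
    }
}
```

Every node that is not part of a reported pair is a 2-node, so no guess-and-check is needed once the colors are stored.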
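[Figure: the left-leaning BST rooted at m with the link between d and f colored red, next to the corresponding 2-3 tree containing the 3-node (d f).]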
Note that “left-leaning binary search trees” don’t actually exist, but it’s useful to know that the red links’ role is to make it easier to figure out which nodes are 3-nodes.

?: Why do red links lean left? Can a red link connect to a right child in an LLRB?




?: What do the black links in an LLRB connect in the analogous 2-3 tree?




?: What does it mean to have 1-1 correspondence?

Which of these are valid LLRB trees?
Use the fact that LLRB trees have a 1-1 correspondence with 2-3 trees.
[Figure: four candidate red-black trees on the keys A, B, C, G, X.]

Which of these are valid LLRB trees?
[Figure: the four candidate trees on the keys A, B, C, G, X, each converted to a 2-3 tree.]
?: Why are two red links in a row not possible in an LLRB tree?

What’s the height of the corresponding LLRB tree?
[Figure: a 2-3 tree rooted at L; its 3-nodes include (S U) and (Q R).]

[Figure: the same 2-3 tree, with the longest path in the corresponding LLRB tree highlighted: L, P, U, S, R, Q.]
The longest path has 5 links (3 black and 2 red), so the height of the corresponding LLRB tree is 5.

?: If the height of a 2-3 tree is H, what is the height of its corresponding LLRB tree?

Maximum Height LLRB Tree
[Figure: the same 2-3 tree and longest path L, P, U, S, R, Q. Annotation on the path: worst case when these are 3-nodes.]
Given a 2-3 tree of height H, the corresponding LLRB tree has height at most H (black) + H + 1 (red) = 2H + 1.
In the example, H = 3 and the LLRB height is 5, below the maximum of 7.

?: If the height of a 2-3 tree is H, what is the maximum possible height of its corresponding LLRB tree?

LLRB Tree Invariant 1
Every path from root to a leaf has same number of black links.
LLRB trees are therefore balanced.
[Figure: the four candidate trees on the keys A, B, C, G, X.]
?: Why is this true?

Improving Search Trees
Hypothesis. Get the best of both worlds: a BST with the functionality of a 2-3 tree.

Identify. 2-3 trees have a bijection or 1-1 correspondence with LLRB trees.

Plan. 2-3 tree operations like overstuffing leaves and splitting can be implemented as rotations and coloring in an LLRB tree.
[Figure: the example LLRB tree rooted at m.]
We have a procedure for converting 2-3 trees to LLRB trees, but it’s not helpful on its own since it relies on a 2-3 tree implementation that we don’t have!

What’s left is converting the behaviors of each of the 2-3 tree operations such as (1) overstuffing leaf nodes, and (2) splitting nodes all the way back up to the root of the tree.

When performing LLRB tree operations, pretend it’s a 2-3 tree.
Preservation of the correspondence will involve tree rotations. We want our LLRB tree to function like a 2-3 tree, so let’s design it so that it behaves this way!

Overstuffing: Inserting a New Node
Should we use a red or black link when inserting a new node?
[Figure: adding A to an LLRB tree containing only B, drawn once with a red link and once with a black link between B and A. In the corresponding 2-3 tree, add(A) turns the node B into the 3-node (A B).]
Our 2-3 tree adds the key to the left side since A is smaller than B.


Overstuffing: Inserting a New Node
Use a red link to mimic the corresponding 2-3 tree.
[Figure: add(A) attaches A to B as a left child with a red link, matching the 3-node (A B) in the 2-3 tree.]

Overstuffing: Right-Side Special Case
What is the problem with inserting a red link to the right child? What should we do to fix it?
[Figure: adding C to an LLRB tree containing only B makes C the right child of B, joined by a red link. In the corresponding 2-3 tree, add(C) produces the 3-node (B C).]

Overstuffing: Right-Side Special Case
Rotate left around B.
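A minimal sketch of the rotation for this case, assuming each node stores the color of the link from its parent (the class and helper names are hypothetical). The rotation also moves the colors, so the red link ends up leaning left:

```java
class RotateLeftSketch {
    static final boolean RED = true, BLACK = false;

    // Hypothetical node: color is the color of the link from the parent.
    static class Node {
        char key;
        boolean color;
        Node left, right;
        Node(char key, boolean color) {
            this.key = key;
            this.color = color;
        }
    }

    // Rotate left around h and keep the red link between the two nodes.
    static Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color; // x takes over h's link to the parent
        h.color = RED;     // the link between x and h stays red
        return x;
    }

    // add(C) on a tree holding only B: C starts as a red right child.
    static Node addCToB() {
        Node b = new Node('B', BLACK);
        b.right = new Node('C', RED);
        return rotateLeft(b);
    }

    public static void main(String[] args) {
        Node top = addCToB();
        System.out.println(top.key + " over " + top.left.key); // C over B
        System.out.println(top.left.color == RED);             // true
    }
}
```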
[Figure: after add(C), rotateLeft(B) makes C the parent of B, with the red link now leaning left. This LLRB tree matches the 3-node (B C) in the 2-3 tree.]
Representation Invariant Consistency
Internal methods can temporarily break invariants so long as the final state is correct.
[Figure: the add(C) and rotateLeft(B) steps, next to the single add(C) step in the 2-3 tree.]
The number of steps it takes to maintain an LLRB tree can differ from the number of steps it takes to maintain the corresponding 2-3 tree: here, a single 2-3 tree insertion becomes an insertion followed by a rotation.

Splitting: Inserting to the Right Side
How do we mimic the 2-3 tree node split?
[Figure: adding C to the LLRB tree for the 3-node (A B) leaves B with two red links, to A on the left and to C on the right. In the 2-3 tree, add(C) overstuffs the node and forces a split.]

Splitting: Inserting to the Right Side
Flip the color of the links touching B.
[Figure: flip(B) turns the links from B to A and to C black, matching the 2-3 tree after the split: B with children A and C.]

Splitting: A Larger Example
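[Figure: a 2-3 tree with root G and children (A B) and X. After add(C), the split produces the root (B G) with children A, C, and X. In the LLRB tree, add(C) leaves B with red links to both A and C.]
What should the result of flipping B’s links look like, based on the corresponding 2-3 tree?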

Splitting: A Larger Example
[Figure: after flip(B), the links from B to A and to C are black and the link from G to B is red, matching the root (B G) in the 2-3 tree.]
Parent link flips too!
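A minimal sketch of the color flip on the larger example, assuming each node stores the color of the link from its parent (all names are hypothetical). Flipping pushes the middle key up, just as the 2-3 tree split does:

```java
class FlipColorsSketch {
    static final boolean RED = true, BLACK = false;

    // Hypothetical node: color is the color of the link from the parent.
    static class Node {
        char key;
        boolean color;
        Node left, right;
        Node(char key, boolean color, Node left, Node right) {
            this.key = key;
            this.color = color;
            this.left = left;
            this.right = right;
        }
    }

    // Split a temporary 4-node: both child links turn black and the parent link turns red.
    static void flipColors(Node h) {
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }

    // The larger example right after add(C): G at the root, B with red links to A and C.
    static Node afterAddC() {
        Node a = new Node('A', RED, null, null);
        Node c = new Node('C', RED, null, null);
        Node b = new Node('B', BLACK, a, c);
        return new Node('G', BLACK, b, new Node('X', BLACK, null, null));
    }

    public static void main(String[] args) {
        Node g = afterAddC();
        flipColors(g.left);
        System.out.println(g.left.color == RED);      // true: B now joins G in a 3-node
        System.out.println(g.left.left.color == RED); // false: A hangs off a black link
    }
}
```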

Splitting: Cascading Balance
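[Figure: an LLRB tree with root B, left child A, and right child Z, where S is attached to Z as a left child by a red link. The corresponding 2-3 tree has root B with children A and (S Z). After add(E), the 2-3 tree splits into the root (B S) with children A, E, and Z, while the LLRB tree has E attached below S by a red link.]
What rotation or color flip should we use to coerce this into a better form?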

Splitting: Cascading Balance
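[Figure: the fix-up sequence after add(E). rotateRight(Z) makes S the parent of E and Z; flip(S) turns both of those links black and the link from B to S red; rotateLeft(B) makes S the root with B as its red left child. The result matches the 2-3 tree with root (B S) and children A, E, and Z.]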
We see all three cases at play.
Right link red? Rotate left.
Two left reds in a row? Rotate right.
Both children red? Flip colors.

1-1 Correspondence in Three Cases.
Right link red? Rotate left. Two left reds in a row? Rotate right. Both children red? Flip colors.
LLRB Tree Invariants
Correctness Analysis. A BST with:

Perfect black balance. Every root-to-leaf path has the same number of black links.
Left-leaning. Red links lean left.
Color invariant. No node has two red links connected to it: above/below or left/right.
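The three invariants can be checked mechanically. The following is a sketch under the assumption that each node stores the color of the link from its parent and that null links count as black; the class and method names are hypothetical:

```java
class LlrbCheckSketch {
    static final boolean RED = true, BLACK = false;

    // Hypothetical node: color is the color of the link from the parent.
    static class Node {
        char key;
        boolean color;
        Node left, right;
        Node(char key, boolean color, Node left, Node right) {
            this.key = key;
            this.color = color;
            this.left = left;
            this.right = right;
        }
    }

    static boolean isRed(Node n) { return n != null && n.color == RED; }

    // Returns the number of black links from the link into n down to any null link,
    // or -1 if an invariant is violated somewhere in the subtree.
    static int blackHeight(Node n) {
        if (n == null) { return 0; }
        if (isRed(n.right)) { return -1; }              // red links must lean left
        if (isRed(n) && isRed(n.left)) { return -1; }   // no two red links in a row
        int l = blackHeight(n.left);
        int r = blackHeight(n.right);
        if (l < 0 || r < 0 || l != r) { return -1; }    // perfect black balance
        return isRed(n) ? l : l + 1;
    }

    static boolean isValid(Node root) {
        return !isRed(root) && blackHeight(root) >= 0;
    }

    static Node leaf(char key) { return new Node(key, BLACK, null, null); }

    // G at the root; B is either red (3-node (B G)) or black (unbalanced).
    static Node example(boolean colorOfB) {
        Node b = new Node('B', colorOfB, leaf('A'), leaf('C'));
        return new Node('G', BLACK, b, leaf('X'));
    }

    public static void main(String[] args) {
        System.out.println(isValid(example(RED)));   // true
        System.out.println(isValid(example(BLACK))); // false
    }
}
```

The second tree fails only the black-balance check: the paths through B cross two black links below the root, while the path through X crosses one.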
We now have a working left-leaning red-black tree!

?: How do we know that these three cases are enough to maintain the invariants?

Create: Java Implementation
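// Inserts the key into the subtree rooted at h and returns the subtree's new root.
private Node add(Node h, Key key, Value value) {
  // New nodes attach to their parent with a red link.
  if (h == null) { return new Node(key, value, RED); }

  int cmp = key.compareTo(h.key);
  if (cmp < 0)      { h.left  = add(h.left,  key, value); }
  else if (cmp > 0) { h.right = add(h.right, key, value); }
  else              { h.value = value;                    }

  // Restore the LLRB invariants on the way back up the tree.
  if (isRed(h.right) && !isRed(h.left))      { h = rotateLeft(h);  }  // right link red
  if (isRed(h.left)  &&  isRed(h.left.left)) { h = rotateRight(h); }  // two left reds in a row
  if (isRed(h.left)  &&  isRed(h.right))     { flipColors(h);      }  // both children red

  return h;
}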
All new nodes have a red link to their parent.
1-1 Correspondence in Three Cases.
Right link red? Rotate left. Two left reds in a row? Rotate right. Both children red? Flip colors.
Each recursive call to add can, in theory, execute all three cases but no more than that.
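The recursive add relies on helpers that the slide does not show. The following self-contained sketch fills them in under stated assumptions: the `Node` fields, `isRed`, the two rotations, `flipColors`, and the public wrapper that recolors the root black are hypothetical completions in the usual left-leaning style, not the course's reference code.

```java
class LlrbSketch<Key extends Comparable<Key>, Value> {
    static final boolean RED = true, BLACK = false;

    // Assumed node layout: color is the color of the link from the parent.
    class Node {
        Key key;
        Value value;
        boolean color;
        Node left, right;
        Node(Key key, Value value, boolean color) {
            this.key = key;
            this.value = value;
            this.color = color;
        }
    }

    Node root;

    boolean isRed(Node n) { return n != null && n.color == RED; }

    Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    void flipColors(Node h) {
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }

    // Assumed public entry point: the root has no parent link, so keep it black.
    public void add(Key key, Value value) {
        root = add(root, key, value);
        root.color = BLACK;
    }

    private Node add(Node h, Key key, Value value) {
        if (h == null) { return new Node(key, value, RED); }

        int cmp = key.compareTo(h.key);
        if (cmp < 0)      { h.left  = add(h.left,  key, value); }
        else if (cmp > 0) { h.right = add(h.right, key, value); }
        else              { h.value = value; }

        if (isRed(h.right) && !isRed(h.left))     { h = rotateLeft(h);  }
        if (isRed(h.left) && isRed(h.left.left))  { h = rotateRight(h); }
        if (isRed(h.left) && isRed(h.right))      { flipColors(h);      }
        return h;
    }

    public static void main(String[] args) {
        LlrbSketch<String, Integer> t = new LlrbSketch<>();
        // A, B, Z, S builds the tree from the cascading example; E triggers all three cases.
        for (String k : new String[] {"A", "B", "Z", "S", "E"}) { t.add(k, 0); }
        System.out.println(t.root.key + " " + t.root.left.key + " " + t.root.right.key); // S B Z
    }
}
```

Inserting A, B, Z, S in that order produces the tree with root B, left child A, and Z with a red left child S, so the final add(E) replays the rotateRight(Z), flip(S), rotateLeft(B) sequence.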

Evaluate: LLRB Runtime
Searching for a key is the same as a BST.
Tree height is guaranteed in Θ(log N).
Inserting a key is a recursive process.
Θ(log N) to add(E).
Θ(log N) to maintain invariants.
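One way to justify the height guarantee, sketched under the assumption that H is the height of the corresponding 2-3 tree counted in links and N is the number of keys. A 2-3 tree of height H holds the fewest keys when every node is a 2-node, and the LLRB tree is at most 2H + 1 links tall:

```latex
\begin{align*}
N &\ge 2^{H+1} - 1 &&\Longrightarrow\quad H \le \log_2(N+1) - 1 \\
\text{height}_{\text{LLRB}} &\le H + (H + 1) = 2H + 1 &&\le 2\log_2(N+1) - 1
\end{align*}
```

Any binary tree with N nodes has height at least ⌈log₂(N+1)⌉ − 1, so the LLRB height is bounded above and below by a constant multiple of log N.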
[Figure: the recursive calls to add(E), one row per call, annotated with rotateRight(Z), flip(S), and rotateLeft(B).]
Recall that add is a recursive method. Each row in the diagram is a call to add(E) on a different node in the tree. There’s a cost to recurse downwards (to find the right place to add the leaf) and then a cost as we return from each recursive frame and maintain invariants.

?: Why is the runtime to execute add(E) Θ(log N) rather than O(log N)?

Search Trees
Binary Search Trees (BST). Simple, but can be unbalanced with real-world data.

2-3 Trees. Balanced by construction: no rotations required. But the algorithm is complicated and relatively slow.

LLRB Tree. A self-balanced binary search tree that is fast and simple to implement.
[Figure: the example LLRB tree rooted at m.]
Java’s TreeMap is a red-black tree. It maintains a correspondence with 2-3-4 trees, though the correspondence is not 1-1. This speeds up the program by a constant factor but makes for a more complicated implementation.