Balanced Search Trees

2-3 trees and left-leaning red-black trees.

  1. 2-3 Trees
  2. Left-Leaning Red-Black Trees
    1. Rotation
    2. One-to-one correspondence and isomorphism
    3. What would a 2-3 tree do?

2-3 Trees

BTree.java

  1. Identify element promotions during the 2-3 tree insertion process.
  2. Identify an insertion order that will increase the height of a 2-3 tree.
  3. Analyze the best-case and worst-case height of a 2-3 tree.

Binary search trees aimed to address the linear-time worst case for adding or removing elements from a sorted array set. Yet we now know that binary search trees not only fail to improve on this worst-case runtime, but can also degrade the performance of methods like contains that were much faster when we could binary search a sorted array.

To realize the promise of a more efficient tree data structure, we need stronger invariants that ensure the tree never becomes unbalanced. The inventors of the 2-3 tree hypothesized that unbalanced growth in a binary search tree occurs because adding an element requires creating a new leaf node that unevenly increases the height of the tree. For example, when elements are added in ascending order to a binary search tree, the overall height of the tree increases because the height of only the right child increases.

Given that creating new leaves can lead to unbalanced growth, 2-3 trees avoid creating new leaf nodes entirely. Instead, the nodes of a 2-3 tree can expand to fit more than one key (element) so new elements can be added by fitting them into existing nodes rather than creating new leaves. If new leaf nodes are never created, the tree can never become unbalanced!

The 2-3 search tree (often shortened to just “2-3 tree”) is a self-balancing search tree designed to ensure a logarithmic-height tree no matter the order in which elements are added to the data structure. Whereas each node in a binary search tree contains exactly one key, a node in a 2-3 tree can be either:

2-node
A node that contains exactly 1 key and 2 non-null children.
3-node
A node that contains exactly 2 keys and 3 non-null children.
If we allowed 4-nodes, how many keys and non-null children would they have?

A 2-3 tree doesn’t contain 4-nodes. But if it did, a 4-node would have exactly 3 keys and 4 non-null children. The number of non-null children in each node is always one greater than the number of keys because the keys serve as dividers in the search process. For example, in binary search, the middle element exactly splits the left part from the right part. If we have 2 keys, then we have 3 parts. If we have 3 keys, then we have 4 parts.

This definition does not allow nodes that have just 1 non-null child. A 2-3 tree ensures that all nodes have either 2 or 3 non-null children so that the height will always be between about log3 N (when every node is a 3-node) and log2 N (when every node is a 2-node). Furthermore, because height is added evenly to all children, all leaf nodes in a 2-3 tree are the same distance from the overall root.
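To make these definitions concrete, here is one hypothetical way to declare a 2-3 tree node in Java. The class and field names are illustrative only and are not taken from the reference BTree.java, which represents nodes more generally (as B-tree nodes with room for more keys and children).

```java
// Hypothetical sketch of a 2-3 tree node: a 2-node stores 1 key and 2 children,
// while a 3-node stores 2 keys and 3 children.
public class TwoThreeNode<Key extends Comparable<Key>> {
    Key leftKey;                            // always present
    Key rightKey;                           // null in a 2-node, present in a 3-node
    TwoThreeNode<Key> left, middle, right;  // a 2-node leaves middle null

    boolean isThreeNode() {
        return rightKey != null;            // 2 keys divide the subtrees into 3 parts
    }
}
```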

Open the Algorithm Visualizations module to visualize B-trees with max degree = 3. Insert words or numbers and predict how the data structure will change.

2-3 Tree Visualization

Left-Leaning Red-Black Trees

RedBlackBST.java

  1. Given a 2-3 tree, identify its corresponding LLRB tree (and vice-versa).
  2. Apply rotations and color flips for a single LLRB tree insertion.
  3. Using 1-1 correspondence, give the LLRB tree for a series of insertions.

Although 2-3 trees addressed the worst-case runtime problem with binary search trees by overstuffing and splitting nodes, they are rarely used in the real world. Overstuffing and splitting nodes involves constructing many 2-nodes and 3-nodes, which is not only complicated to design and implement, but also incurs a significant runtime cost since nodes are constantly created and removed as the 2-3 tree changes. While this doesn’t affect the asymptotic analysis, it is often noticeable in experimental analysis.

In Java, TreeSet is the standard implementation of a set that uses a self-balancing binary search tree data structure called a red-black tree. We were careful to note that 2-3 trees were not binary search trees because 3-nodes (nodes with 2 keys and 3 non-null children) can’t be found in a binary search tree. The difference with red-black trees is that they are valid binary search trees. Red-black trees take the idea of overstuffing and splitting in a 2-3 tree and map it back to binary search tree operations. This data structure has all the asymptotic runtime guarantees of a 2-3 tree without the experimental runtime drawback.
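As a quick, self-contained example of why this matters in practice, the following sketch adds elements in ascending order, which would produce a degenerate plain binary search tree, but TreeSet keeps every operation logarithmic because it is backed by a red-black tree.

```java
import java.util.TreeSet;

public class TreeSetExample {
    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>();
        for (int i = 1; i <= 100; i++) {
            set.add(i);                        // ascending order: worst case for a plain BST
        }
        System.out.println(set.contains(42));  // true
        System.out.println(set.first());       // 1   (smallest element)
        System.out.println(set.last());        // 100 (largest element)
    }
}
```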

Rotation

Before we introduce red-black trees formally, we’ll first need a way to reorganize elements in a binary search tree. Depending on the order elements are inserted into the tree, there are many different ways to construct a binary search tree containing the same set of elements. But insertion order is not the only way to build different configurations of the same set of elements. We can change the structure of a binary tree through rotation.

Each rotation makes a local adjustment to the tree. Sometimes, this local adjustment increases the height of the tree; other times, it decreases the height of the tree; and at still other times the overall height of the tree doesn’t change at all. However, rotations always respect the binary search tree invariant. Elements that are ordered between B and D in the example below stay ordered after a rotation.

Rotations respect the binary search tree invariant

Given a particular node in a tree, we can either rotate it to the left or to the right by making it the child of one of its children. In these definitions, G and P correspond to their labels in the tree shown below.

rotateLeft(G)
Let x be the right child of G. Make G the new left child of x. We can think of this as temporarily merging G and P, then sending G down and left.

Rotate left on G makes P the new overall root

rotateRight(P)
Let x be the left child of P. Make P the new right child of x. We can think of this as temporarily merging G and P, then sending P down and right.

Rotate right on P makes G the new overall root

These two rotations are inverse operations: one operation undoes the other.
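As a sketch, here is how the two rotations might be written in Java, assuming a bare-bones Node class with only a key and left/right references (the class names here are illustrative, not the reference implementation). The parameter names match G and P in the figures above.

```java
class Rotations {
    static class Node {
        int key;
        Node left, right;
    }

    // rotateLeft(G): P, the right child of G, becomes the new subtree root.
    static Node rotateLeft(Node g) {
        Node p = g.right;
        g.right = p.left;   // keys ordered between G and P stay between them
        p.left = g;         // G is sent down and to the left
        return p;           // the caller re-links P where G used to be
    }

    // rotateRight(P): G, the left child of P, becomes the new subtree root.
    static Node rotateRight(Node p) {
        Node g = p.left;
        p.left = g.right;   // keys ordered between G and P stay between them
        g.right = p;        // P is sent down and to the right
        return g;           // the caller re-links G where P used to be
    }
}
```

Because each method returns the new subtree root, the caller is responsible for storing the returned node back into the parent (or into the overall root).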

One-to-one correspondence and isomorphism

Our goal is to develop a self-balancing binary search tree using the self-balancing property of 2-3 trees. Certain 2-3 trees are already valid binary search trees: those that contain only 2-nodes. How might we represent 3-nodes in a binary search tree?

Using an idea from rotation, we can separate a 3-node with the keys B and D into two nodes exactly as shown during the node rotation process. A left-leaning binary search tree is a binary search tree representation of a 2-3 tree where 3-nodes are represented as two binary tree nodes connected with a left-leaning “glue” edge.

Prefer the left-leaning orientation for a BST representation of a 2-3 tree

The preference for left-leaning is historical and cultural: perhaps if the data structure inventors had preferred how right-leaning looked, we might be learning about right-leaning binary search trees instead. Either approach is equally good as long as we’re consistent in our decision.

We now have a complete binary search tree representation for any 2-3 tree. The steps for converting a 2-3 tree are as follows. For each node:

  • If the node is a 2-node, then leave it the same since it’s already a valid binary tree node.
  • Otherwise, the node is a 3-node, so represent it with 2 binary tree nodes connected by a left-leaning “glue” edge.

Converting from 2-3 trees to left-leaning BSTs is not too bad. But the opposite is not so easy! In the slides above, it’s hard to tell that d and f are the two nodes that are glued together in the 2-3 tree. This is even harder to do in code, so today’s topic of left-leaning red-black trees introduces a little bit of extra information in the form of a colorful “glue” edge.

The left-leaning red-black (LLRB) tree data structure is exactly the same as left-leaning BSTs except “glue” edges connecting 3-nodes (in the corresponding 2-3 tree) are colored red. Red edges help us immediately tell which nodes are part of a 3-node in the corresponding 2-3 tree.
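One hypothetical way to encode the red “glue” in code is to store, in each node, the color of the link from its parent rather than coloring edges explicitly; this is the convention used by the reference RedBlackBST.java. The class and field names below are illustrative.

```java
class LLRBNode {
    static final boolean RED = true;
    static final boolean BLACK = false;

    String key;
    LLRBNode left, right;
    boolean color;            // color of the link from this node's parent

    LLRBNode(String key, boolean color) {
        this.key = key;
        this.color = color;
    }

    // Null links are never red: a null child can't be part of a 3-node.
    static boolean isRed(LLRBNode x) {
        return x != null && x.color == RED;
    }
}
```

With this encoding, the 3-node containing d and f from the slides would be stored as f with a red left child d.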

One-to-one correspondence (bijection)
A mapping between two types of elements where each element of one type is paired with exactly one element of the other type. One-to-one correspondence between 2-3 trees and LLRB trees implies that every 2-3 tree has a unique corresponding LLRB tree, and vice versa.
Algorithmic isomorphism
A one-to-one correspondence between two types of data structures or algorithms that also preserves structure. Isomorphism between 2-3 trees and LLRB trees implies that a change in a 2-3 tree produces a proportional change in the isomorphic LLRB tree, and vice versa.

LLRB tree invariants follow entirely from isomorphism with 2-3 trees.

  • Red edges lean left because that’s the convention we chose to represent 3-nodes.
  • No node has two red edges connected to it because 2-3 trees only allow 2-nodes and 3-nodes.
  • Every root-to-null path has the same number of black edges because all 2-3 tree leaf nodes are the same depth from the root.
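To make the last of these invariants concrete, here is a hypothetical check that could be added to the LLRBNode sketch above: it returns the number of black links encountered on the way into each node along a root-to-null path, or -1 if two paths disagree.

```java
// Hypothetical invariant check: every root-to-null path must pass through the
// same number of black links. Red "glue" links are not counted, just as
// 3-nodes don't add height in the corresponding 2-3 tree.
static int blackLinkCount(LLRBNode x) {
    if (x == null) return 0;
    int left = blackLinkCount(x.left);
    int right = blackLinkCount(x.right);
    if (left == -1 || right == -1 || left != right) return -1;  // paths disagree
    return left + (LLRBNode.isRed(x) ? 0 : 1);                  // count only black links
}
```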

We’ll often use “corresponding” and “isomorphic” interchangeably, but isomorphism is a stronger and more descriptive term because it implies the structure-preserving property. Isomorphism allows us to switch between thinking about a 2-3 tree as an LLRB tree or vice versa at any point in time.

What would a 2-3 tree do?

Isomorphism enables a powerful way of thinking about LLRB tree operations. In any situation, we can always ask: What would a 2-3 tree do?

The following slides visualize the procedure for adding several elements to an LLRB tree. Nodes are recursively added to the LLRB tree as new leaf nodes just like in a binary search tree. After reaching the recursive base case and creating the new leaf node, before returning, the program performs rotations and color flips to maintain the LLRB tree invariants.

LLRB tree balance is maintained using only 3 lines of code. But it’s also possible to reason about the data structure by visualizing the corresponding 2-3 tree: try simulating the sequence of insertions from the slides below in the 2-3 Tree Visualization and compare the results.
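As one way to see those 3 lines in context, here is a compact sketch of LLRB insertion in the spirit of the reference RedBlackBST.java. The class and method names are illustrative, the sketch stores keys only (no values), and it ignores duplicate keys; the three balance-maintenance lines appear at the end of the recursive helper.

```java
public class LLRBTreeSketch<Key extends Comparable<Key>> {
    private static final boolean RED = true;
    private static final boolean BLACK = false;

    private class Node {
        Key key;
        Node left, right;
        boolean color;                          // color of the link from the parent
        Node(Key key, boolean color) { this.key = key; this.color = color; }
    }

    private Node root;

    private boolean isRed(Node x) { return x != null && x.color == RED; }

    public void add(Key key) {
        root = add(root, key);
        root.color = BLACK;                     // the root never hangs from a red link
    }

    private Node add(Node h, Key key) {
        if (h == null) return new Node(key, RED);   // new leaves join their parent via a red link
        int cmp = key.compareTo(h.key);
        if (cmp < 0)      h.left = add(h.left, key);
        else if (cmp > 0) h.right = add(h.right, key);
        // (duplicate keys are ignored in this sketch)

        // The 3 balance-maintenance lines, applied on the way back up:
        if (isRed(h.right) && !isRed(h.left))    h = rotateLeft(h);   // lean a right-leaning red link left
        if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);  // fix two red links in a row
        if (isRed(h.left) && isRed(h.right))     flipColors(h);       // split a temporary 4-node
        return h;
    }

    private Node rotateLeft(Node g) {
        Node x = g.right;
        g.right = x.left;
        x.left = g;
        x.color = g.color;                      // x takes over g's link color
        g.color = RED;                          // g stays glued to x by a red link
        return x;
    }

    private Node rotateRight(Node p) {
        Node x = p.left;
        p.left = x.right;
        x.right = p;
        x.color = p.color;
        p.color = RED;
        return x;
    }

    private void flipColors(Node h) {
        h.color = RED;                          // the middle key is promoted to the parent
        h.left.color = BLACK;
        h.right.color = BLACK;
    }
}
```

Asking “what would a 2-3 tree do?” explains each line: leaning red links left matches our chosen 3-node orientation, and the color flip corresponds to splitting an overstuffed node and promoting its middle key.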