# 2-3 Trees

## Tree height

In this lesson, we’ll address the time complexity drawback of (unbalanced) binary search trees. In a binary search tree, elements are inserted as new leaf nodes. The shape of the binary search tree depends entirely on the insertion order.

### Question 1

Which of the following statements about the height of a binary search tree are true with respect to N, the size of the tree?

• The worst-case height of a BST is in Θ(N).
• The height of a BST is in O(N).
• The height of a BST is in O(N2).
Explanation

The most specific claim is, “The worst-case height of a BST is in Θ(N).” Since this claim is true, the other two claims must also be true.

## Proposing invariants

Goal
Design a binary tree structure that ensures log height for any insertion order.

Data structure implementers maintain invariants to narrow down exactly where an element can be found in the data structure. Binary search trees maintain the binary search invariant to determine the relative location of elements in the search tree. If we can design an invariant that will determine the shape of the overall tree, then we might be able to guarantee log height for any insertion order.

Let’s propose a new invariant by evaluating how we end up with worst-case height trees.

What is the cause of worst-case height trees?
Worst-case height trees are spindly trees where each node has only one child. In contrast, bushy trees have two children per node.
Design a new invariant that addresses the problem.
Fullness invariant: each node must have either 0 or 2 children.
Draw a worst-case binary tree with this invariant. The order of growth of the height of this tree is in Θ(N). We still have a “spindly” structure not because there are no nodes with only 1 child, but because the tree is unbalanced. In practice, we can’t actually come up with an insertion order that results in this tree without also inserting nodes with one child. But the bigger problem is that this invariant can still result in linear tree height.

## 2-3 tree invariants

Let’s consider a different hypothesis: that unbalanced growth leads to worst-case height trees. Elements are always inserted as new leaf nodes. Sometimes, these new leaves result in unbalanced growth.

If new leaf nodes are responsible for height increase, then suppose we never create new leaves. Instead, when inserting an item, overstuff an existing leaf to avoid creating new leaves. If we never create new leaves, the tree can never get unbalanced! Give a worst-case insertion order for overstuffed leaves.

If we insert items in ascending order, then the leaf will get more and more full. We trade searching N children down the side of a spindly tree with searching through N items in an overstuffed leaf.

We can avoid overfull leaves by setting a limit on the number of items in each node. If a node has more than the specified number of items, promote an item by pushing it up to its parent node. This needs to be done in a careful manner to maintain the search tree invariant by promoting a middle item and splitting the node into two parts: left and right.

A 2-3 search tree is a balanced search tree data structure where all leaves are the same depth from the root and whose nodes are either:

• Empty (null).
• 2-nodes that contain one key and two children: a left child containing smaller keys and a right child containing larger keys.
• 3-nodes that contain two keys and three children: a left child containing smaller keys, a middle child containing keys between the node’s two keys, and a right child containing larger keys.

## 2-3 tree insertion

The height of a 2-3 tree increases only when the root splits. A root split contributes height equally to all of its children. Rather than grow from the leaves like binary search trees, 2-3 trees grow from the root!

Try inserting the numbers into the B-tree visualization. (2-3 trees are a special case of B-trees where max degree = 3.)

### Question 1

Suppose the numbers 8, 2, 4, 5, 7, 9 are inserted into a 2-3 tree in the given order. Reorder the promotions so that they occur right after insertion of an element that “overstuffs” a node.

• Insert 8
• Insert 2
• Insert 4
• Insert 5
• Insert 7
• Insert 9
• First promotion
• Second promotion

## 2-3 tree height

Try inserting the numbers into the B-tree visualization. (2-3 trees are a special case of B-trees where max degree = 3.)

### Question 1

Give an insertion order for the keys 1, 2, 3, 4, 5, 6, 7 that results in a max-height 2-3 tree.

### Question 2

Give an insertion order for the keys 1, 2, 3, 4, 5, 6, 7 that results in a min-height 2-3 tree.

### Question 3

What is the minimum minimum number of items that we can insert into this 2-3 tree and cause the height to increase? Explanation

Insert 10, 26, 27 (promotions bolded).

### Question 4

What is the maximum number of items that can be added without increasing this tree’s height? Explanation

Finding an insertion order is tricky, but here’s one: 5, 24, 26, 14, 16, 15, 17, 21, 28, 29, 23, 27, 30 (promotions bolded). ## 2-3 tree analysis

Depending on the number of 2-nodes vs. 3-nodes in a 2-3 tree, the number of elements that can be stored in a 2-3 tree of given height may vary greatly. For example, a 2-3 tree of height 2 can contain anywhere between 7 elements (all 2-nodes) and 26 elements (all 3-nodes).

### Question 1

Identify the best-case order of growth for the height of a 2-3 tree containing N elements.

• Θ(1)
• Θ(log N)
• Θ(N)
Explanation

All of the nodes are 3-nodes (as “stuffed” as possible), so the height of the tree is about log3N.

### Question 2

Identify the worst-case order of growth for the height of a 2-3 tree containing N elements.

• Θ(1)
• Θ(log N)
• Θ(N)
Explanation

All of the nodes are 2-nodes (like a perfectly-balanced binary search tree), so the height of the tree is about log2N.