# AVL Trees

## Table of contents

- Balance factor
- AVL trees
- AVL tree invariants
- AVL tree cases
- Line case
- Kink case
- Rebalancing
- AVL vs LLRB tree
- AVL tree insertion
- AVL tree analysis

## Balance factor

In our study of balanced search trees, we considered the hypothesis that unbalanced growth leads to worst-case height trees. With 2-3 trees, we addressed this hypothesis by never creating new leaf nodes and instead growing the tree by splitting nodes in a way that ensured relatively-balanced growth. With left-leaning red-black trees, we took the concept of a 2-3 tree and defined corresponding rotations and color flips to represent any 2-3 tree as a binary search tree.

In this lesson, we’ll examine AVL trees, which are another type of balanced search tree data structure that can be used to implement sets and maps. While LLRB trees derived their properties from 2-3 trees, AVL trees directly address unbalanced growth by ensuring that every node is **balanced**.

- Balance factor
- For a given node, compute the height of right subtree minus the height of the left subtree.

## AVL trees

## AVL tree invariants

An AVL tree is a binary search tree that respects the AVL tree invariant: For every node, the height of its left and right subtrees may only differ by at most 1 (balance factor is either -1, 0, or 1).

### Question 1

Which of the following are valid AVL trees?

## Explanation

- Option A violates the BST invariant between the values 4 and 5.
- Option B violates the AVL tree balance factor at the node with value 3.
- Option C meets both the BST invariant and the AVL tree invariant.

## AVL tree cases

AVL tree insertion is similar to the procedure that we saw with LLRB trees: first perform a regular BST insertion and then rebalance the tree using tree rotation operations wherever the balance factor is violated. After we insert a node, the balance factors change only for the nodes on the insertion path. Just like in LLRB trees, when **returning** from each recursive call to `add`

, if the left and right subtree heights differ by more than 1, the AVL tree will perform tree rotations to restore balance, exactly like in an LLRB tree.

There are 4 different cases to consider for AVL tree insertion: 2 “line” cases and 2 “kink” cases. Try inserting values into the AVL tree visualization to see the rotations in action.

- Line cases
- Kink cases

In order to check the balance factor, the AVL tree must know each subtree’s height. Rather than recompute the subtree height every time, the code for an AVL tree nodes maintains the subtree height as a field. This is the same ideas storing the red/black node color in LLRB trees.

## Line case

## Kink case

## Rebalancing

## AVL vs LLRB tree

Let’s compare and contrast AVL trees with left-leaning red-black trees.

### Question 1

True/False: Consider this valid AVL tree. Is it possible to color some edges red so that it is a valid LLRB tree?

## Explanation

In an LLRB tree, the value 2 would have been inserted as a red, right leaf node. But this would require a `rotateLeft(1)`

so that we maintain the left-leaning property.

### Question 2

We know that worst-case search trees are as unbalanced as possible between the left and right subtrees. Are worst-case AVL trees more unbalanced than worst-case LLRB trees? Explain in terms of *H*, the height of the AVL or LLRB tree.

- Worst-case AVL trees are more unbalanced than worst-case LLRB trees.
- Worst-case LLRB trees are more unbalanced than worst-case AVL trees.
- Worst-case AVL trees are about as unbalanced as worst-case LLRB trees.

## Explanation

Consider the worst-case subtree heights at the overall root:

- In an AVL tree, one subtree has height
*H*− 1 and the other subtree has height*H*− 2. - In an LLRB tree, the worst case occurs when the corresponding 2-3 tree has as many left-child 3-nodes as possibles. The left subtree of the overall root has height
*H*− 1 and the right subtree has height approximately*H*/2 (since it lacks red edges).

## AVL tree insertion

- Line cases
- Kink cases

### Question 1

Which tree represents the result of inserting the value 0?

## Explanation

Option A represents the result of performing just the BST insertion step. The subtree rooted at the node with value -2 violates the AVL tree invariant. Since the insertion path is a line, `rotateLeft(-2)`

.

### Question 2

Suppose we followed the BST insertion procedure to add the value 4 to this AVL tree. Which line/kink violates the AVL tree invariant, where the first node in the line/kink represents the node where the balance factor differs by more than 1?

**8–2–1****8–2–6****2–1–4****2–6–3****2–6–7****6–3–4****6–7–4**- None of the above.

## Explanation

The BST insertion procedure adds the value 4 as the right child of 3, which eliminates the options that imply 4 would be inserted elsewhere. Compute the balance factors of each node on the insertion path.

### Question 3

Which tree rotation case is needed to fix the AVL tree invariant when inserting the value 4 into the above tree?

- Left rotation (line case)
- Right rotation (line case)
- Right/left rotation (kink case)
- Left/right rotation (kink case)

### Question 4

Give **2 sets of answers** to the following questions: What integer, if inserted into the tree, would initially break the AVL invariant? Which rotations would need to be applied afterwards?

## AVL tree analysis

Earlier, we used 1-1 correspondence with 2-3 trees to show that the height of an LLRB tree is guaranteed to be in *Θ*(log *N*) where *N* is the size of the tree. Let’s show that the height of an AVL tree is also in *O*(log *N*) using recurrence relations, unrolling, and asymptotic notation definitions.

This problem is not part of the learning objectives. It’s here to provide completeness to our study of balanced search trees since we showed earlier how 2-3 trees and LLRB trees have guaranteed logarithmic height.

Let *N* = *T*(*H*) represent the **minimum** number of elements stored in an AVL tree of height *H*. If we can show that an AVL tree must contain at least an exponential number of elements with respect to its height, then its height is logarithmic with respect to the size of the tree.

### Question 1

If the overall height of an AVL tree is *H*, one of its left/right children must be of height *H* − 1 for the 1 additional edge to the overall root. Which of the following expressions could represent the height of its other child?

*H**H*− 1*H*− 2*H*− 3

### Question 2

Give a recurrence relation for *T*(*H*) that represents the most unbalanced AVL tree of height *H* in terms of its left and right subtrees: *T*(*H*) = *T*(*H* − 1) + *T*(?) + 1.

### Question 3

Some of the following relations are true, and some of them are false. Among the relations that are true, identify the one relation that best addresses the central hypothesis: that an AVL tree must contain (at least) an exponential number of elements with respect to its height.

Note that we chose to drop the + 1 to make the next step easier.

*T*(*H*) ≥ 2*T*(*H*− 1)*T*(*H*) ≥ 2*T*(*H*− 2)*T*(*H*) ≤ 2*T*(*H*− 1)*T*(*H*) ≤ 2*T*(*H*− 2)

### Question 4

Unroll the recurrence relation to show that *H* ∈ *O*(log *N*) for the most unbalanced AVL tree.

Instead of *T*(*H*), we typically use *N* to represent size. Note how we defined above that *N* = *T*(*H*).