
Prefix Operations and Tries Study Guide

DataIndexedCharMap. While working our way through hash tables, we explored a progression of three specialized data structures: the DataIndexedIntegerSet, the DataIndexedEnglishWordSet, and the DataIndexedStringSet. The same idea gives us a data-indexed character map. Every ASCII character has a numeric code from 0 to 127, so a DataIndexedCharMap can keep its values in a 128-slot array and use each character's code directly as the array index.
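As a concrete illustration, here is a minimal Python sketch of a DataIndexedCharMap. It assumes keys are single ASCII characters and uses `put`/`get` method names of our own choosing. The point is only that `ord(c)` serves directly as the array index, so no hashing or comparisons are needed.

```python
from typing import Generic, List, Optional, TypeVar

V = TypeVar("V")


class DataIndexedCharMap(Generic[V]):
    """Maps ASCII characters to values using a fixed 128-slot array."""

    R = 128  # number of distinct ASCII characters

    def __init__(self) -> None:
        # Slot i holds the value for the character whose ASCII code is i.
        self.items: List[Optional[V]] = [None] * self.R

    def put(self, c: str, val: V) -> None:
        self.items[self._index(c)] = val

    def get(self, c: str) -> Optional[V]:
        # Returns None if no value has been stored for c.
        return self.items[self._index(c)]

    def _index(self, c: str) -> int:
        # ord raises TypeError for strings longer than one character.
        i = ord(c)
        if not 0 <= i < self.R:
            raise ValueError(f"{c!r} is not an ASCII character")
        return i
```

Both operations are a single array access, at the cost of reserving all 128 slots even if only a few characters are ever stored.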

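Tries. Strings can be stored by letting the values of these character maps be other character maps: each node maps a character to the child node for the next character, so every stored string is spelled out along a path from the root, and each node also records whether the path leading to it is a complete key. This family of data structures is called a “trie” (pronounced “try”). If each node uses an ASCII DataIndexedCharMap, we get a 128-way tree instead of a binary (2-way) tree.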
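The sketch below is a minimal Python illustration, assuming ASCII-only keys. It builds a trie whose nodes each hold a 128-slot array as their character map, plus an `is_key` flag marking where a stored string ends. The names `TrieNode`, `Trie`, `add`, and `contains` are hypothetical.

```python
from typing import List, Optional

R = 128  # ASCII alphabet size: each node has up to 128 children


class TrieNode:
    def __init__(self) -> None:
        # True if the path from the root to this node spells a stored string.
        self.is_key = False
        # The node's character map: slot ord(c) holds the child for character c.
        self.next: List[Optional["TrieNode"]] = [None] * R


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, key: str) -> None:
        node = self.root
        for c in key:
            i = ord(c)  # assumes ASCII; codes >= 128 would raise IndexError
            child = node.next[i]
            if child is None:
                child = TrieNode()
                node.next[i] = child
            node = child
        node.is_key = True

    def contains(self, key: str) -> bool:
        node: Optional[TrieNode] = self.root
        for c in key:
            if node is None:
                return False
            node = node.next[ord(c)]
        # Reaching a node is not enough: it must also mark the end of a key.
        return node is not None and node.is_key
```

Note that `contains("se")` is false after adding `"sea"`: the path exists, but its last node is not marked as a key.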

Optimizing Tries. The choice of character map can be optimized. A 128-slot array in every node wastes memory, since most of its slots are usually empty, so any implementation of the Map ADT can be used instead. We’ve seen several such implementations in class, including binary search trees and hash tables. With binary search trees, however, it’s redundant (and slightly awkward) to maintain a separate binary search tree character map inside each node of what is already a tree.
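One way to apply this optimization is to give each node a hash table instead of a 128-slot array. In Python the built-in `dict` plays that role. The classes below (`HashTrieNode`, `HashTrie`) are hypothetical names for this sketch, and the `node_count` field exists only to make the memory comparison concrete.

```python
from typing import Dict


class HashTrieNode:
    def __init__(self) -> None:
        self.is_key = False
        # Character map as a hash table: only characters that occur get an entry.
        self.next: Dict[str, "HashTrieNode"] = {}


class HashTrie:
    def __init__(self) -> None:
        self.root = HashTrieNode()
        self.node_count = 1  # counts the root

    def add(self, key: str) -> None:
        node = self.root
        for c in key:
            if c not in node.next:
                node.next[c] = HashTrieNode()
                self.node_count += 1
            node = node.next[c]
        node.is_key = True

    def contains(self, key: str) -> bool:
        node = self.root
        for c in key:
            child = node.next.get(c)
            if child is None:
                return False
            node = child
        return node.is_key
```

For the two strings `sea` and `shells`, this trie creates 9 nodes holding 8 dict entries in total, whereas the array-based version reserves 9 × 128 = 1152 child slots. As a side effect, keys are no longer limited to ASCII.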

Ternary Search Trie. TSTs resolve the awkwardness of binary search tree character maps by merging the character map into the main structure of the trie. Each TST node holds one character and has three children, each of which is also a TST node. Suppose our root TST node contains the character ‘t’.

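  • Left child represents all strings whose first character comes before ‘t’ in the alphabet.
  • Middle child represents the rest of every string that starts with the letter ‘t’, continuing from its second character.
  • Right child represents all strings whose first character comes after ‘t’ in the alphabet.

Below the root, the same rule applies to the character at the current position rather than to the first character.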
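The following minimal Python sketch of a TST follows the three-way rule above: moving `left` or `right` compares the character at the current position without advancing, and moving to `mid` advances to the next position. It assumes non-empty string keys; the class and method names are illustrative, not a fixed API.

```python
from typing import Optional


class TSTNode:
    def __init__(self, c: str) -> None:
        self.c = c
        self.is_key = False
        self.left: Optional["TSTNode"] = None   # current character < c
        self.mid: Optional["TSTNode"] = None    # current character == c: go to next position
        self.right: Optional["TSTNode"] = None  # current character > c


class TST:
    def __init__(self) -> None:
        self.root: Optional[TSTNode] = None

    def add(self, key: str) -> None:
        if not key:
            raise ValueError("key must be non-empty")
        self.root = self._add(self.root, key, 0)

    def _add(self, node: Optional[TSTNode], key: str, d: int) -> TSTNode:
        c = key[d]
        if node is None:
            node = TSTNode(c)
        if c < node.c:
            node.left = self._add(node.left, key, d)
        elif c > node.c:
            node.right = self._add(node.right, key, d)
        elif d < len(key) - 1:
            node.mid = self._add(node.mid, key, d + 1)
        else:
            node.is_key = True  # last character of key matched here
        return node

    def contains(self, key: str) -> bool:
        if not key:
            return False
        node, d = self.root, 0
        while node is not None:
            c = key[d]
            if c < node.c:
                node = node.left
            elif c > node.c:
                node = node.right
            elif d < len(key) - 1:
                node, d = node.mid, d + 1
            else:
                return node.is_key
        return False
```

After adding `sea` and then `shells`, the root holds ‘s’, its middle child holds ‘e’, and ‘h’ hangs off the right of that ‘e’, since ‘h’ comes after ‘e’ at the second position.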

Range Searching in Tries. To find all strings in a trie that match a given prefix, first search for the prefix in the trie, then collect every key stored in the subtree below the node where the prefix ends. For autocomplete this does more work than necessary, because typically only the top 10 or so results by weight are actually relevant. To prune the tree, store in each node the maximum weight of any string in its subtree. Before exploring a subtree, check whether that maximum is better than the worst of the top 10 strings collected so far; if it is not, nothing in the subtree can make the top 10, so the subtree can be skipped.
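The two-step prefix search and the max-weight pruning can be sketched together in Python. This assumes each key is added once with a numeric weight (re-adding a key with a lower weight would leave stale subtree maxima), uses a `dict` per node, and keeps the current top k in a size-k min-heap from `heapq`. The heap and all names here (`WeightedTrie`, `keys_with_prefix`, `top_k`) are our own choices, not something the guide prescribes.

```python
import heapq
from typing import Dict, List, Optional, Tuple


class WNode:
    def __init__(self) -> None:
        self.weight: Optional[float] = None  # weight of the key ending here, if any
        self.max_weight = float("-inf")      # largest key weight in this subtree
        self.next: Dict[str, "WNode"] = {}


class WeightedTrie:
    def __init__(self) -> None:
        self.root = WNode()

    def add(self, key: str, weight: float) -> None:
        # Assumes each key is added once; every node on the path updates its max.
        node = self.root
        node.max_weight = max(node.max_weight, weight)
        for c in key:
            node = node.next.setdefault(c, WNode())
            node.max_weight = max(node.max_weight, weight)
        node.weight = weight

    def _find(self, prefix: str) -> Optional[WNode]:
        node: Optional[WNode] = self.root
        for c in prefix:
            if node is None:
                return None
            node = node.next.get(c)
        return node

    def keys_with_prefix(self, prefix: str) -> List[str]:
        # Step 1: find the prefix's node. Step 2: collect every key below it.
        results: List[str] = []
        start = self._find(prefix)
        if start is not None:
            self._collect(start, prefix, results)
        return results

    def _collect(self, node: WNode, path: str, results: List[str]) -> None:
        if node.weight is not None:
            results.append(path)
        for c, child in node.next.items():
            self._collect(child, path + c, results)

    def top_k(self, prefix: str, k: int = 10) -> List[Tuple[float, str]]:
        """Return up to k (weight, key) pairs with the largest weights, best first."""
        best: List[Tuple[float, str]] = []  # min-heap: best[0] is the k-th best so far
        start = self._find(prefix)
        if start is not None and k > 0:
            self._top_k(start, prefix, k, best)
        return sorted(best, reverse=True)

    def _top_k(self, node: WNode, path: str, k: int,
               best: List[Tuple[float, str]]) -> None:
        # Prune: once k keys are held, a subtree whose best key cannot beat
        # the current k-th best cannot change the answer.
        if len(best) == k and node.max_weight <= best[0][0]:
            return
        if node.weight is not None:
            if len(best) < k:
                heapq.heappush(best, (node.weight, path))
            elif node.weight > best[0][0]:
                heapq.heapreplace(best, (node.weight, path))
        # Visit the most promising children first so pruning kicks in sooner.
        children = sorted(node.next.items(),
                          key=lambda kv: kv[1].max_weight, reverse=True)
        for c, child in children:
            self._top_k(child, path + c, k, best)
```

With the keys `the` (100), `they` (80), `there` (50), and `then` (30), `top_k("the", 2)` returns `[(100, 'the'), (80, 'they')]` and never descends into the `there` or `then` branches, because their maximum weights (50 and 30) cannot beat the current second-best weight of 80.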

  1. Q5 from COS 226 08sp Final
  2. Q8 from COS 226 11au Final
  3. Q8 from COS 226 12au Final
  4. When looking for a single-character string in a trie, what is the worst-case time to find that string in terms of R (the size of the alphabet) and N (the total number of strings)?
  5. True or false: The number of character compares required to construct an R-way trie is always less than or equal to the number required to construct an LLRB (left-leaning red-black tree).
  6. True or false: The number of character compares required to construct an R-way trie is always less than or equal to the number of character accesses needed to construct a hash table.