Multi-Dimensional Data Study Guide
Multi-dimensional data. Two important operations on multi-dimensional data are range searching and nearest-neighbor queries. Instead of working with data that can be sorted on a line, our data can be located anywhere in a 2-d, 3-d, or higher-dimensional space.
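To make the two operations concrete, here is a minimal brute-force sketch for 2-d points. The function names and the inclusive rectangle bounds are assumptions made for illustration; both functions look at every point, so each query takes O(N) time.

```python
import math

def range_search(points, x_min, x_max, y_min, y_max):
    """Return every point inside the axis-aligned rectangle (bounds inclusive)."""
    return [(x, y) for (x, y) in points
            if x_min <= x <= x_max and y_min <= y <= y_max]

def nearest(points, target):
    """Return the point closest to target by Euclidean distance."""
    return min(points, key=lambda p: math.dist(p, target))

points = [(1, 1), (4, 5), (7, 2), (3, 8)]
print(range_search(points, 0, 5, 0, 6))
print(nearest(points, (6, 3)))
```

The structures in the rest of this guide aim to answer the same queries while looking at fewer points.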
Uniform Partitioning. Uniform partitioning is analogous to hashing in 2-d space with a fixed number of buckets (such as 4x4). The more buckets, the better the runtime (typically). With uniform partitioning, the number of buckets is fixed to some constant, so each bucket holds a constant fraction of the N items on average. As N grows, the runtime for range searching and nearest-neighbor queries is therefore still O(N).
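A minimal sketch of uniform partitioning, assuming points lie in a square region from 0 to size on both axes. The class name UniformGrid, its parameters, and the clamping of out-of-range coordinates into edge buckets are all hypothetical choices for this example.

```python
class UniformGrid:
    """Fixed cells x cells grid of buckets over the square [0, size) x [0, size)."""

    def __init__(self, size, cells=4):
        self.size = size
        self.cells = cells
        self.buckets = [[[] for _ in range(cells)] for _ in range(cells)]

    def _index(self, v):
        # Map a coordinate to a bucket index, clamping to the edge buckets.
        i = int(v * self.cells / self.size)
        return min(max(i, 0), self.cells - 1)

    def insert(self, point):
        x, y = point
        self.buckets[self._index(x)][self._index(y)].append(point)

    def range_search(self, x_min, x_max, y_min, y_max):
        found = []
        # Visit only the buckets that overlap the query rectangle.
        for i in range(self._index(x_min), self._index(x_max) + 1):
            for j in range(self._index(y_min), self._index(y_max) + 1):
                for (x, y) in self.buckets[i][j]:
                    if x_min <= x <= x_max and y_min <= y <= y_max:
                        found.append((x, y))
        return found

grid = UniformGrid(size=16, cells=4)
for p in [(1, 1), (4, 5), (7, 2), (3, 8), (15, 15)]:
    grid.insert(p)
print(grid.range_search(0, 5, 0, 6))
```

The query skips buckets outside the rectangle, but because the bucket count never changes, each visited bucket still grows in proportion to N.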
Quadtrees. Quadtrees are a generalization of binary search trees to 4 children per node to support 2-dimensional data. In a binary search tree, each left-right subdivision leads to another left-right subdivision. In a quadtree, instead of just splitting left-right, each node splits the space around it into four compass quadrants: NE, SE, SW, and NW. Each quadrant can then be split again into its own NE, SE, SW, and NW quadrants.
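The quadtree idea can be sketched as follows. This is one possible point-quadtree layout, not a definitive implementation: the names QuadNode, quadrant, insert, and range_search are hypothetical, and sending ties to the north and east is an assumed convention.

```python
class QuadNode:
    def __init__(self, point):
        self.point = point
        self.children = {"NE": None, "SE": None, "SW": None, "NW": None}

def quadrant(node, point):
    # Assumed tie-breaking: equal coordinates go north / east.
    ns = "N" if point[1] >= node.point[1] else "S"
    ew = "E" if point[0] >= node.point[0] else "W"
    return ns + ew

def insert(node, point):
    if node is None:
        return QuadNode(point)
    if point == node.point:
        return node  # ignore duplicates
    q = quadrant(node, point)
    node.children[q] = insert(node.children[q], point)
    return node

def range_search(node, x_min, x_max, y_min, y_max, found=None):
    if found is None:
        found = []
    if node is None:
        return found
    x, y = node.point
    if x_min <= x <= x_max and y_min <= y <= y_max:
        found.append(node.point)
    # Only descend into quadrants that can overlap the query rectangle.
    sides_ns = [s for s, ok in (("N", y_max >= y), ("S", y_min < y)) if ok]
    sides_ew = [s for s, ok in (("E", x_max >= x), ("W", x_min < x)) if ok]
    for ns in sides_ns:
        for ew in sides_ew:
            range_search(node.children[ns + ew], x_min, x_max, y_min, y_max, found)
    return found

root = None
for p in [(4, 5), (1, 1), (7, 2), (3, 8), (6, 6)]:
    root = insert(root, p)
print(sorted(range_search(root, 0, 5, 0, 6)))
```

The pruning step is where the tree pays off: a quadrant that lies entirely outside the query rectangle is never visited.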
K-d Trees. Another way to generalize binary search trees is to think of multi-dimensional data as a string. Like a trie, each level of the tree splits on a different dimension, starting with the first “character” (dimension), then the second “character” (dimension), and so forth. Cycle back to the first “character” (dimension) after exhausting all the dimensions in a datapoint. For example, at the root, everything to the left has an x-value less than the root’s x-value and everything to the right has an x-value greater than the root’s x-value. Then, on the next level, everything to the left of a node has a y-value less than the node’s y-value and everything to the right has a y-value greater than the node’s y-value.
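A minimal k-d tree sketch that cycles through dimensions by depth, as described above. The names KDNode, insert, and nearest are hypothetical, sending ties to the right is an assumed convention, and the nearest-neighbor search shown is the standard pruning approach rather than anything specified in this guide.

```python
import math

class KDNode:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None

def insert(node, point, depth=0):
    if node is None:
        return KDNode(point)
    if point == node.point:
        return node  # ignore duplicates
    axis = depth % len(point)  # x at depth 0, y at depth 1, then cycle
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, depth + 1)
    else:
        node.right = insert(node.right, point, depth + 1)  # assumed: ties go right
    return node

def nearest(node, target, depth=0, best=None):
    if node is None:
        return best
    if best is None or math.dist(node.point, target) < math.dist(best, target):
        best = node.point
    axis = depth % len(target)
    diff = target[axis] - node.point[axis]
    # Search the side containing the target first.
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, depth + 1, best)
    # The far side can only help if the splitting line is closer than the best so far.
    if abs(diff) < math.dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best

points = [(4, 5), (1, 1), (7, 2), (3, 8), (6, 6)]
root = None
for p in points:
    root = insert(root, p)
print(nearest(root, (6, 3)))
```

Because the split dimension depends only on depth, the same code works for 3-d or higher-dimensional points without changes.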