Multi-Dimensional Data
Applying Iterative Refinement in the Algorithm Design Process to handle multi-dimensional data.
Kevin Lin, with thanks to many others.
Ask questions anonymously on Piazza. Look for the pinned Lecture Questions thread.
Feedback from the Reading Quiz
General Data Structures
Data structures allow us to avoid looking at all of the data all of the time.
Binary search tree. Make a decision to ignore data based on key comparison.
Binary heap. Optimize for access to the smallest or largest items.
Hash table. Make a decision to ignore data based on hash code, bucket index.
Invariants help us to ensure consistency.
[Figure: example binary search tree, binary heap, and hash table.]
Specialized Data Structures
Data-Indexed Array. Keys must be small.
Tries. Keys must be subdivisible (strings).
Today: multi-dimensional keys.
[Figure: a trie over short strings, and a data-indexed array in which keys such as 守门员 map to indices near 39,312,024,869,368.]
2-d Linear Range Search
Linear range search: a simple baseline.
2-d Range Search: O(N). Scan through all the keys and collect matching results.
Insert a 2-d key: O(1). Put key anywhere.
Because keys can be anywhere, insertion is fast but search is unacceptably slow.
Goal: a logarithmic time solution.
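The baseline above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the class name `LinearPointSet` and the query rectangle are assumptions made for the example, and the six points are A through F from the slides.

```python
from typing import List, Tuple

Point = Tuple[float, float]


class LinearPointSet:
    """Baseline: points live in an unordered list (hypothetical class name)."""

    def __init__(self) -> None:
        self.points: List[Point] = []

    def insert(self, p: Point) -> None:
        # Put the key anywhere: appending is O(1) amortized.
        self.points.append(p)

    def range_search(self, x_min: float, x_max: float,
                     y_min: float, y_max: float) -> List[Point]:
        # Nothing can be skipped, so every point is checked: O(N).
        return [(x, y) for (x, y) in self.points
                if x_min <= x <= x_max and y_min <= y <= y_max]


points = LinearPointSet()
for p in [(-1, -1), (2, 2), (0, 1), (1, 0), (-2, -2), (-3, 2.5)]:  # A..F
    points.insert(p)

# Assumed query rectangle: -1.5 <= x <= 1.5 and -1.5 <= y <= 1.5.
result = points.range_search(-1.5, 1.5, -1.5, 1.5)
print(result)
```

Every call to `range_search` touches all N points, which is the cost the rest of the lecture tries to avoid.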
Example points: A (-1, -1), B (2, 2), C (0, 1), D (1, 0), E (-2, -2), F (-3, 2.5).
Data structures optimize certain operations by organizing the data so that large portions of it can be ignored. These organizational schemes are implemented by algorithms that maintain the data structure's invariants.
Uniform Partitioning
Spatial partitioning. How to divide space into non-overlapping subspaces.
Uniform partitioning. Partition space into uniform rectangular buckets (“bins”).
Example: a 4x4 grid of such buckets.
?: How much of an improvement is this over linear range search for 2-d range search queries? For nearest neighbor queries?
How many bins do we need to scan to collect all points in the green rectangle?
What is the runtime for nearest assuming points are evenly spread out?
Still Θ(N). With 16 bins and evenly spread points, the runtime will on average be 16 times faster than without spatial partitioning, but N/16 is still Θ(N).
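A uniform grid can be sketched as a dictionary from bin coordinates to lists of points. This is only an illustration: the class name `UniformGrid`, the bounding square from -4 to 4, and the query rectangle are all assumptions chosen so that the six example points fall into a 4x4 grid.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


class UniformGrid:
    """Square region [lo, hi) split into cells_per_side x cells_per_side bins."""

    def __init__(self, lo: float, hi: float, cells_per_side: int) -> None:
        self.lo = lo
        self.k = cells_per_side
        self.cell = (hi - lo) / cells_per_side
        self.bins: Dict[Tuple[int, int], List[Point]] = {}

    def _index(self, v: float) -> int:
        # Bin index along one axis, clamped so outliers land in an edge bin.
        i = math.floor((v - self.lo) / self.cell)
        return min(max(i, 0), self.k - 1)

    def insert(self, p: Point) -> None:
        key = (self._index(p[0]), self._index(p[1]))
        self.bins.setdefault(key, []).append(p)

    def range_search(self, x_min: float, x_max: float,
                     y_min: float, y_max: float) -> Tuple[List[Point], int]:
        found: List[Point] = []
        scanned = 0
        # Only bins that overlap the query rectangle are scanned.
        for i in range(self._index(x_min), self._index(x_max) + 1):
            for j in range(self._index(y_min), self._index(y_max) + 1):
                scanned += 1
                for (x, y) in self.bins.get((i, j), []):
                    if x_min <= x <= x_max and y_min <= y <= y_max:
                        found.append((x, y))
        return found, scanned


grid = UniformGrid(lo=-4, hi=4, cells_per_side=4)
for p in [(-1, -1), (2, 2), (0, 1), (1, 0), (-2, -2), (-3, 2.5)]:  # A..F
    grid.insert(p)

found, scanned = grid.range_search(-1.5, 1.5, -1.5, 1.5)
print(found, scanned)
```

The number of bins is fixed, so as N grows each bin holds a constant fraction of the points, and the scan inside each bin remains linear in N.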
Recursive Partitioning
x-coordinate BST
Suppose we put points into a BST map ordered by x-coordinate.
[Figure: the six points A through F and a BST over them ordered by x-coordinate.]
More general theme inspired by binary search trees vs. ordered linked nodes: recursive subdivision leads to logarithmic behaviors, while uniform subdivision leads to linear behaviors.
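Pruning. To answer “What are all the points with x-coordinate less than -1.5?”, we can skip the right subtree of any node whose x-coordinate is at least -1.5, because everything in that subtree has an even larger x-coordinate.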
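The pruning step can be sketched as follows. This is a hedged illustration: the helper names `insert` and `x_less_than` are invented for the example, and the tree shape assumes the points are inserted in the order A through F, which the slides do not state.

```python
class Node:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None


def insert(node, point):
    """Insert into a BST ordered by x-coordinate (ties go right)."""
    if node is None:
        return Node(point)
    if point[0] < node.point[0]:
        node.left = insert(node.left, point)
    else:
        node.right = insert(node.right, point)
    return node


def x_less_than(node, bound, out, visited):
    """Collect points with x < bound, recording which nodes were visited."""
    if node is None:
        return
    visited.append(node.point)
    # The left subtree may always contain matches.
    x_less_than(node.left, bound, out, visited)
    if node.point[0] < bound:
        out.append(node.point)
        x_less_than(node.right, bound, out, visited)
    # Otherwise every key to the right is >= node's x >= bound: prune.


root = None
for p in [(-1, -1), (2, 2), (0, 1), (1, 0), (-2, -2), (-3, 2.5)]:  # A..F
    root = insert(root, p)

matches, visited = [], []
x_less_than(root, -1.5, matches, visited)
print(matches, visited)
```

Under the assumed insertion order, only A, E, and F are visited. The subtree holding B, C, and D is never examined.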
y-coordinate BST
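But in a BST ordered by y-coordinate, the same x-coordinate query can’t prune anything: either subtree of any node may contain points with a small x-coordinate.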
Recursive Partitioning
1-dimensional data (BST)
Keys are ordered on a line.
Recursive decision: left or right.
2-dimensional data
Keys are located on a plane.
Recursive decision: left or right, and up or down.
But we want to subdivide recursively, so each partition must itself be a rectangular region of the plane.
2-dimensional data (Quadtree)
Keys are located on a plane.
Recursive decision: NE, SE, SW, or NW.
?: What does a quadtree look like? Each node has how many children?
Quadtree
5 objects in 2-d space: A (-1, -1), B (2, 2), C (0, 1), D (1, 0), E (-2, -2).
?: Does insertion order affect the balance of a quadtree?
Quadtrees as Spatial Partitioning
Quadtrees produce recursive, hierarchical partitionings. Each point owns 4 subspaces.
[Figure: the points A through E partitioned two ways, by uniform partitioning and by a quadtree.]
Quadtree Range Searching
We can prune unnecessary subspaces!
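Quadtree insertion and pruned range search might look like the sketch below. It is an illustration under assumptions: the names `QuadNode`, `quadrant`, and `range_search` are hypothetical, ties on a coordinate are sent east or north, points are inserted in the order A through E, and the query rectangle is made up for the example.

```python
class QuadNode:
    def __init__(self, point):
        self.point = point
        self.children = {}  # maps 'NE', 'NW', 'SE', 'SW' to QuadNode


def quadrant(origin, p):
    """Quadrant of p relative to origin (assumption: ties go north / east)."""
    ns = 'N' if p[1] >= origin[1] else 'S'
    ew = 'E' if p[0] >= origin[0] else 'W'
    return ns + ew


def insert(node, p):
    if node is None:
        return QuadNode(p)
    q = quadrant(node.point, p)
    node.children[q] = insert(node.children.get(q), p)
    return node


def range_search(node, x_min, x_max, y_min, y_max, out, visited):
    if node is None:
        return
    visited.append(node.point)
    x, y = node.point
    if x_min <= x <= x_max and y_min <= y <= y_max:
        out.append(node.point)
    # A quadrant is explored only if the query rectangle overlaps it.
    overlaps = {
        'W': x_min < x, 'E': x_max >= x,
        'S': y_min < y, 'N': y_max >= y,
    }
    for q in ('NE', 'NW', 'SE', 'SW'):
        if overlaps[q[0]] and overlaps[q[1]]:
            range_search(node.children.get(q), x_min, x_max, y_min, y_max,
                         out, visited)


root = None
for p in [(-1, -1), (2, 2), (0, 1), (1, 0), (-2, -2)]:  # A..E
    root = insert(root, p)

# Assumed query rectangle: -3 <= x <= -1.5 and -3 <= y <= -1.5.
found, visited = [], []
range_search(root, -3, -1.5, -3, -1.5, found, visited)
print(found, visited)
```

For this query the rectangle lies entirely southwest of A, so the NE subtree holding B, C, and D is pruned and only A and E are visited.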
3-dimensional Data (Octree)
Octree (WhiteTimberwolf/Wikimedia)
K-d Tree
In this course, we will only study the special case of k = 2, or 2-d trees.
Idea from tries. Compare on the first character, then on the second character, and so on.
Recursive Partitioning
2-dimensional data (Quadtree)
Keys are located on a plane.
Recursive decision: NE, SE, SW, or NW.
2-dimensional data (2-d tree)
Recursive decision 1: left or right.
Recursive decision 2: up or down.
2-d Tree
Idea. Root node partitions entire space left and right (by x-coordinate).
All depth 1 nodes partition subspace into up and down (by y-coordinate).
All depth 2 nodes partition subspace into left and right (by x-coordinate).
…
Each point owns 2 subspaces.
The subspace above D is infinitely large.
Example points: A (2, 3), B (4, 2), C (4, 5), D (3, 3), E (1, 5), F (4, 4).
?: Does insertion order affect the balance of a k-d tree?
2-d Tree Insertion
Where would point G (5, 3) go in the 2-d tree?
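One way to check an answer is to sketch 2-d tree insertion directly. The code below is an assumption-laden illustration rather than the course's implementation: the names `KdNode`, `insert`, and `path_to` are hypothetical, points are inserted in the order A through F and then G, and a tie on the compared coordinate goes to the right or up side.

```python
class KdNode:
    def __init__(self, point, depth):
        self.point = point
        self.depth = depth
        self.lesser = None   # left child (x split) or down child (y split)
        self.greater = None  # right child (x split) or up child (y split)


def insert(node, point, depth=0):
    if node is None:
        return KdNode(point, depth)
    axis = depth % 2  # even depth compares x, odd depth compares y
    if point[axis] < node.point[axis]:
        node.lesser = insert(node.lesser, point, depth + 1)
    else:  # assumption: ties go to the right / up side
        node.greater = insert(node.greater, point, depth + 1)
    return node


def path_to(node, point):
    """Directions taken from the root to reach an inserted point."""
    steps = []
    while node is not None and node.point != point:
        axis = node.depth % 2
        if point[axis] < node.point[axis]:
            steps.append('left' if axis == 0 else 'down')
            node = node.lesser
        else:
            steps.append('right' if axis == 0 else 'up')
            node = node.greater
    return steps


root = None
for p in [(2, 3), (4, 2), (4, 5), (3, 3), (1, 5), (4, 4)]:  # A..F
    root = insert(root, p)
root = insert(root, (5, 3))  # G

g_path = path_to(root, (5, 3))
print(g_path)
```

The list in `g_path` names the decision made at each node on the way down, alternating between an x comparison and a y comparison.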
2-d Tree Nearest Neighbors
Optimization. Do not explore subspaces that can’t possibly have a better answer than the current best.
Find the nearest point to (0, 7).
There’s a more advanced and subtle pruning rule that we’ll see in the homework.
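The optimization can be sketched with one common pruning rule: skip the far side of a split when the splitting line is farther from the goal than the current best point. This is a hedged example, not necessarily the rule used in the demo or the homework. The names `KdNode`, `insert`, `dist2`, and `nearest` are hypothetical, and the tree assumes insertion order A through F with ties going right or up.

```python
class KdNode:
    def __init__(self, point, depth):
        self.point = point
        self.depth = depth
        self.lesser = None   # left or down
        self.greater = None  # right or up


def insert(node, point, depth=0):
    if node is None:
        return KdNode(point, depth)
    axis = depth % 2
    if point[axis] < node.point[axis]:
        node.lesser = insert(node.lesser, point, depth + 1)
    else:
        node.greater = insert(node.greater, point, depth + 1)
    return node


def dist2(a, b):
    """Squared Euclidean distance (avoids a square root)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(node, goal, best, visited):
    if node is None:
        return best
    visited.append(node.point)
    if best is None or dist2(node.point, goal) < dist2(best, goal):
        best = node.point
    axis = node.depth % 2
    diff = goal[axis] - node.point[axis]
    # Search the side containing the goal first.
    good, bad = ((node.lesser, node.greater) if diff < 0
                 else (node.greater, node.lesser))
    best = nearest(good, goal, best, visited)
    # The other side can only help if the splitting line is closer
    # to the goal than the best point found so far.
    if diff * diff < dist2(best, goal):
        best = nearest(bad, goal, best, visited)
    return best


root = None
for p in [(2, 3), (4, 2), (4, 5), (3, 3), (1, 5), (4, 4)]:  # A..F
    root = insert(root, p)

visited = []
closest = nearest(root, (0, 7), None, visited)
print(closest, visited)
```

With this rule the search for (0, 7) still crosses to the right of A, because the line x = 2 is closer to the goal than the best point found on the left, but it never visits F.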
Summary
Range Searching. What are all the objects inside this (rectangular) subspace?
Nearest. What is the closest object to a specific point? k-nearest queries are common in machine learning.
Spatial partitioning. How to divide space into non-overlapping subspaces.
Uniform Partitioning. Analogous to hashing with a fixed number M of buckets.
Quadtree. Generalization of BST where each point owns 4 subspaces.
K-d Tree. Generalization of BST where each point owns 2 subspaces. Generalizes to higher dimensions: dimension ownership cycles with each level of depth in the tree.
Spatial partitioning allows for pruning of the search space.