
Multi-Dimensional Data

Complete the Reading Quiz by noon before lecture.

In the Autocomplete homework, we implemented the range search operation on strings to find all the strings that start with a given prefix. Autocomplete is an example of 1-dimensional range search: because strings can be ordered lexicographically (dictionary order), we can imagine all of the strings as points on a line. 1-d range search can be implemented efficiently on a sorted array by running two binary searches: one to find the first matching key, and one to find the last matching key.

Suppose that keys are stored in a sorted array. What is the running time for range search as a function of N (the number of keys) and M (the number of matching keys)?

O(M + log N). We need to run two binary searches to get the first index and the last index, each taking O(log N) time. Collecting the M matching keys can be done in linear time by copying all of the items from the first index to the last index in the sorted array.
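
The two-binary-search idea can be sketched in Java as follows. This is a minimal illustration, not the homework's reference solution: the class name `SortedArrayRangeSearch` and the helpers `lowerBound` and `upperBound` are hypothetical names, and the query is assumed to be an inclusive range of strings `[lo, hi]`.

```java
import java.util.ArrayList;
import java.util.List;

class SortedArrayRangeSearch {
    // Index of the first key >= target, or keys.length if there is none.
    static int lowerBound(String[] keys, String target) {
        int lo = 0, hi = keys.length;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (keys[mid].compareTo(target) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Index of the first key > target, or keys.length if there is none.
    static int upperBound(String[] keys, String target) {
        int lo = 0, hi = keys.length;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (keys[mid].compareTo(target) <= 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // All keys k with lo <= k <= hi; keys must already be sorted.
    static List<String> rangeSearch(String[] keys, String lo, String hi) {
        int first = lowerBound(keys, lo);   // O(log N)
        int end = upperBound(keys, hi);     // O(log N)
        List<String> result = new ArrayList<>();
        for (int i = first; i < end; i++) { // O(M) copy of the matches
            result.add(keys[i]);
        }
        return result;
    }
}
```

The two searches cost O(log N) each and the copy loop touches only the M matching keys, which is where the O(M + log N) bound comes from.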

What is the runtime for inserting a new key into the sorted array?

O(N). Even if we binary search to the correct insertion point in O(log N) time, we still need to shift all of the following items over by one index to make space for the new key.
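
To make the shifting cost concrete, here is a small sketch of insertion into a sorted `int` array. The class name `SortedArrayInsert` is hypothetical, and for simplicity the sketch assumes a fixed-size array that is copied into a new one rather than a resizing array with spare capacity; either way, the elements after the insertion point must move.

```java
import java.util.Arrays;

class SortedArrayInsert {
    // Returns a new sorted array containing all of `sorted` plus `key`.
    static int[] insert(int[] sorted, int key) {
        // Binary search: O(log N). A negative result encodes the insertion point.
        int pos = Arrays.binarySearch(sorted, key);
        if (pos < 0) pos = -(pos + 1);

        int[] result = new int[sorted.length + 1];
        // Copy the keys before the insertion point unchanged.
        System.arraycopy(sorted, 0, result, 0, pos);
        result[pos] = key;
        // Shift every following key over by one index: O(N) in the worst case.
        System.arraycopy(sorted, pos, result, pos + 1, sorted.length - pos);
        return result;
    }
}
```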

While sorted arrays are fast for range search queries, they are slow at inserting new keys. One way to improve insertion runtime is to switch to a binary search tree. 1-d range search on a binary search tree is similar in principle to range search on a sorted array, but it combines searching for keys and collecting keys in a single recursive traversal.

  1. Recursively find all keys in the left subtree, if any could fall in the range.
  2. Check if the key in the current node matches. If so, add it to the result.
  3. Recursively find all keys in the right subtree, if any could fall in the range.

If we use a red-black tree, this algorithm runs in O(M + log N) time, just like with a sorted array. The real improvement is in insertion, which now takes just O(log N) time.
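
The three recursive steps might look like the sketch below. To keep it short, it assumes a plain, unbalanced binary search tree over `int` keys rather than a red-black tree, so the O(log N) bounds hold only when the tree happens to be balanced. The class name `RangeSearchBST` and the helper `collect` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class RangeSearchBST {
    private static class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    // Plain BST insertion (no rebalancing); duplicates are ignored.
    void insert(int key) { root = insert(root, key); }

    private Node insert(Node node, int key) {
        if (node == null) return new Node(key);
        if (key < node.key) node.left = insert(node.left, key);
        else if (key > node.key) node.right = insert(node.right, key);
        return node;
    }

    // All keys k with lo <= k <= hi, in sorted order.
    List<Integer> rangeSearch(int lo, int hi) {
        List<Integer> result = new ArrayList<>();
        collect(root, lo, hi, result);
        return result;
    }

    private void collect(Node node, int lo, int hi, List<Integer> result) {
        if (node == null) return;
        // Step 1: the left subtree holds smaller keys; visit it only if some could be >= lo.
        if (lo < node.key) collect(node.left, lo, hi, result);
        // Step 2: check the current key.
        if (lo <= node.key && node.key <= hi) result.add(node.key);
        // Step 3: the right subtree holds larger keys; visit it only if some could be <= hi.
        if (node.key < hi) collect(node.right, lo, hi, result);
    }
}
```

Because the traversal is in-order, the matching keys come back in sorted order, and the two pruning checks are what keep the search from visiting subtrees that lie entirely outside the range.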

2-d Data

Range search is an important problem in the 2-dimensional case as well. In HuskyMaps, for example, the user can double-tap on the map to start a navigation route and double-tap again to end it. They can even tap on places that aren’t roads. Somehow, our app is able to find the nearest roadway to start navigation.

This problem, known as nearest neighbor search, is closely related to 2-d range searching. As with 1-d range search and Autocomplete, we can implement a slow-but-correct algorithm: scan through all of the possible roadways and keep track of the nearest one to the target. In lecture, we’ll explore different algorithms for optimizing 2-d range search.
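
The slow-but-correct scan described above can be sketched as follows. This is only an illustration: the class name `NaiveNearest` is hypothetical, points are assumed to be `{x, y}` pairs in a flat plane with Euclidean distance, and real map data would need a geographic distance measure instead.

```java
class NaiveNearest {
    // Returns the point in `points` closest to (x, y), or null if there are no points.
    static double[] nearest(double[][] points, double x, double y) {
        double[] best = null;
        double bestDistSquared = Double.POSITIVE_INFINITY;
        for (double[] p : points) {            // examines all N points: O(N)
            double dx = p[0] - x;
            double dy = p[1] - y;
            // Comparing squared distances gives the same winner and avoids a sqrt.
            double distSquared = dx * dx + dy * dy;
            if (distSquared < bestDistSquared) {
                bestDistSquared = distSquared;
                best = p;
            }
        }
        return best;
    }
}
```

Every query examines every point, so the running time is linear in the number of points no matter how close the answer is.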

Can we use hashing to solve the nearest neighbor search problem?

There are two problems with using hashing.

First, hash tables typically implement the Set (or Map) ADT. We can search for an item in the set and the hash table can tell us whether or not that item is in the set. However, it can’t find the item nearest to it in the set!
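
A small demonstration of this limitation, assuming points are represented as `List<Integer>` pairs stored in a `java.util.HashSet`. The class name `HashMembershipDemo` and its helper methods are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashMembershipDemo {
    // Builds a hash set of (x, y) points from an array of {x, y} pairs.
    static Set<List<Integer>> buildSet(int[][] points) {
        Set<List<Integer>> set = new HashSet<>();
        for (int[] p : points) {
            set.add(Arrays.asList(p[0], p[1]));
        }
        return set;
    }

    // Exact-match membership test: the only kind of query the set supports.
    static boolean contains(Set<List<Integer>> set, int x, int y) {
        return set.contains(Arrays.asList(x, y));
    }
}
```

With the points (3, 4) and (10, 10) stored, a lookup for (3, 4) succeeds, but a lookup for (3, 5) simply reports that the point is absent. Nothing in the answer says that (3, 4) is only one unit away.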

Second, most hash functions (including the ones autogenerated by IntelliJ) are designed to distribute items evenly without regard for their similarity to other items, so nearby points usually land in unrelated buckets. Simply searching for a 2-d point that isn’t already contained in a hash table won’t work, as the hash table will only report that the point isn’t in the table. It is possible to make progress on this problem with locality-sensitive hashing.


Reading Quiz