- If many keys hash to similar values (same high-order bits) in an
extendible hash table, the buckets corresponding to those directory
pointers will fill. This may necessitate splitting (when two pointers
point to the same bucket), and later rehashing (when the new buckets
become too full) which will double the size of the directory. For example
we may have two disk-blocks worth of keys which all hash to values
beginning with 1111. In this case we need to rehash until our directory
has at least 32 pointers, when perhaps many of those pointers will be
pointing to empty buckets.
However, if many similar keys are inserted, they still might hash to
a uniform distribution of values if you have a good hash function, and
then all buckets would be approximately evenly full.
-
a b c d e f g h i j
a b c d e f g h i j
a b c e f g h i j
|
d
a b e f g h i j
| |
d c
a b e f g h i
| | |
d c j
a b e f g h i
| | |
d c j
a b e f g i
| |\
d c h
|
j
a b e f g i
| |\
d c h
|
j
a b e f g i
|\
d b
|\
c h
|
j
-
a b c d e f g h i j
a b c d e f g h i j
a b c e f g h i j
|
d
a b e f g h i j
| |
d c
a b e f g h i
| | |
d c j
a b e f g h i
| | |
d c j
a b e f g i
| |\
d c h
|
j
a b e f g i
| |\
d c h
|
j
b e f g i
/|\
a c h
| |
d j
-
a b c d e f g h i j
a b c d e f g h i j
a b c e f g h i j
|
d
a b e f g h i j
| |
d c
a b e f g h i
| | |
d c j
a b e f g h i
| | |
d c j
a b e f g i
| |\
d c h
|
j
a b e f g i
| /|\
d j c h
b__ e f g i
/|\ \
j c h a
|
d
- Up-trees support four operations: Create, Destroy, Union and Find
(note: you only needed to mention the latter two).
To search for an item in an up-tree (i.e., find an item by its key)
takes O(n) time if you have no dictionary to map keys to items
because you have to look through every element. However, in the
(likely) case that you do have some dictionary (such as
an array or hash table of node pointers, indexed by key value), such a
find may only take constant time. The trick here is that the up-tree
itself does not provide that fast searching time.
This is a tricky question, but it's worth
understanding. In fact, it is actually reasonable to imagine an
up-tree forest data structure that doesn't bother keeping a dictionary
to help find its nodes. The reason is that many implementations of
disjoint set union/find simply return a pointer to any element that's
added, putting the burden of find by key on the client code.
- Extendible hash tables support large amounts of input because they can
grow if necessary to accommodate more input and because they work well
even as disk I/O begins to dominate runtime (rather than CPU
operations).
- As mentioned in class, up-trees are perfect for keeping track of
disjoint sets, where each set represents an equivalence class of
items. Check out the definition of evuivalence class in the disjoint
set union/find notes.
- Binomial Queues support merging in a similar manner to the way we
add binary numbers. We can hang one size-k binomial tree off another
size-k binomial tree to make a size-2k binomial tree. Each hanging
takes constant time, and a BQ of n items has O(log n) trees, so a
merge of two trees with a total of n items will run in O(log n) time.
- Treaps are BSTs, so they support the dictionary operations. Also,
their randomized priorities simulate randomizing the order of the input,
so the expected height of the tree becomes the same as the average height,
namely O(log n) for an n-node treap.
- Creative and Deceptive Trees (or C and D Trees for short) are
imaginary dictionaries for storing wacky ideas. So, if you have no idea,
you can use a C&D Tree to get ideas, such as ficticious data structure
names you can use to fool your friends. Also, since they are imaginary,
they don't take up any memory, and all accesses run in amortized constant
time.
- Spose we have a forest of up-trees with n elements. Then there can be
at most n distinct trees. When we perform a Union, we take two trees and
combine them into one, thereby decreasing the total number of trees by
one. We cannot perform a Union on just one tree. Therefore, we can
perform at most n-1 Unions. Each Union takes constant time, so the worst
case total cost of all possible Unions is O(n) time. The best case is
constant time, and occurs if we start with only a constant number of trees
(for example, if no matter how many elements we have, we know we will have
at most ten up-trees to begin with).