HW#5 Solutions

  1. If many keys hash to similar values (same high-order bits) in an extendible hash table, the buckets corresponding to those directory pointers will fill. This may necessitate splitting (when two pointers point to the same bucket), and later rehashing (when the new buckets become too full) which will double the size of the directory. For example we may have two disk-blocks worth of keys which all hash to values beginning with 1111. In this case we need to rehash until our directory has at least 32 pointers, when perhaps many of those pointers will be pointing to empty buckets.

    However, if many similar keys are inserted, they still might hash to a uniform distribution of values if you have a good hash function, and then all buckets would be approximately evenly full.

  2.     a    b    c    d    e    f    g    h    i    j
    
        a    b    c    d    e    f    g    h    i    j
    
        a    b    c         e    f    g    h    i    j
        |
        d
    
        a    b              e    f    g    h    i    j
        |    |
        d    c
    
        a    b              e    f    g    h    i
        |    |                             |
        d    c                             j
    
        a    b              e    f    g    h    i
        |    |                             |
        d    c                             j
    
        a    b              e    f    g         i
        |    |\                              
        d    c h
               |
               j
                                 
        a    b              e    f    g         i
        |    |\                              
        d    c h
               |
               j
    
        a    b              e    f    g         i
        |\   
        d b
          |\ 
          c h
            |
            j 
    

  3.     a    b    c    d    e    f    g    h    i    j
    
        a    b    c    d    e    f    g    h    i    j
    
        a    b    c         e    f    g    h    i    j
        |
        d
    
        a    b              e    f    g    h    i    j
        |    |
        d    c
    
        a    b              e    f    g    h    i
        |    |                             |
        d    c                             j
    
        a    b              e    f    g    h    i
        |    |                             |
        d    c                             j
    
        a    b              e    f    g         i
        |    |\                              
        d    c h
               |
               j
                                 
        a    b              e    f    g         i
        |    |\                              
        d    c h
               |
               j
    
             b              e    f    g         i
            /|\                              
           a c h
           |   |
           d   j
    

  4.     a    b    c    d    e    f    g    h    i    j
    
        a    b    c    d    e    f    g    h    i    j
    
        a    b    c         e    f    g    h    i    j
        |
        d
    
        a    b              e    f    g    h    i    j
        |    |
        d    c
    
        a    b              e    f    g    h    i
        |    |                             |
        d    c                             j
    
        a    b              e    f    g    h    i
        |    |                             |
        d    c                             j
    
        a    b              e    f    g         i
        |    |\                              
        d    c h
               |
               j
                                 
        a    b              e    f    g         i
        |   /|\                              
        d  j c h
    
             b__            e    f    g         i
            /|\ \                             
           j c h a
                 |
                 d
    

  5. Up-trees support four operations: Create, Destroy, Union and Find (note: you only needed to mention the latter two).

    To search for an item in an up-tree (i.e., find an item by its key) takes O(n) time if you have no dictionary to map keys to items because you have to look through every element. However, in the (likely) case that you do have some dictionary (such as an array or hash table of node pointers, indexed by key value), such a find may only take constant time. The trick here is that the up-tree itself does not provide that fast searching time.

    This is a tricky question, but it's worth understanding. In fact, it is actually reasonable to imagine an up-tree forest data structure that doesn't bother keeping a dictionary to help find its nodes. The reason is that many implementations of disjoint set union/find simply return a pointer to any element that's added, putting the burden of find by key on the client code.

  6. Extendible hash tables support large amounts of input because they can grow if necessary to accommodate more input and because they work well even as disk I/O begins to dominate runtime (rather than CPU operations).

  7. As mentioned in class, up-trees are perfect for keeping track of disjoint sets, where each set represents an equivalence class of items. Check out the definition of evuivalence class in the disjoint set union/find notes.

  8. Binomial Queues support merging in a similar manner to the way we add binary numbers. We can hang one size-k binomial tree off another size-k binomial tree to make a size-2k binomial tree. Each hanging takes constant time, and a BQ of n items has O(log n) trees, so a merge of two trees with a total of n items will run in O(log n) time.

  9. Treaps are BSTs, so they support the dictionary operations. Also, their randomized priorities simulate randomizing the order of the input, so the expected height of the tree becomes the same as the average height, namely O(log n) for an n-node treap.

  10. Creative and Deceptive Trees (or C and D Trees for short) are imaginary dictionaries for storing wacky ideas. So, if you have no idea, you can use a C&D Tree to get ideas, such as ficticious data structure names you can use to fool your friends. Also, since they are imaginary, they don't take up any memory, and all accesses run in amortized constant time.

  11. Spose we have a forest of up-trees with n elements. Then there can be at most n distinct trees. When we perform a Union, we take two trees and combine them into one, thereby decreasing the total number of trees by one. We cannot perform a Union on just one tree. Therefore, we can perform at most n-1 Unions. Each Union takes constant time, so the worst case total cost of all possible Unions is O(n) time. The best case is constant time, and occurs if we start with only a constant number of trees (for example, if no matter how many elements we have, we know we will have at most ten up-trees to begin with).