# Final Exam Practice Problem Solutions

1a.  Normalizing by multiplying by 20, we have the following steps in the Huffman coding algorithm.

a  b  c  d  e  f  g  h
1  2  1  3  6  2  3  2

(a,c)  b  d  e  f  g  h
  2    2  3  6  2  3  2

((a,c),b)  d  e  f  g  h
    4      3  6  2  3  2

((a,c),b)  d  e  (f,h)  g
    4      3  6    4    3

((a,c),b)  (d,g)  e  (f,h)
    4        6    6    4

(((a,c),b),(f,h))  (d,g)  e
        8            6    6

(((a,c),b),(f,h))  ((d,g),e)
        8              12

((((a,c),b),(f,h)),((d,g),e))
             20

In this notation (x,y) denotes a tree with left subtree x and right subtree y, i.e.,

      /\
     /  \
    x    y

The left branch of a node is labelled '0' and the right branch is labelled '1'.

1b.     Depth of a leaf node is the number of bits in the encoding of the corresponding symbol.

Symbol   Depth
a       4
b       3
c       4
d       3
e       2
f       3
g       3
h       3

Average bits per symbol
= (1/20)*(4*1 + 3*2 + 4*1 + 3*3 + 2*6 + 3*2 + 3*3 + 3*2) = 56/20 = 2.8 bits per symbol
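The average can be double-checked with a small heap-based Huffman sketch. (Individual code lengths can differ with tie-breaking, but the optimal average is the same for any Huffman tree.)

```python
import heapq

def huffman_depths(freqs):
    """Build a Huffman tree over the weighted symbols; return each
    symbol's depth, i.e. the length of its code."""
    # heap entries: (weight, tiebreak, {symbol: depth_so_far})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)   # two smallest weights
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

freqs = {'a': 1, 'b': 2, 'c': 1, 'd': 3, 'e': 6, 'f': 2, 'g': 3, 'h': 2}
depths = huffman_depths(freqs)
avg = sum(freqs[s] * depths[s] for s in freqs) / sum(freqs.values())
print(avg)   # 2.8
```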

1c.      The codes are: a: 0000, b: 001, c: 0001, d: 100, e: 11, f: 010, g: 101, h: 011.
Therefore the code for efdgh is 11010100101011

2.             The final dictionary is :

0    a
1    b
2    c
3    ab
4    bc
5    cc
6    cca
7    aba

The decoded string is :  abcccababa         (Check the answer by running LZW by hand)
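A minimal LZW decoder sketch. The exam's input code sequence is not reproduced in this solution; the sequence 0 1 2 5 3 7 used below is an assumption, but it is consistent with the answer (decoding it over the alphabet {a, b, c} yields abcccababa and exactly the dictionary above).

```python
def lzw_decode(codes, alphabet):
    table = {i: s for i, s in enumerate(alphabet)}   # initial dictionary
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            cur = table[code]
        else:                          # code not yet in the table:
            cur = prev + prev[0]       # the cScSc special case
        table[len(table)] = prev + cur[0]   # grow the dictionary
        out.append(cur)
        prev = cur
    return "".join(out), table

decoded, table = lzw_decode([0, 1, 2, 5, 3, 7], "abc")
print(decoded)   # abcccababa
```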

3.     Edges are considered for inclusion in MST in the following order. A "yes" mark indicates that the edge did not form a cycle, and was hence included. A "no" means that the edge was rejected.

Edge     Weight   Included
(a,d) :    1        yes
(d,g) :    1        yes
(b,d) :    2        yes
(c,f) :    2        yes
(a,b) :    2        no
(c,d) :    3        yes
(d,f) :    3        no
(e,g) :    4        yes
(a,c) :    4        no
(e,d) :    5        no
(f,g) :    5        no
(b,e) :    5        no

An MST is formed by the following edges : {(a,d), (d,g), (b,d), (c,f), (c,d), (e,g)}
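Kruskal's cycle test can be sketched with a union-find structure, processing the edges in exactly the order listed above:

```python
def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

# edges in the order considered above (already sorted by weight)
edges = [('a','d',1), ('d','g',1), ('b','d',2), ('c','f',2), ('a','b',2),
         ('c','d',3), ('d','f',3), ('e','g',4), ('a','c',4), ('e','d',5),
         ('f','g',5), ('b','e',5)]
parent = {v: v for v in 'abcdefg'}
mst = []
for u, v, w in edges:
    ru, rv = find(parent, u), find(parent, v)
    if ru != rv:                 # no cycle: include the edge
        parent[ru] = rv
        mst.append((u, v))
print(mst)   # [('a', 'd'), ('d', 'g'), ('b', 'd'), ('c', 'f'), ('c', 'd'), ('e', 'g')]
```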

4.    Clearly the max clique size is at most n, and at least 1. We will use the fact that if there is a clique of size k then there is also a clique of every smaller size. Now, suppose we know that the max clique size lies between two bounds, "min" and "max". Define "mid" as (min + max)/2, rounded up; rounding up ensures that the interval strictly shrinks at every step. We call algorithm A on G with k = mid. If A returns "yes", then there is a clique of size mid, hence the new lower and upper bounds on the max clique size are "mid" and "max". On the other hand, if A returns "no", then there is no clique of size mid, hence the new bounds are "min" and "mid - 1". We repeat the same procedure recursively until the lower and upper bounds coincide. Note that this is effectively a binary search for the max clique size in the feasible range. The pseudo-code is as follows :

Algorithm B (Input : G, Output : Max clique size of G)
Begin
    min := 1                  // the max clique size cannot be less than 1
    max := n                  // the max clique size cannot be more than n
    FindMaxClique(min, max)   // do a binary search in this range
End

Procedure FindMaxClique(min : integer, max : integer)
Begin
    If (min = max) Then       // base case of recursion: the two bounds coincide
        Output min
        Return
    mid := (min + max + 1) / 2    // integer division, rounded UP so the range always shrinks
    If not(A(G, mid)) Then
        FindMaxClique(min, mid - 1)   // no clique of size mid, hence none larger
    Else
        FindMaxClique(mid, max)       // a clique of size mid exists
End

Analysis : The algorithm does a binary search in the range [1,n] and therefore makes O(log n) recursive calls to FindMaxClique, each of which calls A once. Since each call to A takes polynomial time, the overall algorithm is polynomial in time complexity.
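A runnable sketch of B. The oracle A below is a brute-force stand-in (in the problem A is a black box), and the toy graph is hypothetical:

```python
from itertools import combinations

def A(G, k):
    """Brute-force stand-in for the black-box oracle:
    does G (adjacency sets) contain a clique of size k?"""
    return any(all(v in G[u] for u, v in combinations(S, 2))
               for S in combinations(list(G), k))

def max_clique_size(G):
    lo, hi = 1, len(G)
    while lo < hi:
        mid = (lo + hi + 1) // 2   # round UP so the range always shrinks
        if A(G, mid):
            lo = mid               # a clique of size mid exists
        else:
            hi = mid - 1           # no clique of size mid, nor larger
    return lo

# hypothetical toy graph: triangle a-b-c with a pendant vertex d
G = {'a': {'b', 'c'}, 'b': {'a', 'c'}, 'c': {'a', 'b', 'd'}, 'd': {'c'}}
print(max_clique_size(G))   # 3
```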

5.a.  The solution space is the set of all valid linear arrangements; each node in the solution space is a distinct valid linear arrangement.
b.  Move : Take any pair of vertices u and v. Suppose f(u) = i and f(v) = j. To form a neighbor h of f make h(u) = j and h(v) = i, leaving everything else the same. That is, we simply exchange u and v in the linear arrangement. The linear arrangement thus obtained is a neighbor.
c.  Let f and g be any two linear arrangements.  The distance between f and g is the number of vertices v such that f(v) is different from g(v).  If this distance is zero then f and g are the same.  We show that if the distance between f and g is d > 0 then we can find a neighbor h of f whose distance from g is at most d-1.  Let f and g differ on vertex v, so that f(v) = i and g(v) = j.  Let w be such that f(w) = j.  We exchange v and w to form the neighbor h; that is, h(x) = f(x) if x is not one of v or w, and h(v) = j and h(w) = i.  The arrangement h has distance at most d-1 from g: now h(v) = g(v), every vertex other than v and w is unchanged, and w already disagreed with g before the swap (since g(v) = j, we have g(w) ≠ j = f(w)).
d.  An upper bound on the time complexity of a greedy local search :
A move must decrease the cost by at least 1. The minimum cost of a node (over all possible graphs) is 0 (when there are no edges). The maximum cost of a node (over all possible graphs) is attained on the clique of n nodes, where it equals the sum of (j - i) over all pairs i < j, which is (n^3 - n)/6 = O(n^3). Hence the maximum number of moves is O(n^3). Each move considers all O(n^2) neighbors of the current arrangement, computes the cost of each of them in O(m) time, and hence takes O(m*n^2) time. Hence the overall algorithm has time complexity O(m*n^5).
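A minimal sketch of this greedy local search on a toy instance (the graph and the starting arrangement below are hypothetical):

```python
from itertools import combinations

def cost(G, f):
    # cost of arrangement f (vertex -> position): total edge length
    return sum(abs(f[u] - f[v]) for u, v in G)

def greedy_local_search(G, f):
    """Repeatedly take any vertex swap that strictly lowers the cost."""
    improved = True
    while improved:
        improved = False
        c = cost(G, f)
        for u, v in combinations(list(f), 2):   # O(n^2) neighbors
            f[u], f[v] = f[v], f[u]             # tentative swap
            if cost(G, f) < c:
                improved = True                 # keep the improving move
                break
            f[u], f[v] = f[v], f[u]             # undo
    return f

# toy instance: a path a-b-c-d, started from a bad arrangement
G = [('a', 'b'), ('b', 'c'), ('c', 'd')]
f = greedy_local_search(G, {'a': 1, 'b': 3, 'c': 2, 'd': 4})
print(cost(G, f))   # 3 (a local, here also global, optimum)
```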

6.a.    Assuming a gray scale of  8 bits per pixel, the compression ratio would be  8:0.1 = 80:1
b.    There are a number of ways to achieve .1 bpp in VQ.  We need to solve the equation (log2 n)/(ab) = .1 bpp, where n is the codebook size and a x b is the block size.  Since images are typically a power of two on a side, it is wise to choose a and b to be powers of two.  It is also a good idea to make the a x b block as square as possible.  Finally, you want a decent size codebook so that there are enough representatives around.  A decent size codebook might be 1,024, so that a x b should be about 100.  How about 128, which is 8 x 16?  So n = 1,024, a = 8, and b = 16 gives us 10/128 = .078 bpp.  Since we are shooting for .1 bpp, we should go to a larger codebook, say 4,096.  With a = 8 and b = 16 we get 12/128 = .094 bpp.
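The arithmetic is easy to check:

```python
from math import log2

def bpp(n, a, b):
    # bits per pixel for a codebook of n vectors over a x b pixel blocks
    return log2(n) / (a * b)

print(bpp(1024, 8, 16))   # 0.078125
print(bpp(4096, 8, 16))   # 0.09375
```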

7.
A          :  0    0    1    1    1    1    0    0
1st level  :  0    1    1    0    0    0    0    0
2nd level  : 1/2  1/2  1/2 -1/2
3rd level  : 1/2   0

The three level wavelet transform yields  1/2  0  1/2  -1/2  0  0  0  0,
with 1/2 in the lowest resolution subband.

B          :  0    0    0    1    1    1    1    0
1st level  :  0   1/2   1   1/2   0   1/2   0  -1/2
2nd level  : 1/4  3/4  1/4 -1/4
3rd level  : 1/2  1/4

The three level wavelet transform yields  1/2  1/4  1/4  -1/4  0  1/2  0  -1/2,
with 1/2 in the lowest resolution subband.
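Both transforms can be reproduced with a short sketch of the averaging/differencing (Haar) transform used here: each level replaces a pair (p, q) by the average (p+q)/2 and the half-difference (q-p)/2.

```python
def haar_step(x):
    """One transform level: first half = pairwise averages,
    second half = pairwise half-differences (second minus first)."""
    avgs = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    difs = [(x[i + 1] - x[i]) / 2 for i in range(0, len(x), 2)]
    return avgs, difs

def haar(x, levels):
    detail = []
    for _ in range(levels):
        x, d = haar_step(x)
        detail = d + detail      # coarser details end up to the left
    return x + detail            # lowest-resolution subband first

print(haar([0, 0, 1, 1, 1, 1, 0, 0], 3))  # [0.5, 0.0, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
print(haar([0, 0, 0, 1, 1, 1, 1, 0], 3))  # [0.5, 0.25, 0.25, -0.25, 0.0, 0.5, 0.0, -0.5]
```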

8.   The dynamic programming table looks like this :

         A    G    A    T    C
    0   -1   -2   -3   -4   -5
A  -1    2    1    0   -1   -2
T  -2    1    1    0    2    1
A  -3    0    0    3    2    1
T  -4   -1   -1    2    5    4
T  -5   -2   -2    1    4    4
C  -6   -3   -3    0    3    6

a. A maximum score matching is :

A  G  A  T  -  C
A  T  A  T  T  C

b. There is more than one max-score matching, since backtracking through the table yields a graph of optimal paths rather than a single path.
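The table can be regenerated with a standard Needleman-Wunsch sketch; the scoring (match +2, mismatch -1, gap -1) is inferred from the table entries:

```python
def align_table(s, t, match=2, mismatch=-1, gap=-1):
    """Global alignment (Needleman-Wunsch) dynamic-programming table."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + sub,   # align s[i-1] with t[j-1]
                          D[i - 1][j] + gap,       # gap in t
                          D[i][j - 1] + gap)       # gap in s
    return D

D = align_table("ATATTC", "AGATC")
print(D[-1][-1])   # 6, the maximum alignment score
```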

9.
Let's name the points

a     b     c     d     e     f     g     h
(1,2) (3,3) (5,5) (8,3) (7,2) (6,5) (1,7) (4,6)

Sort on x
a     g     b     h     c     f     e     d
(1,2) (1,7) (3,3) (4,6) (5,5) (6,5) (7,2) (8,3)
Sort on y
a      e     b     d     c     f     h     g
(1,2) (7,2) (3,3) (8,3) (5,5) (6,5) (4,6) (1,7)

Widest spread is in x so we choose the root to have axis = x and value = 4.  The left subtree will have a, g, b, h and the right subtree will have c, f, e, d.

Left subtree of root:

Sort on x
a     g     b     h
(1,2) (1,7) (3,3) (4,6)
Sort on y
a      b     h     g
(1,2) (3,3) (4,6) (1,7)

Widest spread is in y so we choose the axis = y and value = 5.  The left-left subtree will have a,b and the left-right will have h,g.

Left-left subtree:

Sort on x
a     b
(1,2) (3,3)
Sort on y
a      b
(1,2) (3,3)

Widest spread is in x so we choose the axis = x and value = 2.  The left-left-left subtree will have a and the left-left-right will have b.

Left-right subtree:

Sort on x
g     h
(1,7) (4,6)
Sort on y
h     g
(4,6) (1,7)

Widest spread is in x so we choose the axis = x and value = 3.  The left-right-left subtree will have g and the left-right-right will have h.

Right subtree of root:

Sort on x
c     f     e     d
(5,5) (6,5) (7,2) (8,3)
Sort on y
e     d     c     f
(7,2) (8,3) (5,5) (6,5)

The widest spread is the same so we arbitrarily choose axis = x and value = 6.  The right-left subtree will have c,f and the right-right will have e,d.

Right-left subtree:

Sort on x
c     f
(5,5) (6,5)
Sort on y
c     f
(5,5) (6,5)

The widest spread is x so we choose axis = x and value = 5.  The right-left-left subtree will have c and the right-left-right will have f.

Right-right subtree:
Sort on x
e     d
(7,2) (8,3)
Sort on y
e     d
(7,2) (8,3)

The widest spread is the same so we arbitrarily choose axis = x and value = 7.  The right-right-left subtree will have e and the right-right-right will have d.

In the end we have the following k-d tree:

                    x,4
                 /       \
            y,5             x,6
           /    \          /    \
        x,2      x,3    x,5      x,7
        / \      / \    / \      / \
       a   b    g   h  c   f    e   d
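A sketch of the construction: pick the axis of widest spread (ties broken toward x) and split the points at the median. The split values are omitted here; only the tree shape and leaf order are reproduced.

```python
def build_kd(points):
    """points: {name: (x, y)}.  Split on the axis of widest spread
    (ties broken toward x); lower half goes left, upper half right."""
    names = list(points)
    if len(names) == 1:
        return names[0]                  # leaf: a single point
    xs = [points[n][0] for n in names]
    ys = [points[n][1] for n in names]
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    names.sort(key=lambda n: points[n][axis])   # stable sort
    mid = len(names) // 2
    return ('xy'[axis],
            build_kd({n: points[n] for n in names[:mid]}),
            build_kd({n: points[n] for n in names[mid:]}))

pts = {'a': (1, 2), 'b': (3, 3), 'c': (5, 5), 'd': (8, 3),
       'e': (7, 2), 'f': (6, 5), 'g': (1, 7), 'h': (4, 6)}
print(build_kd(pts))
# ('x', ('y', ('x', 'a', 'b'), ('x', 'g', 'h')), ('x', ('x', 'c', 'f'), ('x', 'e', 'd')))
```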

10.
a. In the best case there will be 2n/B cache misses: every run of B consecutive reads from A incurs one miss, and every run of B consecutive writes to B incurs one miss.
b. In the worst case the arrays are allocated in memory so that A[0] maps to the same cache line as B[0].  In this case there will be 2n cache misses, because after each read or write the cache block just loaded is evicted by the access from the other array.
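A tiny direct-mapped cache simulation illustrates both cases; the cache geometry and array base addresses below are made-up:

```python
def copy_misses(n, blk, lines, baseA, baseB):
    """Count misses in a direct-mapped cache (`lines` lines of `blk` words)
    while executing: for i in range(n): B[i] = A[i]."""
    cache = [None] * lines           # which memory block each line holds
    misses = 0
    def access(addr):
        nonlocal misses
        block = addr // blk
        if cache[block % lines] != block:
            cache[block % lines] = block   # load the block, evicting the old one
            misses += 1
    for i in range(n):
        access(baseA + i)            # read A[i]
        access(baseB + i)            # write B[i]
    return misses

n, blk, lines = 64, 8, 4             # the cache holds lines*blk = 32 words
print(copy_misses(n, blk, lines, 0, 72))   # best case: 2n/blk = 16
print(copy_misses(n, blk, lines, 0, 64))   # worst case: bases conflict, 2n = 128
```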

11.   a. False
b. False
c. True
d. True
e. False
f. False
g. False
h. False
i. True
j. False