Final Exam Study Guide

CSE 589, Spring 1999, Final Exam Study Guide

The final exam will be held on Tuesday, June 8, 1999 from 6:30 to 8:20 in MEB 238. The Mechanical Engineering Building is on campus, just south of Loew hall and just east of the Electrical Engineering Building. The room has natural light on the east side of the building and has tables to spread out a little.

What you must bring:

8 1/2 x 11 inch blue book for taking the exam.

What you can bring:

Two text books of your choice.
Lecture materials down loaded from the web.
Assignments and their solutions.
Your own personal notes.
Calculator.

The exam will cover the entire course. You should understand all the algorithms that we covered in class at the level of being able to do a hand simulation on small data sets. You should be able to take an informally specified problem and formally model it so that it can be attacked using algorithmic techniques. You should be able to analyze algorithms by giving time and storage bounds on their executions. You should be able to analyze the cache performance of algorithms. You should understand NP-completeness, NP-hardness, and polynomial time reductions. You should understand the basics of data compression including entropy. You should understand the basics of lossy data compression including VQ and Wavelet transforms.

Here is a list of the most important algorithms that we covered.

Depth-first search
Breadth-first search
Minimum spanning tree (Kruskal's algorithm)
Minimum spanning tree (Prim's algorithm)
Disjoint-Union/Find with path compression and weighted union
Branch and bound algorithms
Greedy local search
Simulated annealing
Multiple move strategies for local search
Subset sum algorithm
Cache-conscious mergesort
d-heaps
Cache-conscious heapsort
Cache-conscious static search
Huffman coding
Arithmetic coding
LZW
Sequitur
VQ coding and decoding
Generalized Lloyd algorithm for codebook design
Nearest neighbor search by Orchard
Nearest neighbor search by k-d trees
SPIHT
Approximate DNA sequence matching by dynamic programming
PQ-tree algorithms for DNA sequence mapping

Below I give a sample problems for you to practice on. Your practice on these problems may take more that 1 hour and 50 minutes

Practice problems:

1. Consider the following symbols with the given probabilities: a 1/20, b 2/20, c 1/20, d 3/20, e 6/20, f 2/20, g 3/20, h 2/20.

    a. Design an optimal Huffman tree for this set of symbols.
    b. What is the average number of bits per symbol for your code?
    c. Code the symbol steam e f d g h using your code.

2. Use the LZW decoding algorithm to decode the stream 0 1 2 5 3 7 where the initial dictionary is

        0 a
        1 b
        2 c

Show the resulting dictionary after the decoding.

3. Use Kruskal's algorithm to find the minimum spanning tree for the following graph. List the edges in the order they are considered for inclusion into the minimum spanning tree.

                     2
               a --------- b
             / \        / \
           4/    \1    2/    \5
           /      \    /      \
         c -------- d ------- e
           \   3 /    \   5 /
           2\    /3   1 \    / 4
             \ /        \ /
              f --------- g
                    5

4. Given an undirected graph G = (V,E), a clique is a set X contained in V with the property that for any two vertices x,y in X, {x,y} is an edge in E. The problem of finding a maximum size clique in a graph is NP-hard. Suppose there is an algorithm A which takes an undirected graph G and a number k as input and decides (yes or no) whether G has a clique of size k. Design an algorithm B which on input G outputs a number m with the property that G has a clique of size m, but no clique of size m+1. The algorithm B is allowed to make O(log n) calls to the algorithm A, but otherwise must run in polynomial time.

5. An interesting NP-hard problem is finding the minimum cost linear arrangement for an undirected graph G = (V,E). A linear arrangement is a function f from the vertices V to the integers {1,2,...,|V|}. The cost(f) = the sum over all edges {u,v} in E of |f(u) - f(v)|. For example, consider the graph:

           a ----- b
            \     /
             \   /
              \ /
               c
               |
               |
               d

Consider the arrangement f(a) = 1, f(b) = 3, f(c) = 2, f(d) = 4, that is we have the arrangement

1 2 3 4
a c b d

Cost(f) is the sum of 2 {a,b}, 1 {a,c}, 1 {b,c}, and 2 {c,d} for a total of 6.

Because this is an NP-hard problem we would like to design local search algorithms to solve it. The first step in setting this problem up as a local search problem is to define the solution space and the neighborhood of a solution.

    a. Design the solution space for this problem.
    b. Describe a move relation for the solution space that take a solution to one of its neighbors.
    c. Argue briefly that every solution is reachable from every other by a sequence of moves.
    d. What is an upper bound on the time complexity of a greedy steepest descent algorithm based on your solution space.

6. We would like to compress gray scale images using VQ to 0.1 bpp.

a. What is the compression ratio we want to achieve?
b. What block size and codebook size should we use to achieve this ratio?

7. Consider the following two sequences of eight data points: A = 0, 0, 1, 1, 1, 1, 0, 0 and B = 0, 0, 0, 1, 1, 1, 1, 0.

a. Apply one level of the Haar wavelet transform to both A and B.
b. After applying 3 levels of the Haar wavelet transform to both A and B, what is the value of the lowest resolution subband of each?

8. Use the dynamic programming algorithm to find a maximum score matching for the two strings AGATC and ATATTC. The scoring function is +2 for a match and -1 for a mismatch.
a. Give one matching with maximum score..
b Is there more than one matching with maximum score.

9. Consider the 8 points in 2 dimensions (1,2), (3,3), (5,5), (8,3), (7,2), (6,5), (1,7), (4, 6). Design a k-d tree for this data that uses the criterion of splitting perpendicular to the axis with the widest spread.

10. Suppose you have a cache with B bytes per cache line. Consider the problem of copying an array A[0..n-1] of characters to an array B[0..n-1] of characters, where neither array has any data in the cache. Assume a direct mapped cache.

a. In the best case how many cache misses will be incurred during this copy operation.
b. In the worst case how many cache misses will be incurred during this copy operation.

11. Answer each of the following as true or false:

    a. To show that a decision problem is NP-complete you must first show it is in NP then show that it is reducible to some known NP-complete problem.
    b. The GLA algorithm for codebook design always gives a globally optimal codebook.
    c. Branch-and-bound algorithms yield optimal solutions.
    d. A well designed greedy steepest descent local search algorithm runs in polynomial time.
    e. An algorithm that has the highest utilization of its cache lines will have the fewest cache misses.
    f . The subset sum problem has a polynomial time algorithm if the inputs are presented in binary.
    g. The sequitur algorithm always yields better compression than LZW.
    h. Arithmetic coding for fixed length strings achieves the entropy lower bound.
    i. The worst case time for an individual operation in weighted disjoint union / path compression find for a universe of size n is O(log n).
    j. The average time per operation in for at least n operations in weighted disjoint union / path compression find for a universe of size n is O(n).