The final exam will be held on Tuesday, June 8, 1999 from 6:30 to 8:20 in MEB 238. The Mechanical Engineering Building is on campus, just south of Loew hall and just east of the Electrical Engineering Building. The room has natural light on the east side of the building and has tables to spread out a little.

What you must bring:

8 1/2 x 11 inch blue book for taking the exam.

What you can bring:

Two text books of your choice.

Lecture materials down loaded from
the web.

Assignments and their solutions.

Your own personal notes.

Calculator.

The exam will cover the entire course. You should understand all the algorithms that we covered in class at the level of being able to do a hand simulation on small data sets. You should be able to take an informally specified problem and formally model it so that it can be attacked using algorithmic techniques. You should be able to analyze algorithms by giving time and storage bounds on their executions. You should be able to analyze the cache performance of algorithms. You should understand NP-completeness, NP-hardness, and polynomial time reductions. You should understand the basics of data compression including entropy. You should understand the basics of lossy data compression including VQ and Wavelet transforms.

Here is a list of the most important algorithms that we covered.

Depth-first search

Breadth-first search

Minimum spanning tree (Kruskal's
algorithm)

Minimum spanning tree (Prim's algorithm)

Disjoint-Union/Find with path compression
and weighted union

Branch and bound algorithms

Greedy local search

Simulated annealing

Multiple move strategies for local
search

Subset sum algorithm

Cache-conscious mergesort

d-heaps

Cache-conscious heapsort

Cache-conscious static search

Huffman coding

Arithmetic coding

LZW

Sequitur

VQ coding and decoding

Generalized Lloyd algorithm for
codebook design

Nearest neighbor search by Orchard

Nearest neighbor search by k-d trees

SPIHT

Approximate DNA sequence matching
by dynamic programming

PQ-tree algorithms for DNA sequence
mapping

Below I give a sample problems for you to practice on. Your practice on these problems may take more that 1 hour and 50 minutes

Practice problems:

1. Consider the following symbols with the given probabilities: a 1/20, b 2/20, c 1/20, d 3/20, e 6/20, f 2/20, g 3/20, h 2/20.

a. Design an optimal
Huffman tree for this set of symbols.

b. What is the
average number of bits per symbol for your code?

c. Code the symbol
steam e f d g h using your code.

2. Use the LZW decoding algorithm to decode the stream 0 1 2 5 3 7 where the initial dictionary is

` 0 a`
` 1 b`
` 2 c`

Show the resulting dictionary after the decoding.

3. Use Kruskal's algorithm to find the minimum spanning tree for the following graph. List the edges in the order they are considered for inclusion into the minimum spanning tree.

`
2`
`
a --------- b`
`
/ \ / \`
` 4/
\1 2/ \5`
` /
\ / \`
` c -------- d
------- e`
` \
3 / \ 5 /`
` 2\
/3 1 \ / 4`
`
\ / \ /`
`
f --------- g`
`
5`

4. Given an undirected graph G = (V,E), a clique is a set X contained in V with the property that for any two vertices x,y in X, {x,y} is an edge in E. The problem of finding a maximum size clique in a graph is NP-hard. Suppose there is an algorithm A which takes an undirected graph G and a number k as input and decides (yes or no) whether G has a clique of size k. Design an algorithm B which on input G outputs a number m with the property that G has a clique of size m, but no clique of size m+1. The algorithm B is allowed to make O(log n) calls to the algorithm A, but otherwise must run in polynomial time.

5. An interesting NP-hard problem is finding the minimum cost linear arrangement for an undirected graph G = (V,E). A linear arrangement is a function f from the vertices V to the integers {1,2,...,|V|}. The cost(f) = the sum over all edges {u,v} in E of |f(u) - f(v)|. For example, consider the graph:

` a -----
b`
`
\ /`
`
\ /`
`
\ /`
`
c`
`
|`
`
|`
`
d`

Consider the arrangement f(a) = 1, f(b) = 3, f(c) = 2, f(d) = 4, that is we have the arrangement

` 1 2 3
4`
` a c b
d`

Cost(f) is the sum of 2 {a,b}, 1 {a,c}, 1 {b,c}, and 2 {c,d} for a total of 6.

Because this is an NP-hard problem we would like to design local search algorithms to solve it. The first step in setting this problem up as a local search problem is to define the solution space and the neighborhood of a solution.

a. Design the solution space for this problem.

b. Describe a move relation for the solution space
that take a solution to one of its neighbors.

c. Argue briefly that every solution is reachable
from every other by a sequence of moves.

d. What is an upper bound on the time complexity
of a greedy steepest descent algorithm based on your solution space.

6. We would like to compress gray scale images using VQ to 0.1 bpp.

a. What is the compression ratio we want to achieve?

b. What block size and codebook size should we use
to achieve this ratio?

7. Consider the following two sequences of eight data points: A = 0, 0, 1, 1, 1, 1, 0, 0 and B = 0, 0, 0, 1, 1, 1, 1, 0.

a. Apply one level of the Haar wavelet transform
to both A and B.

b. After applying 3 levels of the Haar wavelet transform
to both A and B, what is the value of the lowest resolution subband of
each?

8. Use the dynamic programming algorithm to find a maximum score matching
for the two strings AGATC and ATATTC. The scoring function
is +2 for a match and -1 for a mismatch.

a. Give one matching with maximum score..

b Is there more than one matching with maximum
score.

9. Consider the 8 points in 2 dimensions (1,2), (3,3), (5,5), (8,3), (7,2), (6,5), (1,7), (4, 6). Design a k-d tree for this data that uses the criterion of splitting perpendicular to the axis with the widest spread.

10. Suppose you have a cache with B bytes per cache line. Consider the problem of copying an array A[0..n-1] of characters to an array B[0..n-1] of characters, where neither array has any data in the cache. Assume a direct mapped cache.

a. In the best case how many cache misses will be
incurred during this copy operation.

b. In the worst case how many cache misses will
be incurred during this copy operation.

11. Answer each of the following as true or false:

a. To show that a decision problem is NP-complete
you must first show it is in NP then show that it is reducible to some
known NP-complete problem.

b. The GLA algorithm for codebook design always
gives a globally optimal codebook.

c. Branch-and-bound algorithms yield optimal solutions.

d. A well designed greedy steepest descent local
search algorithm runs in polynomial time.

e. An algorithm that has the highest utilization
of its cache lines will have the fewest cache misses.

f . The subset sum problem has a polynomial time
algorithm if the inputs are presented in binary.

g. The sequitur algorithm always yields better compression
than LZW.

h. Arithmetic coding for fixed length strings achieves
the entropy lower bound.

i. The worst case time for an individual operation
in weighted disjoint union / path compression find for a universe of size
n is O(log n).

j. The average time per operation in for at least
n operations in weighted disjoint union / path compression find for
a universe of size n is O(n).