CSE 421 Assignment #5
Autumn 2008

Due: Friday, November 7, 2008.

Reading Assignment: Kleinberg and Tardos Chapter 6.

Problems: Re-read the Grading Guidelines before answering (For all problems on this homework prove that your algorithm is correct and analyze its worst-case running time.)

  1. Kleinberg and Tardos, Chapter 6, Problem 5, pages 316-317.

  2. Kleinberg and Tardos, Chapter 6, Problem 6, pages 317-318.

  3. When people type, they are much more likely to transpose two adjacent characters rather than insert, delete, or change them. Suppose that there is a cost 1 per transposition, cost 2 for either an insertion or deletion and cost eab with 1< eab for typing an a instead of a b. (The likelihood of a mistyped character depends on how far apart the characters are on the keyboard.) Design an algorithm to compute the edit distance between two strings under this measure and determine the sequence of errors needed to achieve this distance. Make sure that the transposition errors you consider do not involve the same letter twice.

  4. Extra Credit In this problem you are given as input a binary tree T=(V,E) with each leaf l labelled by a symbol sl from some fixed alphabet A, as well as a two-dimensional non-negative array of costs indexed by pairs of elements of A such that c[s,t] is the cost of changing symbol s to symbol t and this satisfies c[s,s]=0 and c[s,t]=c[t,s] for all s,t in A. Design an algorithm to label each internal node v of the binary tree by a symbol sv from A so as to minimize the sum over all (u,v) in E of c[su,sv].

    (This kind of problem arises in computational biology when one wishes to assess a potential evolutionary tree and reconstruct properties of potential ancestral species given this tree. In this case each "symbol" might be a very short sequence - or even just a letter - of DNA or protein that might at a given position within a longer sequence, and c[s,t] is the cost of the evolutionary change in going from s to t.)