Overview of progressive alignment:
A heuristic approach to multiple sequence alignment
Based on Section 6.4 from Durbin et al.
Martin Tompa
CSE 427: Computational Biology
January 27, 2010
Overview of progressive alignment
Progressive alignment (Feng and Doolittle, 1987) is a heuristic for
multiple sequence alignment that does not optimize any obvious
alignment score. The idea is to do a succession of pairwise
alignments, starting with the most similar pairs of sequences and
proceeding to less similar ones. Here is an outline of the steps:
Inputs: N sequences
- Using standard pairwise alignment, calculate a matrix of
distances (alignment scores) between each pair of sequences.
Consider this as an N-clique G, where edge {i,j} is labeled with
the score of an optimal alignment of the i-th and j-th sequences.
- Use Kruskal's algorithm to find a minimum
spanning tree of G. Whenever a minimum spanning tree edge would
connect two components, instead add a new root node with directed
edges to the roots of the two components. This is the "guide tree".
- Do pairwise alignments according to the guide
tree, working from the leaves to the root. A node u with
children v and w corresponds to an alignment of the leaves of v's
subtree (already aligned inductively) with the leaves of w's
subtree (already aligned).
- Initially, each node is the root of its own tree.
- Consider edges in increasing order of edge label.
- If the next edge e connects nodes {a,b} in the same tree, discard e.
- Otherwise, find the root v of the tree containing a, and the root
w of the tree containing b. Add a new root u with children v and w,
thus merging the trees containing a and b into a single tree.
Suppose V is an alignment of the sequences at the leaves of v's
subtree, and W is an alignment of the sequences at the leaves of w's
subtree. Let {a,b} be the pair of sequences that caused these
subtrees to be merged, and let A be the optimal alignment of a and b.
Use A to guide the alignment of the two alignments V and W.
References
- Durbin, R., Eddy, S. R., Krogh, A. and Mitchison, G.,
Biological Sequence Analysis: Probabilistic models of proteins
and nucleic acids. Cambridge University Press, 1998.
-
Feng, D. F. and Doolittle, R. F., "Progressive Alignment of Amino Acid
Sequences and Construction of Phylogenetic Trees from Them".
Methods in Enzymology, vol. 266, 1996, 368-382,