DNA Sequence Alignment
|
|
|
A dynamic programming algorithm |
DNA Sequence Alignment
(aka “Longest Common Subsequence”)
|
|
|
|
The problem |
|
What is a DNA sequence? |
|
DNA similarity |
|
What is DNA sequence alignment? |
|
Using English words |
|
The Naïve algorithm |
|
The Dynamic Programming algorithm |
|
Idea of Dynamic Programming |
What is a DNA sequence
|
|
|
|
DNA: string using letters A,C,G,T |
|
Letter = DNA “base” |
|
e.g. AGATGGGCAAGATA |
|
DNA makes up your “genetic code” |
DNA similarity
|
|
|
|
|
DNA can mutate. |
|
Change a letter |
|
AACCGGTT à ATCCGGTT |
|
Insert a letter |
|
AACCGGTT à ATAACCGGTT |
|
Delete a letter |
|
AACCGGTT à ACCGGTT |
|
A few mutations makes sequences
different, but “similar” |
Why is DNA similarity
important
|
|
|
|
New sequences compared to existing
sequences |
|
Similar sequences often have similar
function |
|
Most widely used algorithm in
computational biology tools |
|
e.g. BLAST at http://www.ncbi.nlm.nih.gov/BLAST/ |
What is DNA sequence
alignment?
|
|
|
Match 2 sequences, with underscore ( _
) wildcards. |
|
Best Alignment à minimum
underscores (slight simplification, but okay for 326) |
|
e.g. |
Moving to English words
Naïve algorithm
|
|
|
Try every way to put in underscores |
|
If it works, and is best so far, record
it. |
|
At end, return best solution. |
Naïve Algorithm – Running
Time
Dynamic Approach – A
table
|
|
|
Table(x,y): best alignment for first x
letters of string 1, and first y letters of string 2 |
|
Decide what to do with the end of
string, then look up best alignment of remainder in Table. |
e.g. ‘a’ vs. ‘s’
|
|
|
|
|
“zasha” vs. “ashes”. 2 possibilities for last letters: |
|
(1) match ‘a’ with ‘_’: |
|
best_alignment(“zash”,”ashes”)+1 |
|
(2) match ‘s’ with ‘_’: |
|
best_alignment(“zasha”,”ashe”)+1 |
|
à best_alignment(“zasha”,”ashes”) |
|
=min(best_alignment(“zash”,”ashes”)+1, |
|
best_alignment(“zasha”,”ashe”)+1) |
An example
Example with solution
Pseudocode (bottom-up)
Pseudocode (top-down)
Running time
|
|
|
Every square in table is filled in once |
|
Filling it in is constant time |
|
Q(n2) squares |
|
à alg is Q(n2) |
Idea of dynamic
programming
|
|
|
|
Re-use expensive computations |
|
Identify critical input to problem
(e.g. best alignment of prefixes of strings) |
|
Store results in table, indexed by
critical input |
|
Solve cells in table of other cells |
|
Top-down often easier to program |