|
|
|
A dynamic programming algorithm |
|
|
|
|
|
The problem |
|
What is a DNA sequence? |
|
DNA similarity |
|
What is DNA sequence alignment? |
|
Using English words |
|
The Naïve algorithm |
|
The Dynamic Programming algorithm |
|
Idea of Dynamic Programming |
|
|
|
|
|
DNA: string using letters A,C,G,T |
|
Letter = DNA “base” |
|
e.g. AGATGGGCAAGATA |
|
DNA makes up your “genetic code” |
|
|
|
|
|
|
DNA can mutate. |
|
Change a letter |
|
AACCGGTT à ATCCGGTT |
|
Insert a letter |
|
AACCGGTT à ATAACCGGTT |
|
Delete a letter |
|
AACCGGTT à ACCGGTT |
|
A few mutations makes sequences different, but
“similar” |
|
|
|
|
|
New sequences compared to existing sequences |
|
Similar sequences often have similar function |
|
Most widely used algorithm in computational
biology tools |
|
e.g. BLAST at http://www.ncbi.nlm.nih.gov/BLAST/ |
|
|
|
|
Match 2 sequences, with underscore ( _ )
wildcards. |
|
Best Alignment à minimum underscores
(slight simplification, but okay for 326) |
|
e.g. |
|
|
|
|
|
Try every way to put in underscores |
|
If it works, and is best so far, record it. |
|
At end, return best solution. |
|
|
|
|
|
Table(x,y): best alignment for first x letters
of string 1, and first y letters of string 2 |
|
Decide what to do with the end of string, then
look up best alignment of remainder in Table. |
|
|
|
|
|
|
“zasha” vs. “ashes”. 2 possibilities for last letters: |
|
(1) match ‘a’ with ‘_’: |
|
best_alignment(“zash”,”ashes”)+1 |
|
(2) match ‘s’ with ‘_’: |
|
best_alignment(“zasha”,”ashe”)+1 |
|
à best_alignment(“zasha”,”ashes”) |
|
=min(best_alignment(“zash”,”ashes”)+1, |
|
best_alignment(“zasha”,”ashe”)+1) |
|
|
|
|
|
|
|
|
Every square in table is filled in once |
|
Filling it in is constant time |
|
Q(n2) squares |
|
à alg is Q(n2) |
|
|
|
|
|
Re-use expensive computations |
|
Identify critical input to problem (e.g. best
alignment of prefixes of strings) |
|
Store results in table, indexed by critical
input |
|
Solve cells in table of other cells |
|
Top-down often easier to program |
|