Schedule details will evolve as we go; please check
back here every week or so to see the latest updates.
||Introduction & Overview
||Sequence Alignment, Search & Scoring
||Durbin Ch 1-2; Papers Below + Free Reading (see HW 1)
||Durbin Ch 1-2; Papers Below
||Sequence Motif Modeling & Discovery: MLE & the EM Algorithm
||Durbin Ch 11 (excl Mix. of Dirichlets, Est. Priors in 11.5; skim 11.6); Papers Below
||Sequence Motif Modeling & Discovery: MEME & Gibbs Sampling
||HMMs & Gene Finding
||Durbin Ch 3-5, Papers Below
||RNA Structure, Alignment, & Search
||Durbin Ch 9-10, Papers Below
Richard Durbin, Sean R. Eddy, Anders Krogh and Graeme Mitchison,
Biological Sequence Analysis: Probabilistic models of proteins and nucleic acids,
(U. Book Store,
References: Papers not explicitly listed as "Read" in
each section are optional, good supplementary references,
recommended if you want more depth in any of the areas.
Electronic access to journals is generally free from
on-campus computers. For off-campus access, follow the
"[offcampus]" links or look at the
library "proxy server" instructions.
References -- Introduction & Overview:
Read #2; a bit dated, but a good overview.
Optional: If you want more biology, former students have
recommended Gonick (also a bit dated, but cheap). Alberts is a
popular undergrad textbook, very comprehensive and very well
- Lawrence Hunter, "Molecular Biology for Computer
Scientists," Chapter 1 of Artificial
Intelligence and Molecular Biology, Lawrence Hunter,
ed., AAAI Press, 1993.
- Larry Gonick, Mark Wheelis, "The Cartoon Guide to Genetics"
(Updated Edition, 1991) ISBN 0062730991, Collins.
- Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff,
Keith Roberts, Peter Walter, "Molecular Biology of the Cell",
Fourth Edition, 2002, ISBN 0815332181, Garland.
References -- Sequence Alignment, Search & Scoring:
Read #5, 6, 8. The Myers review is a bit dated, but still a
good overview of algorithms and algorithmic issues.
- SR Eddy, "What is dynamic programming?" Nat. Biotechnol., 22, #7 (2004) 909-10.
- SR Eddy, "Where did the BLOSUM62 alignment score matrix come from?" Nat. Biotechnol., 22, #8 (2004) 1035-6.
- SR Eddy, "What is Bayesian statistics?" Nat. Biotechnol., 22, #9 (2004) 1177-8.
- A Pertsemlidis, JW Fondon, "Having a BLAST with bioinformatics (and avoiding BLASTphemy)." Genome Biol., 2, #10 (2001) REVIEWS2002.
- Myers, E. (1991) "An overview of sequence comparison
algorithms in molecular biology",
Tech. Rep. TR-91-29, Dept. of Computer Science, Univ. of
References -- Sequence Motif Modeling & Discovery:
Read #11, 12, 14. Dempster et al. is the
"classic" paper on EM. Tompa et al. is a
comprehensive comparison of several motif finding methods.
Blanchette et al. is an important example of the use of
comparative genomics for this problem.
- AP Dempster, NM Laird, DB Rubin, "Maximum Likelihood from
Incomplete Data via the EM Algorithm," Journal of the Royal
Statistical Society. Series B (Methodological), Vol. 39,
No. 1 (1977), pp. 1-38.
- GD Stormo, "DNA binding sites: representation and discovery." Bioinformatics, 16, #1 (2000) 16-23.
- TL Bailey, C Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers." Proc Int Conf Intell Syst Mol Biol, 2, (1994) 28-36.
http://meme.sdsc.edu/meme/ for related papers and
- TL Bailey, C Elkan, "The value of prior knowledge in discovering motifs with MEME." Proc Int Conf Intell Syst Mol Biol, 3, (1995) 21-9.
- CE Lawrence, SF Altschul, MS Boguski, JS Liu, AF Neuwald, JC Wootton, "Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment." Science, 262, #5131 (1993) 208-14.
- FP Roth, JD Hughes, PW Estep, GM Church, "Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation." Nat. Biotechnol., 16, #10 (1998) 939-45.
- M Tompa, N Li, TL Bailey, GM Church, B De Moor, E Eskin, AV Favorov, MC Frith, Y Fu, WJ Kent, VJ Makeev, AA Mironov, WS Noble, G Pavesi, G Pesole, M Régnier, N Simonis, S Sinha, G Thijs, J van Helden, M Vandenbogaert, Z Weng, C Workman, C Ye, Z Zhu, "Assessing computational tools for the discovery of transcription factor binding sites." Nat. Biotechnol., 23, #1 (2005) 137-44.
- E Rocke, M Tompa, "An Algorithm for Finding
Novel Gapped Motifs in DNA Sequences." RECOMB98: Proceedings
of the Second Annual International Conference on Computational
Molecular Biology, New York, NY, March 1998, 228-233.
- M Blanchette, B Schwikowski, M Tompa, "Algorithms for phylogenetic footprinting." J. Comput. Biol., 9, #2 (2002) 211-23.
- M Blanchette, M Tompa, "FootPrinter: A program designed for phylogenetic footprinting." Nucleic Acids Res., 31, #13 (2003) 3840-2.
References -- HMMs & Gene Finding:
Read #20, 22. The Rabiner tutorial is a very good intro to
HMMs if you want a different perspective from the text. Claverie
is a good survey of computational gene finding. Burset and Guigó
is a careful comparison of leading programs of its day. Lander
et al. and Venter et al. are the landmark initial
human genome sequence papers. Klein et al. is an
interesting application of HMMs relevant to RNA gene finding, our
- SR Eddy, "What is a hidden Markov model?" Nat. Biotechnol., 22, #10 (2004) 1315-6.
- LR Rabiner, "A Tutorial on Hidden Markov Models and Selected
Applications in Speech Recognition," Proceedings of the IEEE, 77,
#2 (Feb 1989) 257-286.
- C Burge, S Karlin, "Prediction of complete gene structures in human genomic DNA." J. Mol. Biol., 268, #1 (1997) 78-94.
- M Burset, R Guigó, "Evaluation of gene structure prediction programs." Genomics, 34, #3 (1996) 353-67.
- JM Claverie, "Computational methods for the identification of genes in vertebrate genomic sequences." Hum. Mol. Genet., 6, #10 (1997) 1735-44.
- An extensive online bibliography
- ES Lander, LM Linton, B Birren, et al. (International Human Genome Sequencing Consortium), "Initial sequencing and analysis of the human genome." Nature, 409, #6822 (2001) 860-921.
- JC Venter, MD Adams, EW Myers, et al., "The sequence of the human genome." Science, 291, #5507 (2001) 1304-51.
- RJ Klein, Z Misulovin, SR Eddy, "Noncoding RNA genes identified in AT-rich hyperthermophiles." Proc. Natl. Acad. Sci. U.S.A., 99, #11 (2002) 7542-7.
- JP Staley, C Guthrie, "Mechanical devices of the spliceosome: motors, clocks, springs, and things." Cell, 92, #3 (1998) 315-26.
References -- RNA Structure, Alignment, & Search:
Read #34, 38, 40, 42. Optional: Refs 31-33 are good surveys of
recent surprising discoveries about the roles of non-coding RNA.
Refs 47-49 might give you some picture of one nice biological
example and how computational approaches are useful in this
- G Storz, "An expanding universe of noncoding RNAs." Science, 296, #5571 (2002) 1260-3.
- SR Eddy, "Computational genomics of noncoding RNA genes." Cell, 109, #2 (2002) 137-40.
- A Hüttenhofer, P Schattner, N Polacek, "Non-coding RNAs: hope or hype?" Trends Genet., 21, #5 (2005) 289-97.
- SR Eddy, "How do RNA folding algorithms work?" Nat. Biotechnol., 22, #11 (2004) 1457-8.
- JS McCaskill, "The equilibrium partition function and base pair binding probabilities for RNA secondary structure." Biopolymers, 29, #6-7 (1990 May-Jun) 1105-19.
- RB Lyngsø, M Zuker, CN Pedersen, "Fast evaluation of internal loops in RNA secondary structure prediction." Bioinformatics, 15, #6 (1999) 440-5.
- PP Gardner, R Giegerich, "A comprehensive comparison of comparative RNA structure prediction approaches." BMC Bioinformatics, 5, (2004) 140.
- SR Eddy, R Durbin, "RNA sequence analysis using covariance models." Nucleic Acids Res., 22, #11 (1994) 2079-88.
- SR Eddy, "A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure." BMC Bioinformatics, 3, (2002) 18.
- S Griffiths-Jones, A Bateman, M Marshall, A Khanna, SR Eddy, "Rfam: an RNA family database." Nucleic Acids Res., 31, #1 (2003) 439-41.
- S Griffiths-Jones, S Moxon, M Marshall, A Khanna, SR Eddy, A Bateman, "Rfam: annotating non-coding RNAs in complete genomes." Nucleic Acids Res., 33, #Database issue (2005) D121-4.
- Z Weinberg, WL Ruzzo, "Faster Genome Annotation of
Non-coding RNA Families Without Loss of Accuracy." Eighth Annual International
Conference on Research in Computational Molecular Biology (RECOMB
2004), pp 243-251, March 2004, San Diego, CA.
- Z Weinberg, WL Ruzzo, "Exploiting conserved structure for faster annotation of non-coding RNAs without loss of accuracy." Bioinformatics, 20 Suppl 1, (2004) I334-I341.
- Z Weinberg, WL Ruzzo, "Sequence-based heuristics for faster annotation of non-coding RNA families." Bioinformatics, 22, #1 (2006) 35-9.
- Z Yao, Z Weinberg, WL Ruzzo, "CMfinder--a covariance model based RNA motif finding algorithm." Bioinformatics, 22, #4 (2006) 445-52.
- M Mandal, M Lee, JE Barrick, Z Weinberg, GM Emilsson, WL Ruzzo, RR Breaker, "A glycine-dependent riboswitch that uses cooperative binding to control gene expression." Science, 306, #5694 (2004) 275-9.
- JE Barrick, N Sudarsan, Z Weinberg, WL Ruzzo, RR Breaker, "6S RNA is a widespread regulator of eubacterial RNA polymerase that resembles an open promoter." RNA, 11, #5 (2005) 774-84.
- AE Trotochaud, KM Wassarman, "A highly conserved 6S RNA structure is required for regulation of transcription." Nat. Struct. Mol. Biol., 12, #4 (2005) 313-9.
- DK Willkomm, J Minnerup, A Hüttenhofer, RK Hartmann, "Experimental RNomics in Aquifex aeolicus: identification of small non-coding RNAs and the putative 6S RNA homolog." Nucleic Acids Res., 33, #6 (2005) 1949-60.